MongoDB – Part 2 – Databases

OK. So in part 1 we covered a bunch of MongoDB basics. In this part, I plan on covering in detail the core MongoDB databases, what collections they have and what they’re used for.

When you first install MongoDB, two databases are created for you, called admin and local. They don’t contain many collections and may not even contain any to begin with. Don’t worry about this; Mongo will create them as and when it needs them.

The third core database is called config. The config database is a little more complicated, in that it can only be accessed when connected to a sharded cluster rather than a single standalone daemon. I haven’t covered shards yet, but for now just think of shards as individual mongod nodes that together hold your data set. How the data is spread across them is very customisable, and these customisable settings are stored in the config database. I’ll cover shards in great detail in a separate post.

Edit: Learn about MongoDB shards

Local Database

The local database, as its name suggests, stores data about the local instance of Mongo. So if you have a 10 shard cluster, each individual node will have its own local database with its own data about itself. Although MongoDB uses this database to store information about itself, feel free to add your own collections to it if you have additional data which is specific to a single node.

Here’s a breakdown of the collections that MongoDB will use by default:

startup_log

For each startup of mongod, a document is added to this collection with useful diagnostic data, such as the hostname, start time, process id, command line options and build information.
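
If you want to peek at the most recent entry yourself, something like the sketch below should do it. It’s written in Python and assumes the pymongo driver and a mongod you can reach. The function names latest_startup and summarise_startup are my own, not part of any library.

```python
def latest_startup(client):
    """Return the newest document from local.startup_log, or None if empty.

    `client` is assumed to be a pymongo MongoClient (or anything that
    behaves like one).
    """
    startup_log = client["local"]["startup_log"]
    # Sort on startTime descending so the most recent start comes first.
    return startup_log.find_one(sort=[("startTime", -1)])


def summarise_startup(doc):
    """Pick out a few fields from a startup_log document."""
    if doc is None:
        return None
    return {
        "hostname": doc.get("hostname"),
        "pid": doc.get("pid"),
        "startTime": doc.get("startTime"),
    }


# Example usage (needs a running mongod and `pip install pymongo`):
# from pymongo import MongoClient
# print(summarise_startup(latest_startup(MongoClient("localhost", 27017))))
```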

me

This collection contains an object id (_id) that represents the node. This object id is used with shards and replication, so each node knows who it must communicate with. If two nodes end up with the same object id, this collection can be dropped. It will be recreated and populated with a new object id within a few seconds.

slaves

Stores data about the slaves that sync from this node, such as how far each one has replicated.

sources

Contains all nodes which have ever been used as a sync source in replication. A sync source is essentially the master node where all data is pulled from. If a node changes its hostname, you may get errors about multiple hosts with the same object id (_id). To resolve this, drop this collection. It will be recreated and populated within a few seconds.

oplog.rs

This is the operations log. It stores all changes which have been pulled from the primary node but still need to be executed on this node. This is a capped collection. I’ll cover all the details about oplogs in an upcoming post on replication.
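
If capped collections are new to you, the idea is a fixed-size collection that keeps insertion order and overwrites its oldest entries once it’s full. Here’s a rough Python sketch of that behaviour using a deque. It’s only an analogy: the CappedLog class and its max_docs parameter are made up for illustration, and a real capped collection is limited by its size in bytes rather than just a document count.

```python
from collections import deque


class CappedLog:
    """Toy stand-in for a capped collection (hypothetical, not MongoDB code)."""

    def __init__(self, max_docs):
        # A deque with maxlen drops items from the left once it is full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order, oldest first.
        return list(self._docs)


log = CappedLog(max_docs=3)
for n in range(1, 6):
    log.insert({"op": "i", "n": n})

# Only the three newest operations are left.
print([d["n"] for d in log.find()])  # [3, 4, 5]
```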

Config Database

The config database is used to support sharding in MongoDB and should not be used by your application directly, except for debugging or changing configuration.

To understand this section fully, you really need to understand a lot of the factors behind how MongoDB shards work. This will all be covered in great depth in a separate post.

settings

The sharding settings for your cluster are stored in this collection. Examples of settings this collection will store include chunk size, balancer and chaining. I’ll explain all of these in my post on sharding.

chunks

Data on which chunks exist and on which shards. A chunk is essentially a collection of documents that all reside on the same node.
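
To make that a bit more concrete, here’s a small Python sketch of the kind of lookup this metadata makes possible: given a shard key value, find the chunk whose range contains it, and therefore the shard that holds it. The chunk list, the shard names and the shard_for function are invented for the example. They mimic the min, max and shard fields of a chunk document but aren’t read from a real cluster.

```python
from bisect import bisect_right

# Simplified chunk metadata: each chunk covers [min, max) of the shard key.
# Real chunk documents store min and max as documents; plain integers are
# used here to keep the sketch short.
CHUNKS = [
    {"min": 0, "max": 100, "shard": "shard0000"},
    {"min": 100, "max": 200, "shard": "shard0001"},
    {"min": 200, "max": 300, "shard": "shard0000"},
]


def shard_for(key, chunks=CHUNKS):
    """Return the shard owning `key`, or None if no chunk covers it."""
    ordered = sorted(chunks, key=lambda c: c["min"])
    lower_bounds = [c["min"] for c in ordered]
    # Index of the last chunk whose lower bound is <= key.
    i = bisect_right(lower_bounds, key) - 1
    if i < 0:
        return None
    chunk = ordered[i]
    # The upper bound is exclusive.
    return chunk["shard"] if key < chunk["max"] else None


print(shard_for(42))   # shard0000
print(shard_for(100))  # shard0001
print(shard_for(300))  # None
```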

shards

Data on which shards exist and a list of their tags.

tags

Tags are used to make sure that certain documents are saved onto the correct shard. A common use case might be that you have 10 shards using spinning disks and 1 with SSDs, and you want to make sure all the most important data is saved to SSD for speed. This could be accomplished by first adding a tag to a shard, then creating tag ranges, which say where data should be saved. If a tag range is not found for a document, MongoDB will try to balance documents across shards fairly. This collection will store a list of all the tags and tag ranges.
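
The SSD example can be sketched in Python like so. This is a toy model, not how MongoDB implements it: the shard names, the ssd tag, the priority shard key and the candidate_shards helper are all assumptions for illustration. It just shows the two step idea of tagging a shard and then mapping a range of shard key values to that tag.

```python
# Which tags each shard carries (hypothetical cluster).
SHARD_TAGS = {
    "shard0000": [],
    "shard0001": [],
    "shard0002": ["ssd"],
}

# Tag ranges over an imaginary numeric shard key called "priority".
# Each range is [min, max): min inclusive, max exclusive.
TAG_RANGES = [
    {"min": 90, "max": 101, "tag": "ssd"},
]


def candidate_shards(priority):
    """Return the shards a document with this shard key value may live on."""
    for rng in TAG_RANGES:
        if rng["min"] <= priority < rng["max"]:
            # A matching range pins the document to shards with that tag.
            return sorted(
                shard for shard, tags in SHARD_TAGS.items() if rng["tag"] in tags
            )
    # No range matched, so any shard is fair game for balancing.
    return sorted(SHARD_TAGS)


print(candidate_shards(95))  # ['shard0002']
print(candidate_shards(10))  # ['shard0000', 'shard0001', 'shard0002']
```

In the mongo shell, the real equivalents of these two steps are the sh.addShardTag() and sh.addTagRange() helpers.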

databases

Data on which databases exist and what their primary shard is.

collections

Data on which sharded collections exist.

changelog

Data on how chunks have moved between nodes and how they have split. Don’t worry about this too much for now. Just know MongoDB will sometimes move data between shards in an attempt to balance the load.

locks

Data about which balancing round is currently in progress. Balancing rounds are started by the mongos. Mongos is sort of like an entry point and router for the shards in your MongoDB cluster.

Admin Database

The admin database doesn’t store much, and it isn’t populated by default. It will be created once the first user has been created. I will show how to do this in a future article.

system.users

Stores information about each user. When using shards, users are stored on the config server. A root user should still be put onto each mongod for security reasons.

System Collections

The system collections are slightly different to what I’ve discussed previously. These collections are stored inside each of your application databases. So say you create a database called blog. You might look for a collection called blog.system.indexes.

system.indexes

A list of all indexes in this database.

system.profile

If you execute db.setProfilingLevel(), mongod will start logging the queries you deem slow to this collection.
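
From a driver, profiling can also be switched on by sending the profile command. The sketch below is Python and assumes pymongo. The function names and the 100 ms default threshold are my own choices, so treat it as a starting point rather than the one true way.

```python
def enable_slow_query_profiling(db, slow_ms=100):
    """Ask mongod to record operations slower than `slow_ms` milliseconds.

    `db` is assumed to be a pymongo Database. Level 1 profiles slow
    operations only; 0 turns profiling off and 2 records everything.
    """
    return db.command("profile", 1, slowms=slow_ms)


def recent_slow_queries(db, limit=5):
    """Return the newest entries from this database's system.profile."""
    cursor = db["system.profile"].find().sort("ts", -1).limit(limit)
    return list(cursor)


# Example usage (needs a running mongod and `pip install pymongo`):
# from pymongo import MongoClient
# blog = MongoClient("localhost", 27017)["blog"]
# enable_slow_query_profiling(blog, slow_ms=50)
# for entry in recent_slow_queries(blog):
#     print(entry.get("op"), entry.get("ns"), entry.get("millis"))
```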

Conclusion

Without the relevant knowledge on shards, that may have been a little tricky to follow at times. If you don’t have that knowledge, I cover it all in my post on sharding. But don’t worry about that too much for now. It’ll all come together naturally as the series progresses.

Next up I’ll be discussing MongoDB’s powerful indexing features. Thanks for reading.