OK. So in part 1 we covered a bunch of MongoDB basics. In this part, I plan on covering in detail the core MongoDB databases, what collections they have and what they’re used for.
When you first install MongoDB, two databases are created for you, called admin and local. They don’t contain many collections and may not even contain any to begin with, but don’t worry about this. Mongo will create them as and when it needs them.
The third core database is called config. Config databases are a little more complicated, in that they can only be accessed when connected to a shard instead of a particular daemon. I haven’t covered shards yet, but for now just think of shards as individual mongod nodes that all together form your data set. How this happens is very customisable, and these customisable settings are stored in the config database. I’ll cover shards in great detail in a separate post.
The local database, as its name suggests, stores data about the local instance of Mongo, so if you have a 10 shard cluster, each individual node will have its own local database with its own data about itself. Although MongoDB uses this database to store information about itself, feel free to add your own collections to this database if you have additional data which is specific to a single node.
Here’s a breakdown of collections that MongoDB will use by default:
For each startup of Mongod, a document will be added to this collection with a bunch of useful data.
This collection contains an object id (_id) that represents the node. This object id is used with shards and replication, so each node knows who it must communicate with. If two nodes end up with the same object id, this collection can be dropped. It will be recreated and populated with a new object id within a few seconds.
Returns a bunch of data about the slave’s master.
Contains all slaves which have ever been used as a sync source in replication. A sync source is essentially the master node where all data is pulled from. If a node changes its hostname, you may get errors about multiple hosts with the same object id (_id). To resolve this, drop this collection. It will be recreated and populated within a few seconds.
This is the operations log. It stores all changes which have been pulled from the primary node but still need to be executed on this node. This is a capped collection. I’ll cover all the details about oplogs in an upcoming post on replication.
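If "capped collection" is a new term, here’s a rough sketch of the idea in plain JavaScript. It is only a toy model and not how MongoDB implements it: the CappedLog class is a made-up name, and I’m assuming the cap is a document count, whereas a real capped collection is primarily limited by size in bytes. The point is just that once the collection is full, the oldest entries make way for new ones.

```javascript
// Toy model of a capped collection (hypothetical class, not a MongoDB API).
// Assumption: the cap is a number of documents; real capped collections
// are limited by size in bytes.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Append a document, dropping the oldest one once the cap is exceeded.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return the documents in insertion order (oldest first).
  find() {
    return this.docs.slice();
  }
}

const oplog = new CappedLog(3);
for (const n of [1, 2, 3, 4, 5]) {
  oplog.insert({ op: "i", n: n });
}

console.log(oplog.find().map((d) => d.n)); // [ 3, 4, 5 ]
```

Entries 1 and 2 have been pushed out by the time 5 arrives, which is why an oplog only holds a limited window of recent operations.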
The config database is used to support sharding in MongoDB and should not be used by your application directly, unless for debugging purposes or setting configuration.
To understand this section fully, you really need to understand a lot of the factors behind how MongoDB shards work. This will all be covered in great depth in a separate post.
Each shard in MongoDB can have its own settings, which are stored in this collection. Examples of settings this collection stores include chunk size, the balancer and chaining. I’ll explain all of these in my post on sharding.
Data on which chunks exist and on which shards. A chunk is essentially a collection of documents that all reside on the same node.
Data on which shards exist and a list of their tags.
Tags are used to make sure that certain documents are saved onto the correct shard. A common use case might be that you have 10 shards using spinning disks and 1 with SSDs, and you want to make sure all the most important data is saved to SSD for speed. This could be accomplished by first adding a tag to a shard, then creating tag ranges, which say where data should be saved. If a tag range is not found for a document, MongoDB will try to balance documents across shards fairly. This collection will store a list of all the tags and tag ranges.
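To make the tag idea a bit more concrete, here’s a small sketch in plain JavaScript of how a tag range could decide where a document is allowed to live. This is my own simplified model, not MongoDB’s balancer code: the shard ids, the "ssd" tag, the numeric shard key and the function names tagFor and candidateShards are all made up for illustration, and I’m assuming each range includes its min and excludes its max.

```javascript
// Hypothetical data, loosely shaped like the shard and tag range documents.
const shards = [
  { _id: "shard0000", tags: [] },
  { _id: "shard0001", tags: ["ssd"] },
];

const tagRanges = [
  // Assumption: min is inclusive, max is exclusive.
  { min: 0, max: 1000, tag: "ssd" },
];

// Find the tag whose range covers this shard key value, or null if none does.
function tagFor(key, ranges) {
  const range = ranges.find((r) => key >= r.min && key < r.max);
  return range ? range.tag : null;
}

// List the shards a document with this key may be stored on.
// With no matching tag range, every shard is a candidate.
function candidateShards(key, allShards, ranges) {
  const tag = tagFor(key, ranges);
  if (tag === null) {
    return allShards.map((s) => s._id);
  }
  return allShards.filter((s) => s.tags.includes(tag)).map((s) => s._id);
}

console.log(candidateShards(42, shards, tagRanges));   // [ 'shard0001' ]
console.log(candidateShards(5000, shards, tagRanges)); // [ 'shard0000', 'shard0001' ]
```

A key of 42 falls inside the tagged range, so only the SSD shard qualifies. A key of 5000 matches no range, so it can go anywhere and is left to normal balancing.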
Data on which databases exist and what their primary shard is.
Data on which sharded collections exist.
Data on how data has moved between nodes and how chunks have split. Don’t worry about this too much for now. Just know MongoDB will sometimes move data between shards in an attempt to balance the load.
Data about which balancing round is currently in progress. Balancing rounds are started by the mongos. Mongos is sort of like an entry point and router for your MongoDB cluster shards.
The admin database doesn’t store much and it doesn’t exist by default, it will be created once the first user has been created. I will show how to do this in a future article.
Stores information about each user. When using shards, users are stored on the config server. A root user should still be put onto each mongod for security reasons.
The system collections are slightly different to what I’ve discussed previously. These collections are stored inside each of your application databases. So say you create a database called blog. You might look for a collection called blog.system.indexes.
A list of all indexes in this database.
If you execute db.setProfilingLevel(), mongod will start logging the queries you deem slow to this collection.
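Here’s a minimal sketch of what that might look like. The shell helper is spelled db.setProfilingLevel(), and profiled operations end up in the system.profile collection of the current database. The wrapper function recentSlowQueries and the 100 ms threshold are my own choices for illustration; the function is meant to be handed the mongo shell’s db object.

```javascript
// Hypothetical helper: pass in the mongo shell's global `db` object.
function recentSlowQueries(db, slowMs, limit) {
  // Level 1 profiles only operations slower than slowMs milliseconds.
  db.setProfilingLevel(1, slowMs);

  // Read back the most recent profiled operations, newest first.
  return db.system.profile.find().sort({ ts: -1 }).limit(limit).toArray();
}

// Example use from the mongo shell, after switching to your database:
//   recentSlowQueries(db, 100, 5);
```

Bear in mind that calling it straight after enabling profiling will only return whatever was profiled earlier. New slow queries show up as they happen.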
Without the relevant knowledge of shards, that may have been a little tricky to follow at times. If you don’t have that knowledge, I cover it all in my post on sharding. But don’t worry about that too much for now. It’ll all come together naturally as the series progresses.
Next up I’ll be discussing MongoDB’s powerful indexing features. Thanks for reading.