It’s taken me a year to write this MongoDB series, admittedly I have had a lot of down time, but this is the finale. One thing I’ve not given much explanation to is the internals of MongoDB and how it works under the hood, and that’s what we’ll be covering here.
The main internals I want to cover are how Mongo actually manages its write concerns using locks, and how journaling works. After that, I’ll probably ramble on for a bit and have a big emotional climax.
Locks
MongoDB has a pretty expansive set of locks. When I first started this series, the latest version of MongoDB was 2.6, if I remember correctly; now version 3.2 is the latest. The internals changed a hell of a lot between version 2 and version 3, and these changes have propagated down to the locks.
One of the major changes between version 2 and 3 was the switch of the default storage engine. Previously MMAPv1 was the default, now WiredTiger is. The differences between these storage engines are significant enough to warrant their own blog post, but I’m going to focus on the best new feature of the bunch: document level concurrency.
Document level concurrency allows MongoDB to put a lock on any single document in any collection. Prior to version 3 of MongoDB this wasn’t possible, and every write would lock the entire database the collection belongs to. Obviously this is a humongous improvement.
You can get a list of all in-progress operations, along with the locks they hold, by running:
db.currentOp()
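db.currentOp() also accepts a filter document, so if you only care about the operations that are stuck waiting on a lock, you can narrow things down like so:
db.currentOp({ "waitingForLock": true })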
You can get a report of the time being spent in read and write locks, per database, by running:
mongotop --locks
Apart from the document locks which are created when performing CRUD operations, there are a bunch of operations and commands that will lock up an entire database, so let’s cover some of these:
- Compact – Used to reclaim disk space (there’s a quick example just after this list).
- Repair – Used when journaling is disabled and a bad shutdown causes data corruption.
- MapReduce – Used to perform complicated queries across a cluster.
- Copying a database
- Journaling to disk from memory creates a system wide lock for a short interval.
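To give a quick example of one of these, compact is run through runCommand against a single collection; the users collection here is purely for illustration, and bear in mind everything will block while it runs:
db.runCommand({ "compact": "users" })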
For more information on locks, check out the MongoDB concurrency docs.
Journaling
- Every write to a mongod is first sent to an in-memory journal.
- After 100ms (configurable), journal entries are written to an on-disk journal in $DB_PATH/journal/*, so in the event of a crash, no more than 100ms of writes will be lost.
- Data is read from the on-disk journal and written to the database every 60 seconds, or after 2GB of journal entries have been written, whichever comes first.
- Journal files are removed after a clean shutdown.
- On a crash of MongoDB, mongod will replay the journal on reboot.
- Journaling can be disabled, however if you have a hard crash you may get corrupt data when using the MMAPv1 storage engine. The WiredTiger storage engine is slightly more intelligent and uses a checkpoint system, so in the event of a hard crash you will not get corrupt data, but you will lose some data. For these reasons it is never advised to have journaling disabled, and for that reason, I’m not even going to tell you how to disable it.
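If you want to check what journaling is up to, serverStatus exposes a dur section full of journaling stats; as far as I know this only shows up when running MMAPv1 with journaling enabled:
db.serverStatus().dur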
To write to the disk journal every 10ms instead of every 100ms, you would run:
db.adminCommand({ "setParameter": 1, "journalCommitInterval": 10 })
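If you’d rather bake this into your config file than set it at runtime, I believe the equivalent in the YAML config format is storage.journal.commitIntervalMs:
storage:
  journal:
    enabled: true
    commitIntervalMs: 10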
The flow of MongoDB writes
This is a very complicated subject and one which rightly deserves its own chapter in a book. Fortunately I haven’t written a MongoDB book, but I’m going to explain, in as simple terms as I can, how writes flow from your client to actually being saved in your cluster (there’s a small example of where write concerns fit in after the list).
- Your application server wants to update a user’s name, so it opens a connection to one of your mongos servers and sends the write.
- The mongos will look inside the config servers to work out which shard should store the information. Once the mongos has located this information, it will send the write to the primary node in that shard’s replica set.
- The write is received by the MongoDB wire protocol; this is essentially a TCP/IP socket for mongod.
- The socket will communicate with mongod and start the journaling process, starting with the memory journal and then the disk journal, as described above.
- Every minute (customisable) the disk journal is written to the database and oplog.
- Secondary nodes will pull data from their sync source and update their own oplogs. This can happen at any time, depending on how busy the secondaries are.
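To tie this back to write concerns, the client decides at step one how far through this flow a write must get before it’s acknowledged. As a rough sketch, with a made-up users collection, the options below ask for the write to be acknowledged by a majority of the replica set, and j: true tells mongod not to acknowledge until the write has hit the disk journal described earlier:
db.users.update({ "name": "Bobby" }, { "$set": { "name": "Bob" } }, { "writeConcern": { "w": "majority", "j": true } })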
It’s a pretty complicated process and there are a bunch of configurations that can interfere with it. On top of that, you might not be using shards or replication, which seriously simplifies the process. I might actually write a separate post to go over this in detail.
Config Servers
Config servers are required as soon as you start using shards to scale out your database. The config servers are used to store information about where your data resides in the cluster, which nodes you have in the cluster, the status of these nodes, and the general settings that your mongod and mongos nodes should adhere to.
Your applications should ideally always have 3 config servers to prevent data loss. Each server should reside on a separate machine, however they can be low powered machines, or even the application servers, if your budget doesn’t stretch to separate servers. A config server is simply a tailored mongod which is started with the --configsvr flag.
mongod --configsvr --dbpath /var/lib/mongodb -f /var/lib/config/mongod.conf
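Your mongos nodes are then pointed at the config servers with the --configdb flag; the hostnames here are made up, but the format is a comma-separated list of all 3 (config servers listen on port 27019 by default):
mongos --configdb cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019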
Drum Roll
And that’s a wrap. As I mentioned above, when I first started this series, MongoDB version 3 hadn’t been released. Thankfully, despite the internals having been completely re-engineered, the consumer API has stayed almost identical. There’s probably the odd function which has been renamed or the odd parameter which has been meddled with, but I’ve gone through my old posts and updated most of them.
Like I said above, in time I might write up a detailed post about how the internals of writing work within MongoDB, with all the varieties of configuration and architecture. For now, the details above, along with all my posts, should be enough for the average user.
As always, thanks for reading.