MongoDB – Part 10 – Internals and Final Thoughts

It’s taken me a year to write this MongoDB series (admittedly, I’ve had a lot of downtime), but this is the finale. One thing I’ve not given much explanation to is MongoDB’s internals and how it works under the hood; that’s what we’ll be covering here.

The main internals I want to cover are how Mongo actually manages its write concerns using locks, and how journaling works. After that, I’ll probably ramble on for a bit and have a big emotional climax.
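To make that concrete before we dive in, here’s a minimal sketch of requesting a write concern per collection, using Python’s pymongo driver (the connection string, database, and collection names are just placeholders):

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# Hypothetical connection details.
client = MongoClient("mongodb://localhost:27017")
db = client["blog"]

# w="majority" waits for a majority of replica set members to acknowledge
# the write; j=True additionally waits for the write to hit the on-disk
# journal before acknowledging.
posts = db["posts"].with_options(
    write_concern=WriteConcern(w="majority", j=True)
)

posts.insert_one({"title": "MongoDB internals", "part": 10})
```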


MongoDB – Part 9 – Deployments

Welcome back. Deploying MongoDB isn’t difficult, but there are a tonne of tricks you can use to really optimise your database. That’s what we’re going to be covering in this post.

I’m not going to be covering how to create clusters, as I already covered that in part 5 of the series – Sharding. I’m not even going to discuss provisioning solutions like Puppet, Chef, Ansible etc. What I am going to discuss, however, is a collection of system choices that need to be thought about when creating your cluster. So let’s get started.


MongoDB – Part 8 – Backups

Far too often, backups get overlooked. In this blog, I’ll explain all the ways of backing up your MongoDB databases, along with some general backup tips.

Before getting into the nitty-gritty of backups, it’s important to remember that there are multiple MongoDB setups you could be running: a standalone instance, a replica set, or a sharded cluster. Regardless of which setup you are using, the backup process is practically identical.

One of the most important tips when creating backups is to always use secondary nodes where possible, as this keeps your application available. If you were to back up a primary node, your application could be blocked from accepting writes, which is less than ideal, to put it lightly.
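As a rough illustration, here’s one way that could look: a small Python wrapper that points mongodump at a secondary (the hostname and output directory are made up, and --oplog assumes you’re dumping a replica set member):

```python
import subprocess

# A minimal sketch: dump from a secondary so the primary keeps serving
# writes. --oplog also captures operations made during the dump, so the
# backup can be restored to a consistent point in time.
subprocess.run(
    [
        "mongodump",
        "--host", "secondary1.example.com",  # hypothetical secondary
        "--port", "27017",
        "--oplog",
        "--out", "/backups/latest",          # hypothetical output path
    ],
    check=True,  # raise if mongodump exits non-zero
)
```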


MongoDB – Part 7 – MapReduce

MapReduce is a multi-step programming paradigm that has been around for roughly two decades. The goal of MapReduce is to break down and process a large collection of data, sometimes spread across different machines on a network. Executing MapReduce jobs in parallel over multiple machines can significantly improve processing times.
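To see the idea outside of MongoDB first, here’s a toy word count in plain Python that walks through the phases by hand (the shuffle step is just grouping values by key):

```python
from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog"]

# Map: emit a (key, value) pair for every word.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: collapse each key's values into a single result.
counts = {key: sum(values) for key, values in groups.items()}

print(counts)  # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```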

The concept of MapReduce is inspired by the map and reduce functions in functional programming. When MapReduce was originally defined, it was a three-step process (Map, Shuffle, Reduce); MongoDB has taken this one step further and added a finaliser step. So let’s see what’s involved in each step.
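As a quick preview, here’s roughly what all four pieces look like from pymongo (the database and collection names are invented, and note that on recent MongoDB versions the mapReduce command is deprecated in favour of the aggregation pipeline):

```python
from pymongo import MongoClient
from bson.code import Code

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database with an "orders" collection

# Map: emit each order's total, keyed by customer.
mapper = Code("function () { emit(this.customer, this.total); }")

# Reduce: sum the emitted totals for each customer.
reducer = Code("function (key, values) { return Array.sum(values); }")

# Finaliser: MongoDB's extra step, applied once per reduced key.
finaliser = Code("function (key, reduced) { return { grandTotal: reduced }; }")

result = db.command(
    "mapReduce", "orders",
    map=mapper, reduce=reducer, finalize=finaliser,
    out={"inline": 1},  # return results inline rather than to a collection
)
print(result["results"])
```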
