MongoDB – Part 9 – Deployments

Welcome back. Deploying MongoDB isn’t difficult, but there are a tonne of tricks you can use to really optimise your database. That’s what we’re going to be covering in this post.

I’m not going to be covering how to create clusters, as I already covered that in part 5 of the series – Sharding. I’m not even going to discuss provisioning solutions like Puppet, Chef, Ansible etc. What I am going to discuss, however, is a collection of system choices that need to be thought about when creating your cluster. So let’s get started.

Continue reading

MongoDB – Part 8 – Backups

Backups get overlooked far too often. In this blog, I’ll explain all the ways of backing up your MongoDB databases along with some general backup tips.

Before getting into the nitty gritty of backups, it’s important to remember that there are multiple MongoDB setups you could be running. For example, you could be running a standalone MongoDB instance, a replica set, or using shards. Regardless of which setup you are using, the backup process is practically identical.

One of the most important tips when creating backups is to always use secondary nodes where possible. This will prevent your application from being inaccessible. If you were to back up a primary node, your application could be unable to receive writes while the backup runs, which is less than ideal to put it lightly.

Continue reading

MongoDB – Part 7 – MapReduce

MapReduce is a multi-step programming paradigm that has been around for roughly two decades. Its goal is to break a large collection of data, sometimes spread across different machines on a network, into smaller pieces that can be processed independently. Executing MapReduce jobs in parallel over multiple machines can significantly improve processing times.

The concept of MapReduce is inspired by the map and reduce functions in functional programming. When MapReduce was defined, it was as a 3 step process (Map, Shuffle, Reduce). MongoDB have taken this one step further and added an optional finaliser step. So let’s see what’s involved in each step.
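
To make the four steps a little more concrete before going further, here is a minimal sketch in plain Python. It is not MongoDB's implementation and it runs on a single machine; the `map_reduce` helper, the `orders` data and the field names are all made up for illustration.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn, finalise_fn=None):
    """Toy, single-process illustration of Map, Shuffle, Reduce, Finalise."""
    # Map: every document emits zero or more (key, value) pairs.
    emitted = [pair for doc in documents for pair in map_fn(doc)]

    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)

    # Reduce: collapse each key's list of values into a single value.
    reduced = {key: reduce_fn(key, values) for key, values in grouped.items()}

    # Finalise (optional): post-process each reduced value.
    if finalise_fn is not None:
        reduced = {key: finalise_fn(key, value) for key, value in reduced.items()}
    return reduced


# Hypothetical sample data: total spend per customer.
orders = [
    {"customer": "alice", "total": 20},
    {"customer": "bob", "total": 5},
    {"customer": "alice", "total": 10},
]

result = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["customer"], doc["total"])],
    reduce_fn=lambda key, values: sum(values),
    finalise_fn=lambda key, value: {"total": value},
)
print(result)  # {'alice': {'total': 30}, 'bob': {'total': 5}}
```

In a real cluster the map and reduce steps would run on several machines at once, which is where the speed-up comes from. The sketch only shows how data flows between the steps.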

Continue reading

MongoDB – Part 6 – GridFS

If you could have a wild guess at what GridFS is used for, you’d probably say some kind of file system, and you wouldn’t be completely wrong. However, GridFS isn’t actually a filesystem, but rather a convention for storing large binary data files inside MongoDB.

I haven’t covered storing binary data in MongoDB yet, but it’s possible to store binary data in standard documents without using GridFS at all, using the BSON type BinData. It’s not very well supported by the mongo client, but most language drivers have good support; just google “MongoDB BinData”. This is all well and good. In fact, if you’re storing under 16MB of data per file, it’s recommended to use this approach. If, however, you are storing binary files larger than 16MB, GridFS is the way to go, because 16MB is the maximum size of a document in MongoDB.
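
As a rough illustration of that rule of thumb, the Python sketch below picks a storage approach from the file size and shows the general idea of splitting a large file into chunk-sized pieces, which is how GridFS gets around the document limit. The helper names are hypothetical, the 255 KiB chunk size is an assumption for the example, and none of this talks to MongoDB; in practice your driver's GridFS support does the chunking for you.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16MB document limit mentioned above
CHUNK_BYTES = 255 * 1024               # assumed chunk size, for illustration only


def choose_storage(size_bytes):
    """Return 'BinData' for files under the document limit, else 'GridFS'."""
    return "BinData" if size_bytes < MAX_DOCUMENT_BYTES else "GridFS"


def split_into_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Split raw bytes into numbered pieces, each small enough for one document."""
    return [
        {"n": index, "data": data[offset:offset + chunk_bytes]}
        for index, offset in enumerate(range(0, len(data), chunk_bytes))
    ]


small_file = b"x" * 1024
print(choose_storage(len(small_file)))      # BinData
print(choose_storage(20 * 1024 * 1024))     # GridFS

chunks = split_into_chunks(b"x" * 600_000)
print(len(chunks))                          # 3
```

Joining the `data` fields back together in order of `n` gives you the original file again.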

Continue reading