MongoDB – Part 8 – Backups

Backups get overlooked far too often. In this post, I’ll explain the ways of backing up your MongoDB databases, along with some general backup tips.

Before getting into the nitty-gritty of backups, it’s important to remember that there are multiple MongoDB setups you could be running. For example, you could be running a standalone MongoDB instance, a replica set, or a sharded cluster. Regardless of which setup you are using, the backup process is practically identical.

One of the most important tips when creating backups is to always use secondary nodes where possible. This keeps your application accessible while the backup runs. If you were to back up a primary node, your application could be unable to receive writes for the duration of the backup, which is less than ideal, to put it lightly.
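
As a rough illustration of the "back up from a secondary" tip, here is a minimal Python sketch that builds a mongodump command with a secondary read preference. The replica set name, host names and output directory are made-up placeholders, and the helper functions are hypothetical names used only for this example. It also assumes a version of mongodump that accepts --uri together with --readPreference, so check this against the tools you have installed.

```python
import subprocess


def build_mongodump_command(uri, out_dir):
    """Build a mongodump argument list that reads from a secondary member.

    Hypothetical helper for illustration only; it does not run anything.
    """
    return [
        "mongodump",
        f"--uri={uri}",
        # "secondary" fails if no secondary is reachable;
        # "secondaryPreferred" would fall back to the primary instead.
        "--readPreference=secondary",
        f"--out={out_dir}",
    ]


def run_backup(uri, out_dir):
    """Run the backup; raises CalledProcessError if mongodump exits non-zero."""
    subprocess.run(build_mongodump_command(uri, out_dir), check=True)


# Placeholder connection string and path (assumptions, not real hosts).
command = build_mongodump_command(
    "mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0",
    "/backups/mongo",
)
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before pointing it at a real replica set.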

Continue reading

MongoDB – Part 7 – MapReduce

MapReduce is a multi-step programming paradigm that has been around for two decades. The goal of MapReduce is to break down the processing of a collection of data, sometimes spread across different machines on a network. Executing MapReduce jobs in parallel over multiple machines can significantly improve processing times.

The concept of MapReduce is inspired by the map and reduce functions in functional programming. MapReduce was originally defined as a three-step process (Map, Shuffle, Reduce). MongoDB has taken this one step further and added a finaliser step. So let’s see what’s involved in each step.
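
To make the four steps concrete, here is a small pure-Python sketch of Map, Shuffle, Reduce and Finalise running over a list of in-memory documents. It is not MongoDB's API; the function names and the sample "orders" data are invented for illustration, and everything runs in a single process rather than across machines.

```python
from collections import defaultdict


def map_step(doc):
    """Map: emit (key, value) pairs for one document."""
    yield doc["customer"], doc["amount"]


def shuffle_step(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return sum(values)


def finalize_step(key, reduced):
    """Finalise: post-process the reduced value (here, round to 2 places)."""
    return round(reduced, 2)


def map_reduce(docs):
    """Run all four steps in order and return {key: final_value}."""
    pairs = (pair for doc in docs for pair in map_step(doc))
    grouped = shuffle_step(pairs)
    return {
        key: finalize_step(key, reduce_step(key, values))
        for key, values in grouped.items()
    }


# Made-up sample documents.
orders = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "bob", "amount": 5.5},
    {"customer": "alice", "amount": 2.25},
]
totals = map_reduce(orders)
print(totals)
```

In a real distributed job, the map and reduce steps would run on different machines and the shuffle would move data between them; the sketch only shows the order in which the data flows.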

Continue reading

MongoDB – Part 6 – GridFS

If you were to take a wild guess at what GridFS is used for, you’d probably say some kind of file system, and you wouldn’t be completely wrong. However, GridFS isn’t actually a filesystem, but rather a convention for storing large binary data files inside MongoDB.

I haven’t covered storing binary data in MongoDB yet; however, it’s possible to store binary data in standard documents without using GridFS at all, using the BSON type BinData. It’s not very well supported by the mongo client, but most language drivers have good support; just google “MongoDB BinData”. In fact, if you’re storing under 16 MB of data per file, this is the recommended approach. If, however, you are storing binary files larger than 16 MB, GridFS is the way to go, because 16 MB is the maximum size of a document in MongoDB.
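
The 16 MB rule of thumb above can be sketched as a simple size check. This is only an illustration: choose_storage is a hypothetical helper, and the 1 KB allowance for the document's other fields is an arbitrary assumption, not a MongoDB setting.

```python
# Maximum BSON document size in MongoDB: 16 MB.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def choose_storage(payload_bytes, overhead_bytes=1024):
    """Return "BinData" if the file fits in one document, else "GridFS".

    overhead_bytes is an assumed allowance for the document's other
    fields (file name, metadata and so on), since the whole document,
    not just the binary payload, must stay within the limit.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if payload_bytes + overhead_bytes <= MAX_BSON_DOCUMENT_BYTES:
        return "BinData"
    return "GridFS"


print(choose_storage(5 * 1024 * 1024))    # a 5 MB file
print(choose_storage(100 * 1024 * 1024))  # a 100 MB file
```

A file of exactly 16 MB is routed to GridFS here, because the surrounding document would push it over the limit.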

Continue reading

MongoDB – Part 5 – Sharding

Sharding, the art of scalability. That’s a bold statement; after all, there are two very important arts to scalability. What shards allow you to do, however, is scale out. This means you no longer need to keep all of your data on one hard drive or machine, heck, not even in the same warehouse or continent. You can have any number of machines in any number of geographic locations, connected together and serving data as if it were all stored in one central location.

If you’re not already familiar with the terms scaling up/vertically and scaling out/horizontally, carry on reading. Otherwise you can skip the next few paragraphs.

Continue reading