MongoDB – Part 5 – Sharding

Sharding, the art of scalability. That's a bold statement; after all, there are really two arts to scalability: scaling up and scaling out. What shards allow you to do is scale out. This means you no longer need to keep all of your data on one hard drive or machine, heck, not even in the same warehouse or continent. You can have any number of machines in any number of geographic locations, connected together and serving data as if it were all stored in one central location.

If you’re not already familiar with the terms scaling up/vertically and scaling out/horizontally, carry on reading. Otherwise you can skip the next few paragraphs.

Scaling Up/Vertically

Scaling up is a purely hardware-based concept; you don't need to modify the way you use your software. All you have to do is beef up your machine, e.g. throw in a swanky new SSD, an eight-core CPU and some extra memory, and you'll be scaling up like a seasoned professional.

tl;dr: Scaling up means beefing up your existing nodes.

Scaling Out/Horizontally

Scaling out, on the other hand, is a completely different ball game. Irrespective of which DBMS you are using, you have to make some changes to your architecture to scale out. Scaling out in MongoDB is known as sharding; it allows you to split chunks of data across any number of nodes, over a network.

tl;dr: Scaling out means adding more nodes to share the load.

Sharding 101

Up until now in this MongoDB crash course, all we have used is mongod. Before we can start using shards, we need to add two new tools to our toolbox: mongos and another type of mongod used for config servers.

Until I get onto what exactly a shard is in its full glory, just think of a shard as either a single node or a replica set which stores data. If you have one shard, it will contain 100% of the data set. If you have two shards, the data will be split across both. What data gets saved on which shard is customisable, so data won't necessarily be split 50:50.

In the sections below I'm going to detail the basics of setting up a sharded cluster, using 3 config servers and a single shard; the shard will consist of a replica set of 2 nodes. There are a few steps here, so let's break it all down:

  1. Start 3 mongod config instances.
  2. Start a mongos instance, and tell it where to find each config server.
  3. Start 2 mongod nodes and create a replica set from them.
  4. Connect to the mongos and add the replica set as a shard.
  5. Read that one more time, it’s important you understand it fully.

Step 1: Start Config servers

Config servers are used to store shard metadata, such as which shards exist, which locks are held on which databases, which shards contain which chunks, and which nodes contain which data, among other things.

Mongod config servers are standard mongod instances which are started with the --configsvr flag. Config servers by default run on port 27019 and their data resides in /data/configdb. These can both be changed with the --port and --dbpath flags respectively. In production environments you should have 3 config servers running on different machines at all times. On non-production environments, it's safe to have all 3 mongod config servers running on the same machine. Whether or not the mongod config servers are on the same machine, they must all be able to communicate with each other, so make sure your firewall is appropriately configured.

mongod --configsvr --dbpath /data/configdb1 --port 27019
mongod --configsvr --dbpath /data/configdb2 --port 27020
mongod --configsvr --dbpath /data/configdb3 --port 27021

Step 2: Start Mongos instances

Mongos, as you probably guessed, stands for mongo shard. However, that's a little misleading. Your shards are the actual data nodes, the mongod data nodes. Mongos does not store any of your data; all it does is route requests to your data nodes, sort of like a load balancer. There is one huge difference though: load balancers attempt to evenly distribute read and write loads, mongos does not. Mongos will only route reads to the nodes which contain the data you require, and for writes it will send the request to the node where your configuration states the data should be stored. You can control where different chunks of data are stored by creating shard tags. I'll cover both tags and chunks below.

When starting a mongos instance, the complete set of config servers should be provided. Let's assume all of the config servers and mongos instances are started on localhost. We would start a mongos by running:

mongos --configdb localhost:27019,localhost:27020,localhost:27021 --port 27022

Step 3: Start 2 Mongod nodes in a replica set

Please see my Mongo Replication article to find out how to start a replica set. For the purpose of this article I'm going to assume both mongod instances are started on localhost using ports 27017 and 27018, and the replica set is named rs0 using the --replSet flag.
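
For reference, here's a minimal sketch of what that looks like; the data paths are hypothetical:

mongod --replSet rs0 --dbpath /data/rs0-1 --port 27017
mongod --replSet rs0 --dbpath /data/rs0-2 --port 27018

// Connect to one of the nodes and initiate the replica set.
mongo --host localhost --port 27017
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" }
  ]
})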

Step 4: Add replica set as a shard

The final step is to add the replica set you just created as a shard. This is done by connecting to our mongos instance using the mongo binary and running the addShard command (or the sh.addShard() helper). If you have multiple mongos instances, you only need to run the below on one of them. The change will automatically propagate via the config servers.

// connect to our mongos
mongo --host localhost --port 27022

// Add the two nodes from a replica set as a shard.
db.adminCommand({ addShard: "rs0/localhost:27017,localhost:27018", name: "shard_name" });
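
If you prefer the sh.addShard() helper mentioned earlier, it wraps the same command. One caveat: the helper doesn't take a custom name, so the shard name will default to the replica set name (rs0 in this case).

// Add the replica set as a shard using the shell helper instead.
sh.addShard("rs0/localhost:27017,localhost:27018")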

As I mentioned above, you could add a single node as a shard too, by running:

db.adminCommand({ addShard: "localhost:27017", name: "shard_name" });

The shard name is important, we’ll use this later.

Despite single node shards being possible, you're better off always using a replica set, even if you only have one node in the set. Otherwise, if you decide to switch to a replica set in the future, it's going to be a pain in the ass.

Enabling sharding

You're probably thinking that after getting this far, surely sharding is now enabled. But you would be wrong. What you've actually done is merely create a shard and tell the cluster which replica set to use for CRUD. Shards can contain numerous databases, each with multiple collections. The next logical step is to enable sharding on the databases we want to shard. Connect to any of your mongos nodes and run the following:

sh.enableSharding("<database>")

Next you need to decide how you want to shard your collections. This is where the real magic happens; collections are the things which actually contain data, after all. By sharding a collection, you are telling MongoDB to split the data in that collection across multiple shards. Doing so allows you to scale out.

Sharding a collection takes a little more thought than sharding a database. What I mean by this is that you need to decide which fields you want to shard on, or split on if you will. You can shard on any number of fields, and on almost any fields you want.

When choosing which field or fields to shard on, you want to make sure related documents are stored on the same shards. An example of this could be storing records by region. Assume you were building a small social network. It might make sense to store all users from Europe on one shard and all users from Asia on another shard. The benefit here would be that most users in Europe would have friends in Europe, so when getting a list of friends, most of the time only one shard would have to be queried. If more than one shard needs to be queried, this will happen for you automatically; however, you should try to limit these queries.

You can think of shard key internals like indexes; after all, for a shard key to exist, the same key must first exist as an index. If the index doesn't already exist when a collection is sharded, it will be created automatically.
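
If you'd rather create the index yourself before sharding, it's just a normal index on the shard key fields; a small sketch, matching the examples below:

// Create the backing index for the shard key up front, from the database you're about to shard.
db.users.createIndex({ region: 1 })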

// Shard on the region field of the users collection.
// So if the values were Europe, Asia, N America, S America, Africa, Australia or Antarctica,
// this shard key would make sure all users with the same region are stored on the same replica set.
sh.shardCollection("db.users", { region: 1 })

// Shard on two fields: region and language.
// This would make sure all users with the same region and language are stored on the same replica set.
sh.shardCollection("db.user", { region: 1, language: 1 })

It’s always best to have shard keys with a high cardinality.

Shard Tags

In the "Sharding 101" section above, I mentioned tags a few times, but I've not gone any further than that. What I did cover was how to shard collections. What you can't do when initially sharding a collection is decide which documents should be saved on which replica sets. This is where tags come in.

Tags allow you to get around this by first assigning tags to your shards and then assigning tags to field values or ranges; mongos will use these tags to decide where to read data from and where to write it.

That sounds bloody confusing I know, so here’s an example:

Let's assume you have a collection called profiles, which contains a field called rank. The lower the rank, the more popular the profile is and the more often it is viewed. Let's assume that the collection has already been sharded on the rank field. You have 2 shards: one which uses spinning disks and one which uses SSDs. It would make sense to always store the most popular records on the SSDs, so they can be fetched quicker. Let's do that:

// Add a tag called ssd to the shard called ssd_shard_name.
// Remember, the shard name was decided when the shard was created.
sh.addShardTag("ssd_shard_name", "ssd")

// Send all records with a rank from 1 up to (but not including) 100 to any shard with the ssd tag.
sh.addTagRange("db.profiles", { rank: 1 }, { rank: 100 }, "ssd")

If the tags on a shard are changed, documents which no longer match will automatically be migrated to a shard that does. If no shard matches, they stay where they are.
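
Removing a tag from a shard works the same way in reverse; a quick sketch, reusing the hypothetical names from above:

// Remove the ssd tag from the shard. Chunks in the ssd tag range will be
// migrated to whichever shards still carry that tag (if any).
sh.removeShardTag("ssd_shard_name", "ssd")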

// If you want to make sure all documents are saved to a specific shard, you can use the MinKey and MaxKey values.
sh.addTagRange("db.profiles", { rank: MinKey }, { rank: MaxKey }, "ssd")

Chunks

What are chunks?

Chunks are another topic I've mentioned, and they're another very important aspect of sharding. Chunks are exactly what they say on the tin – a chunk of a shard. Let's go back to the example above, where we created a shard key on the user collection using the region and language fields:

sh.shardCollection("db.user", { region: 1, language: 1 });

I said above that this shard key means every document with the same region and language will reside on the same replica set. That is true, but we can actually go a little more granular: what it actually guarantees is that those documents will reside in the same chunk.

Generally each shard key should have a very high cardinality or, even better, be completely unique. Chunks don't just contain identical values though; they group data by ranges (numerically or alphabetically). Let's assume you have 2 regions and 10 languages. That means there are 20 different permutations of the shard key and therefore a maximum of 20 chunks can ever be created. That isn't really ideal; as I mentioned above, shard keys should have a high cardinality or be unique. With only 20 possible permutations of the shard key, there is the potential for chunks to grow huge because they cannot be broken down any further, which will make it difficult to scale out.

Also, just because there is a maximum of 20 chunks, it doesn't mean there will be 20 chunks. In fact, to begin with there will be only 1 chunk, and as your data set grows, your mongos daemons will try to split the chunks to make them smaller. If shard keys are unique or have a high cardinality, the mongos daemons will easily be able to break down chunks.
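
If you're curious which chunks currently exist, you can inspect them yourself from a mongos; a quick sketch, using the db.user namespace from the earlier example:

// Print a summary of the cluster, including the chunk ranges for each sharded collection.
sh.status()

// Or query the chunk metadata directly from the config database.
use config
db.chunks.find({ ns: "db.user" })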

How are chunks created?

As soon as you start using shards, you no longer send reads and writes to individual nodes. Instead you connect to a mongos, and the mongos sends the request to the correct shards. When you send the first write to a collection on each shard, the mongos will create a brand new chunk. All further writes will then be part of that chunk. Each mongos will individually keep track of how big chunks are getting on each shard. After receiving a certain number of writes, the mongos will ask the mongod instances to try and split the chunks into smaller chunks. The mongod will return a bunch of split points to the mongos, and the mongos will then send the new chunk ranges to the config servers for saving. No data is moved at this point; chunks are just broken down so you have smaller, more manageable sets of data.

Chunks are automatically distributed evenly across all of your shards, unless you have tags set up to store them differently. If you remove shard tags, the mongos nodes will slowly try to move chunks between shards to evenly distribute them.

Chunks and config servers?

When a chunk is split into multiple smaller chunks, data isn't actually moved. All that happens is that the config servers, which store the different chunk ranges, update themselves to represent the new ranges.

For a chunk to be split successfully, all config servers must be online and accessible. If a config server isn't available, the split will be re-attempted after each subsequent write request to a mongos which has reached its split threshold.

The number of writes a mongos has to receive before it asks the mongods to provide split points will vary; it depends mainly on the size of your collections. Unfortunately this value isn't customisable.

Balancing rounds

A balancing round is when a mongos checks whether chunks are unevenly distributed across the shards and, if they are, starts migrating chunks to even things out. Any mongos can run a balancing round, but only one mongos can act as the balancer at a time.

When a mongos becomes the balancer, it will create a balancer lock on the config database. The lock will stop other mongos nodes from becoming the balancer until the lock is freed. The lock is created in the locks collection of the config database on the config servers.
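
You can check on the balancer yourself from a mongos; a small sketch (the exact shape of the lock document varies between versions):

// Is the balancer enabled, and is a balancing round running right now?
sh.getBalancerState()
sh.isBalancerRunning()

// Peek at the balancer lock in the config database.
use config
db.locks.find({ _id: "balancer" })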

The balancer not only breaks down chunks, but can also move chunks between shards. A chunk will remain on the original shard, and will continue to be served from it, until the chunk has been fully migrated. Once migrated, a mongos may still try to route reads to the old shard for a while; this will silently fail over and the request will automatically be routed to the correct shard.

Jumbo chunks

If you switch to the config database (use config) on one of your config servers, or through a mongos, and run:

db.settings.find({ _id: "chunksize" });

You'll get a number, which is the maximum size of a chunk in megabytes. Once a chunk grows beyond this size and can't be split any further, it is flagged as a jumbo chunk, and it can no longer be split or moved without manual intervention. The default chunk size is 64MB, and it's a sensible maximum for most deployments. Having reasonably large chunks prevents your mongos nodes from constantly splitting and moving chunks around. If you decrease the chunk size, existing chunks will not be split straight away; however, when they are split they will trend towards the new size.
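
If you do want to change the chunk size, it lives in that same settings collection; a sketch, assuming you want to drop it to 32MB:

// Update the maximum chunk size (in megabytes) from the config database.
use config
db.settings.save({ _id: "chunksize", value: 32 })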

Splitting and moving jumbo chunks

You cannot split chunks which have grown beyond the max chunk size without either temporarily increasing the max chunk size, or manually splitting the chunk using sh.splitAt(), e.g.

sh.splitAt("db.users", {"user_id": 1234})

This will split the chunk containing the user with the user_id of 1234.

Jumbo chunks cannot be moved without increasing the chunk size, turning off the balancer and manually moving them.

// Turn off the balancer for the users collection.
sh.disableBalancing("db.users")

// Move the chunk containing the user with the user_id 1234, to the shard named shard2.
sh.moveChunk("db.users", { user_id: 1234 }, "shard2")
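
Once the chunk has been moved, remember to turn balancing back on for the collection:

// Re-enable the balancer for the users collection.
sh.enableBalancing("db.users")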

Shard Keys

When choosing your shard keys, you want to think carefully about who is going to be primarily accessing the data, and always try to minimise the number of shards required to perform an operation. Here are a few shard key types you could use:

Ascending

These use a field like _id that constantly gets larger. This means one shard receives all new writes; the data is then chunked and moved off afterwards, which is less than ideal in most situations unless you're attempting the firehose strategy.

Randomly distributed

These use fields where data is inserted in no particular order, e.g. email address, name or age. Writes end up evenly(ish) distributed across shards. MongoDB claims not to be as efficient at accessing randomly distributed chunks, but these are still a very common choice.

Location based

These use a field like location or ip. This helps to keep records from similar regions in the same chunks.
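
To make these concrete, here's a sketch of what each type might look like; the collection and field names are purely hypothetical:

// Ascending: _id grows constantly, so all new writes land in the last chunk.
sh.shardCollection("db.logs", { _id: 1 })

// Randomly distributed: email addresses arrive in no particular order, spreading writes out.
sh.shardCollection("db.accounts", { email: 1 })

// Location based: documents from the same country end up in the same chunks.
sh.shardCollection("db.visitors", { country: 1 })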

Final Tips

  1. Shard keys cannot contain arrays, as each value in an array is indexed individually, meaning a single document could map to more than one chunk, which can't be represented in the config metadata.
  2. Chunks are not physically grouped on disk, only in metadata.
  3. Shard keys cannot be geospatial indexes.
  4. Shard keys should always have a high cardinality; this prevents jumbo chunks. Smaller chunks are easier to move around and improve load balancing. A good way of increasing cardinality is using compound keys.
  5. Once a document has been inserted, its shard key value cannot be changed; the document must be removed and re-inserted.
  6. There is a strategy called the firehose strategy: sending all writes to a single powerful shard and then moving the chunks elsewhere afterwards.

Conclusion

Three cheers, you made it. That was a lot to take in. You probably have a lot of questions and uncertainties about what you've just read. As always, I'm more than happy to answer those questions. Alternatively, the MongoDB documentation will expand further on all of the above. Thanks for reading.
