Welcome back. Deploying MongoDB isn't difficult, but there are a ton of tricks you can use to really optimise your database. That's what we're going to be covering in this post.
I'm not going to be covering how to create clusters, as I already covered that in part 5 of the series – Sharding. I'm not going to discuss provisioning solutions like Puppet, Chef, Ansible etc. either. What I am going to discuss is a collection of system choices that need to be thought about when creating your cluster. So let's get started.
If your budget stretches to affording SSDs, they're a great investment. The hard disk is the main bottleneck in any MongoDB setup. Using SSDs or large amounts of memory will dramatically reduce IO wait, lock times and CPU usage.
MongoDB doesn’t require much CPU power, apart from when building indexes and occasionally when using MapReduce. A dual core CPU can easily handle 10k queries per second.
MongoDB should really only be deployed on a Linux distribution. As much as MongoDB is supported on Windows and OS X, you're asking for trouble using either in production.
Always use a 64-bit OS, otherwise your data directory will be limited to 2GB. MongoDB 3.0, which was released about 6 months ago, is only supported on 64-bit systems.
Mongod instances can be split across multiple different operating systems; every OS uses the same wire protocol and stores data in the same manner.
MongoDB recommends running on a little-endian architecture. Basically all modern CPUs are either little-endian or bi-endian, meaning they support both little-endian and big-endian modes.
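If you want to double-check both of these requirements on an existing box, a couple of quick shell commands will do it (assuming a Linux system with lscpu installed):

# Should print x86_64 (or another 64-bit architecture)
uname -m
# Should print "Byte Order: Little Endian"
lscpu | grep 'Byte Order'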
There are a bunch of different RAID setups you could use to stripe and replicate your data. These aren't strictly necessary, as MongoDB provides far more customisable solutions to both sharding and replication. But here are a couple of options that you might want to consider.
- RAID0 – Data is striped across multiple hard disks; this will result in data loss if any one hard disk crashes.
- RAID1 – Data is mirrored across all hard disks in the array. If a hard disk crashes, there will be no data loss.
- RAID5 – Data is striped across the drives along with parity information, so any single drive can crash and the data can still be rebuilt.
- RAID10 – A mix of RAID0 and RAID1: data is striped (RAID0) across mirrored pairs of disks (RAID1), giving both speed and redundancy. See the sketch after this list.
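If you do go down the RAID10 route, the array can be assembled with mdadm. This is a minimal sketch, assuming four spare disks at /dev/sdb through /dev/sde (the device names are just an example):

# Create a 4-disk RAID10 array out of the spare disks
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
# Check the array is healthy
cat /proc/mdstat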
MongoDB doesn't normally require swap space, but it can be useful when building indexes and running repairs; without it, MongoDB may get killed by the OOM killer during these memory-hungry operations.
Here are some commands you can use to create and start using a swap file.
# Create a directory that will hold the swap file
sudo mkdir -p /var/cache/swap/
# For 4GB of storage, written 1MB at a time
sudo dd if=/dev/zero of=/var/cache/swap/myswap bs=1M count=4096
# Only root can access
sudo chmod 0600 /var/cache/swap/myswap
# Make the swap file
sudo mkswap /var/cache/swap/myswap
# Announce to system
sudo swapon /var/cache/swap/myswap
# Insert the following line in /etc/fstab for swap from the next boot:
/var/cache/swap/myswap none swap sw 0 0
# Turn off swap space if need be
sudo swapoff /var/cache/swap/myswap
If possible, ext4 or XFS should be used, as they allow for filesystem snapshots. ext3 is not recommended, because MongoDB regularly preallocates zero-filled data files, which is slow on ext3 and can freeze the database for minutes.
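It's also common to mount the data volume with atime updates disabled, so reads don't trigger extra metadata writes. A hypothetical /etc/fstab entry for an XFS data volume might look like this (the device and mount point are assumptions, adjust for your setup):

# Example /etc/fstab entry – /dev/sdb1 and the mount point are placeholders
/dev/sdb1  /var/lib/mongodb  xfs  defaults,noatime  0  0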
Memory overcommitting happens when processes request more memory than is available. If overcommitting is enabled, the kernel will promise the memory in the hope that it'll become available by the time it's used. This doesn't work well with MongoDB, so to disable overcommitting run:
echo 2 > /proc/sys/vm/overcommit_memory
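Note that this setting doesn't survive a reboot. To make it permanent you can go through sysctl instead (a sketch, assuming a standard /etc/sysctl.conf setup):

# Apply immediately
sudo sysctl -w vm.overcommit_memory=2
# Persist across reboots by adding the setting to /etc/sysctl.conf
echo 'vm.overcommit_memory = 2' | sudo tee -a /etc/sysctl.conf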
Turn off NUMA (Non-Uniform Memory Access)
NUMA means each CPU has its own bank of local memory. This is useful for day-to-day workloads, but in a MongoDB environment it can result in your memory not being used effectively. NUMA should be disabled in your BIOS or via the options below, and mongod restarted.
# Disable NUMA with a kernel boot parameter (grub.cfg)
kernel /boot/vmlinuz-2.6.38-8-generic root=/dev/sda ro quiet numa=off
# Or start mongod with interleaved memory allocation
numactl --interleave=all mongod -f config.conf
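If you're unsure whether NUMA is even in play on a box, numactl can report the topology; a single node listed means there's no NUMA to worry about:

# Show the NUMA topology – one node means NUMA isn't a concern
numactl --hardware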
Update your readahead
Readahead is how much extra data the kernel reads into memory beyond what was actually asked for. This can be useful for prefetching data which might be required soon, such as when you have large documents. However, if you have small documents you'll probably want a smaller readahead, so memory isn't filled with data that's never needed.
# Get a report of the readahead for each drive.
# The RA column is your readahead, measured in 512-byte sectors.
# So a value of 256 means 256 * 512 bytes = 128KB of readahead.
sudo blockdev --report
# Update the readahead for /dev/sdb3 to 16 * 512 bytes = 8KB
sudo blockdev --setra 16 /dev/sdb3
Readahead should not be set below 16 sectors (8KB), as this will make fetching indexes less efficient. Furthermore, it’s not recommended to set the readahead above 256 sectors.
Hugepages should be disabled. They're designed for systems that do a lot of sequential reading, like relational databases, and can cause similar issues to an overly high readahead.
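On most Linux distributions this means disabling transparent hugepages. A minimal sketch (these sysfs paths are the usual ones, but check your distribution – some Red Hat releases use a different path):

# Disable transparent hugepages until the next reboot
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag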
The maximum number of file descriptors is set to 1024 by default. This should be increased to unlimited, or at least to around 20,000. Each incoming and outgoing connection uses a file descriptor, so a file descriptor limit of 20,000 will allow roughly 10,000 concurrent connections.
# Unlimited file descriptors
ulimit -n unlimited
# Or 20,000 file descriptors
ulimit -n 20000
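ulimit only affects the current shell. To make the limit stick for the mongod process across reboots, you can add entries to /etc/security/limits.conf (a sketch, assuming mongod runs as the mongodb user):

# /etc/security/limits.conf – raise the open file limit for the mongodb user
mongodb  soft  nofile  20000
mongodb  hard  nofile  20000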
Make sure each MongoDB node can only be accessed by the nodes that require it.
- Clients can connect to each mongos
- Mongos can connect to all mongod data nodes & config nodes
- Mongod data nodes can connect to other mongod data nodes & config nodes
- Config nodes don’t need to connect to any MongoDB nodes
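One way to enforce this is with a firewall on each node. Here's a minimal iptables sketch for a mongod data node (the 10.0.0.0/24 cluster subnet and the default port are assumptions, adjust for your network):

# Allow connections to mongod (port 27017) only from the cluster subnet
sudo iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 27017 -j ACCEPT
# Drop everything else aimed at mongod
sudo iptables -A INPUT -p tcp --dport 27017 -j DROP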
Make sure clocks are in sync
If the clocks on your shards are not in sync, you can have issues with replication. MongoDB can deal with slight discrepancies (up to a few minutes). Clocks can be kept in sync by using the NTP daemon.
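Setting this up is usually just a case of installing the daemon and letting it run (assuming a Debian/Ubuntu box; use yum on Red Hat systems):

# Install and start the NTP daemon
sudo apt-get install ntp
# Check which time servers it's syncing against
ntpq -p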
Turn off periodic updates
Turn off automatic updates from package managers, which can slow down the system unexpectedly. Updates should be managed with a proper configuration management tool anyway.
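On Ubuntu, for example, this means disabling unattended upgrades (a sketch; the file path below is the stock Ubuntu one):

# /etc/apt/apt.conf.d/20auto-upgrades – set both values to "0"
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";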