MongoDB – Part 6 – GridFS

If you had to take a wild guess at what GridFS is used for, you’d probably say some kind of file system, and you wouldn’t be completely wrong. However, GridFS isn’t actually a filesystem; it’s a convention for storing large binary files inside MongoDB.

I haven’t covered storing binary data in MongoDB yet, but it’s possible to store binary data in standard documents without using GridFS at all, using the BSON type BinData. It’s not very well supported by the mongo client, but most language drivers have good support (just google “MongoDB BinData”). The catch is that 16MB is the maximum size of a document in MongoDB, so this only works for files that fit inside a single document. If you’re storing files under 16MB, this is the recommended approach. If you’re storing binary files larger than 16MB, GridFS is the way to go.
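
To make the 16MB rule concrete, here is a minimal Python sketch of the decision. The function name choose_storage and the 64KB of headroom reserved for a document’s other fields are assumptions for illustration; the only hard number is the 16MB (16 * 1024 * 1024 bytes) BSON document limit.

```python
# The BSON document size limit: 16MB, i.e. 16 * 1024 * 1024 bytes.
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024

# Hypothetical headroom for the document's other fields (_id, filename,
# content type, ...). An illustrative assumption, not a MongoDB setting.
DEFAULT_HEADROOM = 64 * 1024


def choose_storage(num_bytes: int, headroom: int = DEFAULT_HEADROOM) -> str:
    """Return "bindata" if a payload of num_bytes fits in one document,
    otherwise "gridfs"."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    if num_bytes + headroom <= MAX_BSON_DOCUMENT_SIZE:
        return "bindata"
    return "gridfs"
```

For example, choose_storage(500 * 1024) picks BinData, while a 100MB video would be routed to GridFS.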

GridFS Intro

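When you store a file with GridFS, it’s automatically broken up into smaller chunks (255KB each by default in current drivers), which can be read individually. This uses two collections: fs.files, which holds one document per file with its metadata (filename, total length, chunk size, upload date and so on), and fs.chunks, which holds the chunks themselves. Each chunk document records the file it belongs to in its files_id field and its position within that file in its n field. (fs is just the default prefix; you can choose a different one.)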
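
To illustrate how the two collections relate, here is a minimal, driver-free Python sketch that splits a byte string into documents shaped like fs.files and fs.chunks entries and then reassembles it. The field names (files_id, n, data, length, chunkSize, uploadDate) follow the GridFS layout and 255KB is the default chunk size; the function names are hypothetical, and using uuid4 hex strings as a stand-in for ObjectIds is an assumption so the example needs no MongoDB driver. A real driver also handles indexes, writes and error cases for you.

```python
import uuid
from datetime import datetime, timezone

# Default GridFS chunk size: 255KB (255 * 1024 = 261120 bytes).
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_gridfs_docs(filename, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Build one fs.files-style document and a list of fs.chunks-style documents.

    Hypothetical helper: uuid4 hex strings stand in for MongoDB ObjectIds.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    file_id = uuid.uuid4().hex
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),  # total size of the file in bytes
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    # Each chunk points back at its file via files_id; n records its position.
    chunk_docs = [
        {
            "_id": uuid.uuid4().hex,
            "files_id": file_id,
            "n": n,
            "data": data[offset:offset + chunk_size],
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(files_doc, chunk_docs):
    """Rebuild a file's bytes by selecting its chunks and ordering them by n."""
    own_chunks = sorted(
        (c for c in chunk_docs if c["files_id"] == files_doc["_id"]),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in own_chunks)
    if len(data) != files_doc["length"]:
        raise ValueError("missing or truncated chunks")
    return data
```

Passing the chunks to reassemble in any order still returns the original bytes, because the ordering comes from n rather than from the order the chunks were stored or fetched.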

Sharding

In most situations, you’re probably not going to need to shard your binary files. However, if you do, you’ll want to use the files_id field as the shard key for the fs.chunks collection. Because every chunk of a file carries the same files_id, doing so makes sure all chunks of the same file are stored on the same shard.
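
The toy Python model below makes the files_id argument visible: it assigns each chunk to a shard purely from a hash of its files_id, loosely resembling hashed sharding on that field. It is an illustration under that assumption, not how MongoDB’s balancer actually places data, and shard_for, place_chunks and shards_per_file are hypothetical names.

```python
import hashlib
from collections import defaultdict


def shard_for(files_id: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of files_id (toy model, not MongoDB's)."""
    digest = hashlib.sha256(files_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place_chunks(chunk_docs, num_shards):
    """Group fs.chunks-style documents by the shard their files_id maps to."""
    placement = defaultdict(list)
    for chunk in chunk_docs:
        placement[shard_for(chunk["files_id"], num_shards)].append(chunk)
    return dict(placement)


def shards_per_file(placement):
    """Map each files_id to the set of shards holding any of its chunks."""
    result = defaultdict(set)
    for shard, chunks in placement.items():
        for chunk in chunks:
            result[chunk["files_id"]].add(shard)
    return dict(result)
```

Since the shard depends only on files_id, every file ends up on exactly one shard. If you hashed each chunk’s own _id instead, a single file’s chunks would likely be scattered across shards.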

Be careful not to mix up shard chunks and GridFS chunks. A shard chunk is a contiguous range of documents, grouped by shard key value, that lives on a single shard; it’s the unit MongoDB moves between shards when balancing. A GridFS chunk is a single piece of binary data, stored as one document in fs.chunks, that makes up part of a larger file.

Replication

You might be wondering how replication fits into all of this. Well, it works just like it does for any other data, because GridFS files are simply documents in the fs.files and fs.chunks collections. As long as your nodes are connected via a replica set, MongoDB will automatically replicate your binary data, just like it replicates everything else.

Mongofiles

In part one, I wrote about how MongoDB follows the Unix philosophy of creating small tools that do one thing and do it well. On that note, another tool you get when you install MongoDB is mongofiles. It’s super simple to use and lets you easily copy files into and out of GridFS from the command line. mongofiles accepts the same connection options as the other MongoDB command-line tools, for example --host to connect to a specific host or --port to connect to a specific port. Use the -d flag to tell mongofiles which database to store the files in.

Here are some example mongofiles commands:

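# Upload foo.txt to the reporting database
mongofiles -d reporting put foo.txt

# Same upload against a specific host and port (db.example.com is a placeholder)
mongofiles --host db.example.com --port 27017 -d reporting put foo.txt

# List all files in the reporting database
mongofiles -d reporting list

# Download foo.txt from the reporting database into the current directory
mongofiles -d reporting get foo.txt

# Delete foo.txt from the reporting database
mongofiles -d reporting delete foo.txt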
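
If you’d rather drive mongofiles from a script, here is a minimal Python sketch that builds the same command lines and runs them with subprocess. The helper names are hypothetical; the only mongofiles options it uses are the -d, --host and --port flags and the put/list/get/delete commands shown above, and it assumes mongofiles is on your PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def mongofiles_args(
    command: str,
    filename: Optional[str] = None,
    *,
    db: str,
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> List[str]:
    """Build an argument list such as ["mongofiles", "-d", "reporting", "put", "foo.txt"]."""
    args = ["mongofiles", "-d", db]
    if host is not None:
        args += ["--host", host]
    if port is not None:
        args += ["--port", str(port)]
    # Options come first, then the command and (optionally) the file name.
    args.append(command)
    if filename is not None:
        args.append(filename)
    return args


def run_mongofiles(args: List[str]) -> str:
    """Run mongofiles and return its standard output; raises if it fails."""
    if shutil.which("mongofiles") is None:
        raise FileNotFoundError("mongofiles was not found on PATH")
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run_mongofiles(mongofiles_args("list", db="reporting")) lists the files in the reporting database. Passing an argument list rather than a single string avoids shell quoting problems with file names that contain spaces.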

Conclusion

It feels good to write such a short but sweet MongoDB post, and that pretty much covers what you need to know. Make sure you can tell shard chunks and GridFS chunks apart, as they are very different things. Most languages you’d want to program in have GridFS support in their MongoDB drivers, which makes it really easy to get started. Until next time, when I’ll be discussing all things Map Reduce.
