MongoDB – Part 1 – Collections, CRUD, Modifiers, Commands

Recently I’ve been posting a lot of articles about databases, and more specifically NoSQL. I’ve given a quick overview of each of the NoSQL database types, as well as an overview of common concepts that appear throughout most NoSQL databases. Now I want to focus on a specific NoSQL database, and I have chosen MongoDB.

MongoDB is the most popular NoSQL database out there right now, and you don’t have to take my word for it. Looking at Google Trends, you can see that MongoDB is getting more searches than Redis, Neo4j and Cassandra combined. Each of these four databases leads the way for its own database model (document, key-value, graph and column respectively). So if you plan on learning a new database, MongoDB is without a doubt a fantastic choice. I’ve personally really enjoyed working with it over the last 3 months and can see it being a popular choice for a very long time.

So what’s the new series all about?

Aside from the obvious, this series is going to contain 9 parts (I think), covering CRUD, databases and collections, indexes, map reduce, replication, sharding, backups, tips and tricks, and deployment. I’m not going to be walking through the absolute basics of MongoDB, as this is by no means a replacement for a book on MongoDB. So a very basic understanding of MongoDB would be useful, like how to start a Mongo daemon and connect to a Mongo shell, though I’ll be skimming over these topics for those who have no MongoDB experience. Learning MongoDB is no 5 minute task; it’s not even a 10-20 hour task. NoSQL databases require a different mindset, and to get into that mindset you need to do a lot of reading and get a lot of practice. Taking that into account, I still hope to provide enough information for a novice MongoDB user to follow along.

From here on out, I’m going to be listing a lot of commands. I’ll start each section with a high-level overview of what it covers. Inside the code blocks I’ll also be writing comments above each command, giving additional information that you might find useful.

Daemons and Databases

When using a relational database the first thing you need to do is create a database. The same applies for MongoDB. Just like any service on your machine, before you can connect to it, you must first start its daemon. Once you have MongoDB installed, that’s as simple as running:

// There are tonnes of arguments you can pass to 'mongod'.
// Some of which I'll cover in later articles.
// Check out: http://docs.mongodb.org/manual/reference/program/mongod/
// Make sure mongod is in your $PATH
mongod

Now that you have the Mongo daemon running, you can connect to a Mongo shell and create the database.

// This will connect you to your mongo shell
mongo

// In MySQL you can only call use on a database which already exists.
// In MongoDB, if the database doesn't exist, it will be created for you
// (it only appears on disk once you first store data in it).
use newDbName;

Collections

You can think of a collection like you think of a table in a relational database. For the below commands to work, you must first have run the “use newDbName” command above.

// db is an object which refers to the current database that you are using.
// Create a collection called testCollection
db.createCollection("testCollection")

// Create a capped collection called cappedCollection. Cap at 100,000 bytes or 100 documents.
// (A different name is used here because testCollection already exists from the command above.)
// Capped collections will be covered in detail in a later article.
// For now just think of them as collections with limited storage that always discard the oldest documents when new documents are saved.
db.createCollection("cappedCollection", {"capped":true, "size":100000, "max":100 })
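
To make the "discard the oldest" behaviour concrete, here is a minimal in-memory sketch in plain JavaScript. It is a toy model, not how MongoDB implements capped collections: createCappedCollection is a made-up helper, and it only models the document-count cap (max), not the byte-size cap.

```javascript
// Toy in-memory model of a capped collection limited by document count.
// Illustration only: real capped collections also enforce the byte size limit.
function createCappedCollection(maxDocs) {
  const docs = [];
  return {
    insert(doc) {
      docs.push(doc);
      // Once the cap is exceeded, discard the oldest document.
      if (docs.length > maxDocs) {
        docs.shift();
      }
    },
    find() {
      // Return a copy so callers cannot mutate the internal array.
      return docs.slice();
    },
  };
}

const log = createCappedCollection(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ n: i });
}
console.log(log.find()); // only the three newest documents remain: n = 3, 4, 5
```

Inserting five documents into a collection capped at three leaves only the last three, in insertion order.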

CRUD

Pointing out the obvious here, but the below covers how to create, read, update and delete documents. You can think of a document as being like a row in a table. But whereas a row in a relational db has to follow the schema of its table, a document is schemaless and can store any fields you like. Again, below db refers to the current database being used and people refers to the name of the collection you want to perform the operation on. If you try to insert a document into a collection which doesn’t exist, MongoDB will automatically create the collection for you.

// Insert a new document into the people collection
// ObjectId() will generate an id which is unique for this collection. I've done this for demonstration purposes, but it's actually the default behaviour.
// new Date() will generate the current datetime, which the shell displays in the ISO 8601 standard.
db.people.insert({ _id: ObjectId(), name: 'bob', date: new Date() })

// Insert multiple new documents into the people collection by passing an array to insert()
db.people.insert([{ name: 'bob' }, { name: 'steve' }, { name: 'craig' }])

// Update a single document, where the document matches the first parameter's criteria, and replace the document with the second parameter's data.
db.people.update({ name: 'bob' }, { new: 'data' })

// Update all documents that match the first parameter's criteria.
// Multi updates only work with update operators such as $set, so a plain replacement document is not allowed here.
db.people.update({ name: 'bob' }, { $set: { new: 'data' } }, { multi: true })

// The ObjectId values below are placeholders; a real ObjectId is a 24 character hex string.
// Upsert a document using the upsert key in the 3rd parameter
// Upsert means update a document if one was found, otherwise create a new document.
db.people.update({ new: 'data' }, { _id: ObjectId('507f1f77bcf86cd799439011'), new: 'data' }, { upsert: true })

// Upsert a document using save()
// A document lookup will always be done on the _id value provided.
db.people.save({ _id: ObjectId('507f191e810c19729de860ea'), new: 'data' })

// Remove a document
db.people.remove({ _id: ObjectId('507f191e810c19729de860ea') })

// Truncate a collection. The empty query matches every document.
db.people.remove({})

// Drop a collection
db.people.drop()

// Rename a collection
// This can take a few seconds
db.testCollection.renameCollection("newTestCollection")
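
One thing that catches people out is that an update without operators replaces the whole document, while an update using $set only touches the named fields. The sketch below models that difference in plain JavaScript. applyUpdate is a hypothetical helper written for this article, not a MongoDB function, and it only understands $set.

```javascript
// Toy model of how update() treats its second argument.
// Only $set is modelled; real MongoDB supports many more operators.
function applyUpdate(doc, update) {
  const usesOperators = Object.keys(update).some((key) => key.startsWith('$'));
  if (!usesOperators) {
    // Replacement: every field except _id is swapped for the new data.
    return { _id: doc._id, ...update };
  }
  // Operator update: keep the existing document and change the named fields only.
  return { ...doc, ...(update.$set || {}) };
}

const bob = { _id: 1, name: 'bob', age: 30 };
console.log(applyUpdate(bob, { new: 'data' }));           // name and age are gone
console.log(applyUpdate(bob, { $set: { new: 'data' } })); // name and age are kept
```

If you only want to change one field, you almost always want the $set form.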

Select Queries

These aren’t really select queries as such, but you can use the find method to pull data from a collection.

// Find a user by _id from the users collection (the id below is a placeholder)
db.users.findOne({ _id: ObjectId('507f1f77bcf86cd799439011') })

// Select all users
db.users.find()

// Find user by name
db.users.find({ name: 'bob' })

// Select all users and return only the name and email values (_id is also returned unless you pass _id: 0)
db.users.find({}, { name: 1, email: 1 })

// Select all users and return all fields apart from pointless_data value
db.users.find({}, { pointless_data: 0 })

// Find documents with an age value between 10 and 30
db.users.find({ age: { $gt: 10, $lt: 30 } })

// Find users who registered before 1 January 2012
db.users.find({ register_date: { $lt: new Date('2012-01-01') } })

// Find users with a name of simon or steve
db.users.find({ name: { $in: ['simon', 'steve'] } })

// Find users whose name isn't bob or frank
db.users.find({ name: { $nin: ['bob', 'frank'] } })

// Find people with the name simon or who are a ninja
db.users.find({ $or: [ { name: 'simon' }, { ninja: 'yes' } ] })

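// Find users whose ninja field is missing or set to null
// Use { ninja: { $exists: false } } to match only documents without the field.
db.users.find({ ninja: null })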

// Find documents that have a fruits array containing both pear and orange.
db.fruits.find({ fruits: { $all: ['pear', 'orange'] } })

// Find documents with exactly 3 values in the fruits array
db.fruits.find({ fruits: { $size: 3 } })

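// Find users with a name.first value of simon and a name.last value of jakowicz.
// Dot notation matches the nested fields individually. Passing { name: { first: 'simon', last: 'jakowicz' } } instead only matches when the embedded document is exactly that, field for field and in that order.
db.users.find({ 'name.first': 'simon', 'name.last': 'jakowicz' })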

// Explain a query, this is similar to MySQL and is useful to get optimisation tips e.g. information on indexes.
db.users.find({ name: 'simon' }).explain();

// Hint which index to use, in this case the country and favourite colour compound index.
// This is useful to guarantee that MongoDB uses the index which is best suited to your search. More often than not MongoDB will automatically select the correct index.
db.coll.find({ name: 'simon' }).hint({ country: 1, favourite_colour: 1 });

// Sort in insertion order, use -1 for reverse insertion order.
// Note that this order changes as documents move due to lack of document padding. I'll cover document padding in a later article
db.users.find({ name: 'simon' }).sort({ $natural: 1 });

// Display pretty output
db.coll.find({ name: 'simon' }).pretty();

// Used to make sure each document is only returned once, even if it is moved due to lack of padding.
// If you are reading a large quantity of documents, using snapshots is useful to make sure that no documents are read more than once.
// You may be thinking, "why would a document be returned twice?" This happens when, mid query, a document moves position because it has used up all of its padding (allocated space).
db.coll.find().snapshot()
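
If the operator syntax above feels abstract, it can help to see how a query document might be evaluated against a single document. The following is a toy matcher in plain JavaScript covering only $gt, $lt, $in, $nin and plain equality. The operators table, matches and names are all names I've made up for illustration; none of this is MongoDB's real implementation.

```javascript
// Toy matcher for a small subset of MongoDB query operators.
// Illustration only; these names are not part of any MongoDB API.
const operators = {
  $gt: (value, arg) => value > arg,
  $lt: (value, arg) => value < arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorObject =
      condition !== null &&
      typeof condition === 'object' &&
      Object.keys(condition).some((key) => key.startsWith('$'));
    if (!isOperatorObject) {
      return value === condition; // plain equality, e.g. { name: 'bob' }
    }
    // Every operator on the field must pass, so $gt and $lt combine as AND.
    return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
  });
}

const users = [
  { name: 'simon', age: 25 },
  { name: 'steve', age: 40 },
  { name: 'bob', age: 12 },
];

// Return the names of the users that match a query.
function names(query) {
  return users.filter((user) => matches(user, query)).map((user) => user.name);
}

console.log(names({ age: { $gt: 10, $lt: 30 } }));         // simon and bob
console.log(names({ name: { $in: ['simon', 'steve'] } })); // simon and steve
console.log(names({ name: { $nin: ['bob', 'frank'] } }));  // simon and steve
```

The point to take away is that several operators on the same field are combined with AND, which is why { $gt: 10, $lt: 30 } behaves as a range.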

Query Tip

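When you are matching several values against the same field, prefer $in over $or. MongoDB can evaluate an $in as a single query, whereas each clause of an $or is treated as its own query. The flip side is that each $or clause can use a different index, which makes $or the right tool when the clauses test different fields.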
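
As a rough illustration of that tip, here is a small plain JavaScript helper that rewrites an $or of equality checks on a single field into the equivalent $in. orToIn is a hypothetical function of my own, not something the shell or drivers provide, and it deliberately leaves any other query untouched.

```javascript
// Hypothetical helper: collapse an $or of equality checks on one field into $in.
// Returns the query unchanged when the rewrite does not apply.
function orToIn(query) {
  const clauses = query.$or;
  if (!Array.isArray(clauses) || clauses.length === 0) return query;
  // Only rewrite when $or is the sole top-level key.
  if (Object.keys(query).length !== 1) return query;

  const field = Object.keys(clauses[0])[0];
  const allSimple = clauses.every((clause) => {
    const keys = Object.keys(clause);
    const value = clause[field];
    // Each clause must test the same single field against a scalar value.
    return keys.length === 1 && keys[0] === field &&
      (value === null || typeof value !== 'object');
  });
  if (!allSimple) return query;

  return { [field]: { $in: clauses.map((clause) => clause[field]) } };
}

// Same field in every clause, so this becomes { name: { $in: ['simon', 'steve'] } }
console.log(orToIn({ $or: [{ name: 'simon' }, { name: 'steve' }] }));

// Different fields, so the $or is returned as it was
console.log(orToIn({ $or: [{ name: 'simon' }, { ninja: 'yes' }] }));
```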

Database Commands

Not all commands need to be run on a specific collection. Below I’m going to cover a bunch of commands you can run on a database to find details on existing operations and locks, discover errors, run repairs and more.

// Get data on if the last operation was successful or not
db.runCommand({ getLastError: 1 })

// Convert a collection to a capped collection with 10,000 bytes of storage
// Collections which are sharded cannot be converted to capped.
// Capped collections are normal collections which have a max size. When the max is reached, the oldest documents will be deleted as new ones are added.
db.runCommand({ "convertToCapped" : "collectionName", "size" : 10000 });

// Defragment your collection so everything is in order and quicker to read. Previous disk space which was used cannot be reclaimed, it will just be allocated for future use.
db.runCommand({ 'compact': 'coll', 'paddingFactor': 1.5 })

// Running a repair on a database will reclaim disk space.
// Nodes should be put into standalone mode before a repair is run, this will stop requests being sent to that node.
// When starting mongod, a repair can also be done using the --repair flag.
// This can be a very slow process
db.repairDatabase()

// Shutdown a mongod instance, this must be run on the admin database
use admin;
db.shutdownServer();

// Get a list of current operations on a database
db.currentOp()

// Enable slow query profiling
// First Param = Profile Level: 0=Off, 1=Ops slower than ms provided, 2=All
// Second Param = MS: Milliseconds before an op is considered slow and should be profiled.
// Profiled operations are logged to the system.profile collection inside the database being used.
// Incurs a performance penalty, especially at level 2, as each profiled operation results in an extra write.
// Non persistent - On restart, profiling will be disabled.
db.setProfilingLevel(1, 100)

// Rotate the logs, usually called as part of a cron.
db.adminCommand({"logRotate" : 1})

// Get information on profiling settings
db.getProfilingLevel()

// Get size of a document in bytes
Object.bsonsize(db.coll.findOne())

// Get the size of a collection.
// Sizes are in bytes by default. Pass a scale to divide by, e.g. 1024 to get KB.
db.coll.stats(1024)

// Get the size of a database, again scaled to KB
db.stats(1024)
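
The profiling levels passed to setProfilingLevel above boil down to a simple decision per operation. Here is that decision as a plain JavaScript sketch. shouldProfile is a made-up name for illustration, and I'm assuming an operation counts as slow when it takes longer than the threshold.

```javascript
// Toy model of the decision that setProfilingLevel(level, slowMs) configures.
// shouldProfile is a made-up name, not a MongoDB function.
function shouldProfile(level, slowMs, opMillis) {
  if (level === 0) return false; // profiling is off
  if (level === 2) return true;  // profile every operation
  return opMillis > slowMs;      // level 1: slow operations only
}

console.log(shouldProfile(1, 100, 250)); // true, the operation took longer than 100ms
console.log(shouldProfile(1, 100, 20));  // false, the operation was fast enough
console.log(shouldProfile(2, 100, 20));  // true, level 2 profiles everything
```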

Modifiers

Modifiers are a pretty cool feature inside MongoDB. They allow you to edit a document’s values based on what its current values are. For example, adding values to the end of an array, incrementing and decrementing values, making sure values in an array are unique, finding and modifying a document in a single command, and way more. Here are some I find most useful.

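// Note: ObjectId('123') is used as a short placeholder throughout this section. A real ObjectId is a 24 character hex string.
// Increment counter field by 1
db.coll.update({ _id: ObjectId('123') }, { $inc: { counter: 1 } });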

// Set a single new key=>value pair into the document, as the array ['Bacon', 'Egg']
db.coll.update({ _id: ObjectId('123') }, { $set: { gift: ['Bacon', 'Egg'] } });

// Remove the gift field from the document
db.coll.update({ _id: ObjectId('123') }, { $unset: { gift: 1 } });

// Remove the value Cake from the gift array.
db.coll.update({ _id: ObjectId('123') }, { $pull: { gift: 'Cake' } });

// Push the value Pizza to the end of the gift array
db.coll.update({ _id: ObjectId('123') }, { $push: { gift: 'Pizza' } });

// Push each value provided to the end of the gift array
db.coll.update({ _id: ObjectId('123') }, { $push: { gift: { $each: ['Cake', 'Beer'] } } });

// Same as above but only keep the end 5 values
db.coll.update({ _id: ObjectId('123') }, { $push: { gift: { $each: ['Cake', 'Beer'], $slice: -5 } } });

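// Push new values and then sort the whole array in descending order.
// For an array of plain values, $sort takes 1 or -1 rather than a field specification.
db.coll.update({ _id: ObjectId('123') }, { $push: { gift: { $each: ['Cake', 'Beer'], $sort: -1 } } });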

// Push Simon to the authors array, only if it’s not already there
db.coll.update({ _id: ObjectId('123') }, { $addToSet: { authors: 'Simon' } });

// Push Simon, Steve and Stu to the authors array, only if they’re not already there
db.coll.update({ _id: ObjectId('123') }, { $addToSet: { authors: { $each:  ['Simon', 'Steve', 'Stu'] } } });

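// Increment the vote of the first comment that matches the query by 1.
// The positional $ operator needs the array in the query, and the dotted key must be quoted.
// (comments.author is a hypothetical field used for illustration.)
db.coll.update({ 'comments.author': 'Simon' }, { $inc: { 'comments.$.vote': 1 } })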

// Perform an upsert (3rd param true), but only set the created value on an insert.
db.coll.update({ _id: ObjectId('123') }, { $setOnInsert: { created: new Date() } }, true)

// Update all documents that have a name of bob.
// If 4th param is true, update all.
db.coll.update({ name: 'bob' }, { $set: { name: 'frank' } }, false, true)

// Make sure an update is replicated to the majority of members before it is considered successful.
// This will be covered in more detail in the replication article.
db.coll.update({ name: 'Simon' }, { name: 'psymon' }, { writeConcern: { w: 'majority', wtimeout: 5000 } })

// Update a document and return the updated document
// If new: false was provided, the document would be updated and the old document returned
db.coll.findAndModify({ query: { name: 'bob' }, update: { $set: { name: 'frank' } }, new: true });

// Find and remove a document
db.coll.findAndModify({ query: { name: 'bob' }, remove: true })
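
The $push modifiers are easier to reason about once you know the order they are applied in: append the $each values, then $sort, then $slice. The plain JavaScript sketch below models that order for an array of strings. pushWithModifiers is a hypothetical helper of my own, and it uses JavaScript's default string ordering, which will not match MongoDB's comparison rules for every type.

```javascript
// Toy model of $push with the $each, $sort and $slice modifiers on an array of strings.
// pushWithModifiers is a made-up helper, not a MongoDB function.
function pushWithModifiers(array, { $each = [], $sort, $slice }) {
  let result = array.concat($each);   // 1. append every new value
  if ($sort === 1 || $sort === -1) {  // 2. sort the whole array
    result = result.slice().sort();
    if ($sort === -1) result.reverse();
  }
  if (typeof $slice === 'number') {   // 3. trim: negative keeps the end, positive keeps the start
    result = $slice < 0 ? result.slice($slice) : result.slice(0, $slice);
  }
  return result;
}

const gifts = ['Bacon', 'Egg', 'Pizza', 'Toast'];

// Six values after the push, trimmed to the last five: Bacon is dropped
console.log(pushWithModifiers(gifts, { $each: ['Cake', 'Beer'], $slice: -5 }));

// Push then sort descending: Egg, Cake, Beer, Bacon
console.log(pushWithModifiers(['Bacon', 'Egg'], { $each: ['Cake', 'Beer'], $sort: -1 }));
```

Because the slice happens last, combining $sort with $slice is a handy way to keep a "top N" list inside a document.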

Mongo Unix Commands

MongoDB, like most software these days, embraces the unix philosophy that a tool should do one thing and do it well. What this means is, when you install Mongo, you don’t just get the Mongo daemon (mongod) and the Mongo shell (mongo). You get a whole assortment of Mongo goodies like mongodump, mongoimport, mongorestore, mongos, mongotop, mongostat and a bunch more. I’m going to cover most of these throughout the series, but for now, here are a few examples of mongotop and mongostat.

// Show how much time MongoDB spends reading and writing each collection, similar in spirit to the unix top command
mongotop

// Get information on mongo locks
mongotop --locks

// Provide a bunch of aggregate data every second about what mongo has done.
mongostat

// Provide a bunch of aggregate data about each member in a cluster.
mongostat --discover

Conclusion

That’s it for part one, and it’s a hell of a lot to take in. For now you really don’t need to remember any of this; knowing what you can and can’t do is enough. Like anything, the more you use these tools, the more you’ll discover what you like and find useful, and you’ll naturally remember the good parts. Saying that, the examples in this blog do give you enough knowledge to go away and start using Mongo in your real world apps. There are drivers for just about every programming language and, awesomely, they all follow a very similar syntax to the one used by the mongo shell, so you shouldn’t have any problems adjusting these examples to your favourite language.

As always, thanks for reading.

Simon Jakowicz
