NoSQL Essentials – Part 2 – Key-value databases

Key-value Databases

Key-value databases are by far the most basic of the NoSQL data stores. All major key-value databases store their data in memory, or at least provide the option to store it in memory or on disk. Some of the most popular key-value databases currently include Redis, Riak, Memcached, Berkeley DB, HamsterDB and Amazon DynamoDB (which is part of the AWS suite and proprietary to Amazon).

In this article I am going to be focusing on Redis and Riak.

Key-value databases have been around for a long time. Memcached is one of the oldest and most widely used, but over the last 5 years it has been far from the most feature rich, and it is also significantly less performant than its competitors.

Differences from relational databases

Columns in a relational database normally store scalar values. However, the “value” part of a key-value store can, and usually does, contain entire entities known as aggregates.
Key-value databases can only query for a record by its key (the key is equivalent to a primary key in a relational database).
When selecting data from a key-value store, the entire value has to be returned. This makes optimistic writes extremely tricky, because write-write conflicts are difficult to detect.
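
To make the write-write problem concrete, here is a minimal sketch using a plain in-memory Python class as a stand-in for a key-value store. TinyKVStore, the "user:42" key and the JSON layout are all made up for illustration and are not the API of Redis or Riak. Two clients read the same aggregate, each changes a different part, and the second write silently discards the first.

```python
import json


class TinyKVStore:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Lookup is by key only, and the whole value comes back.
        return self._data.get(key)

    def put(self, key, value):
        # Last write wins; the store never inspects the value.
        self._data[key] = value


store = TinyKVStore()
store.put("user:42", json.dumps({"name": "Ada", "cart": []}))

# Two clients read the same aggregate...
client_a = json.loads(store.get("user:42"))
client_b = json.loads(store.get("user:42"))

# ...each changes a different part and writes the whole value back.
client_a["cart"].append("book")
client_b["name"] = "Ada L."
store.put("user:42", json.dumps(client_a))
store.put("user:42", json.dumps(client_b))  # silently overwrites A's change

final = json.loads(store.get("user:42"))
print(final)  # the cart update from client A is gone
```

Because the store only sees an opaque value, it has no way of noticing that the two writes touched different fields of the same aggregate.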

Useful tips

Riak offers a useful tool called buckets, which are essentially tables. By using different buckets, you can reuse the same keys in each bucket. Redis offers a similar function in the form of hashes. Instagram wrote about how using hashes enabled them to cut 21GB of data down to just 5GB.
Most key-value stores, including Riak and Redis, implement the eventual consistency model, meaning the latest data may not always be available on every node straight away. This isn’t necessarily bad; this level of consistency fits in nicely with most key-value database use cases anyway.
Riak and Redis both offer simple ways of implementing read and write quorums, allowing you to tailor your database to your application requirements.
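
As a rough illustration of how read and write quorums interact with eventual consistency, the sketch below simulates N replicas in plain Python. The usual rule of thumb is that a read is guaranteed to see the latest write when R + W > N. The function names and the dictionary layout are my own assumptions for the example and not any database's API, and the simulation deliberately picks the worst case, where the reader contacts the replicas furthest from the ones that took the write.

```python
def quorums_overlap(n, r, w):
    """True when any r replicas must share at least one member with any w replicas."""
    return r + w > n


def write(replicas, w, version, value):
    # Worst case for readers: only the first w replicas receive the write.
    for replica in replicas[:w]:
        replica["version"] = version
        replica["value"] = value


def read(replicas, r):
    # Worst case: the reader contacts the last r replicas,
    # then keeps the value carrying the highest version number.
    contacted = replicas[-r:]
    newest = max(contacted, key=lambda rep: rep["version"])
    return newest["value"]


def demo(n, r, w):
    replicas = [{"version": 1, "value": "old"} for _ in range(n)]
    write(replicas, w, version=2, value="new")
    return read(replicas, r)


print(quorums_overlap(3, 1, 1), demo(3, 1, 1))  # no overlap: the read can be stale
print(quorums_overlap(3, 2, 2), demo(3, 2, 2))  # overlap: the read sees the new value
```

Lower values of R and W give faster, more available operations at the cost of possibly stale reads, which is the trade-off you are tuning.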

Advantages

Very easy to scale – Riak and Redis both offer very convenient ways of managing nodes.
Redis offers a large range of data storage types including lists, sets and hashes. It’s also possible to run range, diff, union and intersection operations on internal data.
Riak also offers an HTTP-based interface, which can be used from a browser or with cURL.
Replication is also simple with most key-value stores, including Riak and Redis. Read and write quorums are easy to configure, allowing you to tune the trade-offs described by the CAP theorem.
Riak has a fantastic set of libraries, allowing you to interact with it from pretty much any language you would sanely want to use. Redis relies more on 3rd party libraries, but also has great support.
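
To give a feel for the set and list operations mentioned above, here is a small sketch that uses Python's built-in sets and lists as stand-ins. No Redis server is involved and the data is made up. The comments name the Redis commands (SINTER, SUNION, SDIFF, LRANGE) that perform the corresponding operation on the server side.

```python
# Python built-ins standing in for Redis sets and lists (illustrative data).
followers_alice = {"bob", "carol", "dave"}
followers_erin = {"carol", "dave", "frank"}

common = followers_alice & followers_erin       # comparable to SINTER
everyone = followers_alice | followers_erin     # comparable to SUNION
only_alice = followers_alice - followers_erin   # comparable to SDIFF

timeline = ["post1", "post2", "post3", "post4"]
# Comparable to LRANGE timeline 0 2 (LRANGE's stop index is inclusive,
# whereas a Python slice excludes its end index).
first_three = timeline[0:3]

print(sorted(common), sorted(only_alice), first_three)
```

The difference with Redis is that these operations run inside the database, so only the result travels over the network.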

Disadvantages

No key-value databases support ACID transactions in the typical sense of a relational database. However, Redis does provide a type of transaction using the MULTI command. The MULTI command will defend your application against programming errors, e.g. a typo in one of the Redis commands. However, these types of errors are likely to be found in a development environment anyway. Using the MULTI command will NOT enable you to roll back changes when a queued command fails at runtime, e.g. running a list operation on a key that holds a string.
Sharding isn’t widely supported by key-value stores yet. Redis has its own name for shards, partitions, and its built-in support for them is currently only at the beta stage. Twitter is also on the scene with its own proxy for Redis, called Twemproxy. Riak offers an alternative to shards called consistent hashing, which is a technique for evenly distributing data across a cluster. In Riak’s case this is known as the ring.
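
The idea behind consistent hashing can be sketched in a few lines of Python. This is a generic textbook-style ring with virtual nodes, not Riak's actual implementation. The HashRing class, the node names and the choice of 64 virtual nodes per machine are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring (illustrative, not Riak's implementation)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _position(label):
        # Map any string to a point on the ring using SHA-1.
        return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node, vnodes=64):
        # Each physical node owns several points to even out the distribution.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (self._position(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The useful property is that adding a node only takes keys away from existing nodes and hands them to the new one. Keys never shuffle between the old nodes, which is what makes growing a cluster cheap.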

Use Cases

Storing sessions – This is by far the most common use case I have come across. Session data is generally read on every request, so this will save a lot of requests to your main data store or filesystem.
Shopping carts – Shopping cart items are usually temporary, so storing them in memory is a great idea. Depending on your system requirements, you could set cart data to expire after a certain amount of time. You could even write the contents to more permanent storage before they expire, allowing a user to continue from where they left off.
Anything with heavy read traffic – Let’s say we want to show a collection of statistics to anyone who visits a web page, and we have over a million viewers per minute. Instead of running numerous database queries per page load, we could cache the results in memory every minute and then fetch the data from memory for each request. Now that is going to save you some tin.
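
The statistics example boils down to a cache with a time to live. Below is a minimal sketch in plain Python, assuming a made-up TTLCache class and a placeholder expensive_stats_query function. In a real setup the cache would live in a store such as Redis rather than in the web process, and the fake clock is only there so the example runs instantly.

```python
import time


class TTLCache:
    """Hypothetical in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve from memory
        value = compute()    # missing or expired: hit the database once
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def expensive_stats_query():
    # Placeholder for the numerous database queries behind the page.
    calls.append(1)
    return {"visitors": 1_000_000}


fake_now = [0.0]  # controllable clock so the example needs no sleeping
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

cache.get_or_compute("stats", expensive_stats_query)  # miss: runs the query
cache.get_or_compute("stats", expensive_stats_query)  # hit: served from memory
fake_now[0] = 61.0
cache.get_or_compute("stats", expensive_stats_query)  # expired: runs the query again

print(f"database queried {len(calls)} times for 3 requests")
```

With a 60 second lifetime, the database sees roughly one query per minute regardless of how many visitors arrive.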

Conclusion

That just about covers key-value databases. Just to sum it all up in a single sentence, I would say:

Key-value databases are lightweight, schema-less, relationship-less and transaction-less data stores, primarily storing temporary data in memory.


Simon Jakowicz
