NoSQL Essentials – Part 4 – Column-family databases

Column-Family Databases

Column-family databases aren’t a million miles different from document databases. They hold most of the same pros and cons. There is however one significant difference, column-family stores use peer-to-peer replication, where document databases, use master-slave or primary-secondary replication.

Just like all NoSQL database types, you have a lot of databases choices. Common column-family data stores include HBase, Cassandra, Amazon SimpleDB and Hypertable. HBase was built by Apache and is debatably the most popular column-family store. However recently Cassandra has been getting a lot of attention, so that is what I will be covering.

Differences from relational databases.

  1. In column-family databases, each row consist of a collection of columns=>value pairs. A collection of similar rows then makes up a column family. In a relational databases, this would be equivalent to a collection of rows making up a table. The main difference is that in a column-family database, rows do not have to contain the same columns.
  2. RDBMS impose high cost of schema change for low cost of query change. Column families impose little to no cost in schema change, for slightly more cost in query change.

Continue reading