While playing with CouchDB, I decided to expand on the subject and research what else is out there. Apparently there are lots of cool implementations of Key Value Stores. And since trying them all would take a long time, I decided to make some notes on what I found, to simplify my journey.
Here I created a simple reference list of different, non-commercial implementations of Key Value Stores. Let me know if there are other interesting projects that are not on this list, so we can keep it updated.
A C library that implements a very fast and space efficient key-value store:
The database is a simple data file containing records, each of which is a pair of a key and a value. Every key and value is a series of variable-length bytes; both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. Records are organized in a hash table, a B+ tree, or a fixed-length array.
Besides bindings for Ruby, there are also APIs for Perl, Java, and Lua available.
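To get a feel for the API, here is a minimal sketch using the Ruby binding; it assumes the tokyocabinet gem is installed and creates a hash database file named casket.tch (file name is just an example):

```ruby
require 'tokyocabinet'
include TokyoCabinet

# open a hash database file, creating it if it doesn't exist
hdb = HDB::new
hdb.open("casket.tch", HDB::OWRITER | HDB::OCREAT)

# keys and values are just byte strings; no tables, no types
hdb.put("user:1", "ilya")
puts hdb.get("user:1")   # => "ilya"

hdb.close
```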
To share Tokyo Cabinet across machines, Tokyo Tyrant provides a server for concurrent and remote connections.
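The remote API mirrors the local one. A similarly minimal sketch, assuming the tokyotyrant gem and a Tyrant server running on localhost:1978 (its default port):

```ruby
require 'tokyotyrant'

# connect to a remote Tokyo Tyrant server
rdb = TokyoTyrant::RDB::new
rdb.open("localhost", 1978)

rdb.put("user:1", "ilya")
puts rdb.get("user:1")   # => "ilya"

rdb.close
```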
Speed and efficiency are two consistent themes for Tokyo Cabinet. Benchmarks show that it takes only 0.7 seconds to store 1 million records in the regular hash table and 1.6 seconds with the B-tree engine. To achieve this, the overhead per record is kept as low as possible, ranging between 5 and 20 bytes: 5 bytes for the B-tree and 16-20 bytes for the hash-table engine. And if small overhead is not enough, Tokyo Cabinet also has native support for Lempel-Ziv and BWT compression algorithms, which can reduce your database to ~25% of its size (a typical text compression rate). Oh, and did I mention that it is thread safe (uses pthreads) and offers row-level locking?
good intro: Tokyo Cabinet Beyond Key Value Store
Voldemort is a very cool project that comes out of LinkedIn. They even seem to provide a full-time developer doing development and support via a mailing list. Kudos to them, because Voldemort, as far as I can tell, is great. Best of all, it scales: you can add servers to the cluster, you don't do any client-side hashing, and throughput increases with the size of the cluster. As far as I can tell, you can handle any increase in requests by adding servers, and those servers are fault tolerant, so a dead server doesn't bring down the cluster.
Voldemort does have a downside for me, because I primarily use Ruby and the provided client is written in Java, so you either have to use JRuby (which is awesome but not always realistic) or Facebook Thrift to interact with Voldemort. This means Thrift has to be compiled on all of your machines, and since Thrift depends on the Boost C++ library, which is both slow and painful to compile, deploying Voldemort apps becomes significantly more involved.
Voldemort is also interesting because it has a pluggable storage backend; the bulk of it deals with sharding and fault tolerance rather than data storage itself. Voldemort might actually be a good layer on top of Redis or Tokyo Cabinet some day.
Voldemort, it should be noted, is also only going to be worth using if you actually need to spread your data out over a cluster of servers. If your data fits on a single server in Tokyo Tyrant, you are not going to gain anything by using Voldemort. Voldemort, however, might be seen as a good migration path from Tokyo * when you do hit that wall where performance isn't enough. (from: NoSQL If Only It Was That Easy)
Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB's built-in Web administration console speaks directly to the database using HTTP requests issued from your browser.
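As a quick illustration of that HTTP interface, here is a sketch using nothing but Ruby's standard library; it assumes CouchDB is running on its default port 5984, and the database and document names are made up:

```ruby
require 'rubygems'
require 'net/http'
require 'json'

Net::HTTP.start('localhost', 5984) do |couch|
  # create a database ('playground' is just an example name)
  couch.send_request('PUT', '/playground')

  # store a document under the key 'user:ilya'
  doc = { 'name' => 'Ilya', 'interests' => ['couchdb', 'ruby'] }.to_json
  couch.send_request('PUT', '/playground/user:ilya', doc,
                     'Content-Type' => 'application/json')

  # fetch it back as JSON
  res = couch.send_request('GET', '/playground/user:ilya')
  puts res.body
end
```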
It’s a “distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. Data is stored in ‘documents’, which are essentially key-value maps themselves, using the data types you see in JSON. CouchDB can do full text indexing of your documents, and lets you express views over your data in Javascript. I could imagine using CouchDB to store lots of data on users: name, age, sex, address, IM name and lots of other fields, many of which could be null, and each site update adds or changes the available fields. In situations like that it quickly gets unwieldy adding and changing columns in a database, and updating versions of your application code to match. (from: Anti RDBMS a List of Distributed Key Value Stores)
good intro: InfoQ: CouchDB From 10K Feet
MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not a RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will have soon, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to a RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you.
Oh, and did I mention that, of all the NoSQL options out there, MongoDB is one of the only ones being developed as a business, with commercial support available? If you're dealing with lots of other people's data, and have a business built on the data in your DB, this isn't trivial.
On a side note, if you use Ruby, check out MongoMapper for very easy and nice-to-use Ruby access. (from: NoSQL If Only It Was That Easy)
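To give a flavor, here is a rough MongoMapper sketch; the model and field names are made up, it assumes a local mongod on the default port, and the exact query syntax may differ between versions:

```ruby
require 'rubygems'
require 'mongo_mapper'

MongoMapper.database = 'playground'   # example database name

class User
  include MongoMapper::Document

  key :name, String
  key :age,  Integer
  key :im,   String   # optional fields can simply be left out of documents
end

User.create(:name => 'Ilya', :age => 30)

# select on any key, no schema migrations needed
User.all(:age => 30)
```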
MongoDB: Dwight Merriman, 10gen (slides, video)
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
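One way to picture the ColumnFamily model is as a nested hash. The following is only an illustration of the shape of the data (keyspace, column family, row key, columns), with made-up names, not actual client code:

```ruby
# Keyspace -> ColumnFamily -> row key -> columns (name => value)
# Each row carries its own set of columns; there is no fixed schema.
twitter = {
  'Statuses' => {                     # a ColumnFamily
    'status:1' => {                   # a row key
      'user_id' => '5',
      'body'    => 'first post!'
    },
    'status:2' => {
      'user_id' => '5',
      'body'    => 'another one',
      'geo'     => 'SF'               # columns can differ from row to row
    }
  }
}

twitter['Statuses']['status:2']['body']   # => "another one"
```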
Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon’s Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.
Sounded very promising when the source was released by Facebook last year. They use it for inbox search. It’s Bigtable-esque, but uses a DHT so doesn’t need a central server (one of the Cassandra developers previously worked at Amazon on Dynamo). Unfortunately it’s languished in relative obscurity since release, because Facebook never really seemed interested in it as an open-source project. From what I can tell there isn’t much in the way of documentation or a community around the project at present. (from: Anti RDBMS a List of Distributed Key Value Stores)
good intro: Up and Running With Cassandra
Although MySQL Cluster (NDB) is not a native Key Value Store, I found it interesting enough to put on the list. While it is commonly used through an SQL interface, the architecture and performance are exactly what you want: cloud-like sharding, very good performance on key-value lookups, etc. And if you don't want the SQL, you can use the NDB API directly, or REST through the mod_ndb Apache module (http://code.google.com/p/mod-ndb/).
This would score high on your list if you evaluated it:
– Transparent sharding: Data is distributed through an md5sum hash on your primary key (or a user-defined key), yet you connect to whichever MySQL server you want; the partitions/shards are transparent behind that.
– Transparent re-sharding: In version 7.0, you can add more data nodes in an online manner, and re-partition tables without blocking traffic.
– Replication: Yes. (MySQL replication).
– Durable: Yes, ACID. (When using a redundant setup, which you always will.)
– Commits to disk: Not on commit, but with a 200ms-2000ms delay. Durability comes from committing to more than one node, *on commit*.
– Less than 10ms response times: You bet! 1-2 ms for quite complex queries even.
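For the SQL route, a key-value style lookup against an NDB table looks like any other MySQL query. Here is a rough sketch using the old Ruby mysql gem; the host, credentials, and table are made up, and the only cluster-specific detail is ENGINE=NDBCLUSTER on the table:

```ruby
require 'mysql'

# connect to any MySQL server in front of the cluster; the cluster hashes
# the primary key to pick a data node, so it doesn't matter which one
db = Mysql.real_connect('mysql-frontend', 'app', 'secret', 'playground')

# the table was created beforehand with something like:
#   CREATE TABLE kv (k VARBINARY(255) PRIMARY KEY, v BLOB) ENGINE=NDBCLUSTER;

db.query("REPLACE INTO kv (k, v) VALUES ('user:1', 'ilya')")

res = db.query("SELECT v FROM kv WHERE k = 'user:1'")
row = res.fetch_row
puts row[0] if row   # => "ilya"
```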
Other KVS I have in my backlog to expand on here later:
And as a bonus – there is another quite interesting initiative started by Yehuda Katz:
Moneta: A unified interface for key/value stores
================================================
Moneta provides a standard interface for interacting with various kinds of key/value stores.
Out of the box, it supports:
* File store for xattr
* Basic File Store
* Memcache store
* In-memory store
* The xattrs in a file system
* DataMapper
* S3
* Berkeley DB
* Redis
* SDBM
* Tokyo
* CouchDB
All stores support key expiration, but only memcache supports it natively. All other stores
emulate expiration.
The Moneta API is purposely extremely similar to the Hash API. In order to support an
identical API across stores, it does not support iteration or partial matches, but that
might come in a future release.
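To see what that Hash-like API looks like in practice, here is a small sketch using the memcache backend; the server address and keys are placeholders, and the require path and :expires_in option name are my assumptions based on the README above:

```ruby
require 'rubygems'
require 'moneta/memcache'   # assumed require path for the memcache store

cache = Moneta::Memcache.new(:server => 'localhost:11211')

# behaves like a Hash...
cache['greeting'] = 'hello'
cache['greeting']          # => "hello"
cache.key?('greeting')     # => true
cache.delete('greeting')

# ...with expiration on top (option name assumed)
cache.store('session:1', 'abc123', :expires_in => 60)
```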
Any additional info / projects are welcome!