Project Voldemort – distributed key-value storage system

Voldemort is an open source distributed key-value data store designed for high-scalability storage.

Voldemort is a big, distributed, fault-tolerant, persistent hash table.

It is used at LinkedIn for certain high-scalability storage problems where simple functional partitioning is not sufficient.

Features include:

  • Data is automatically replicated over multiple servers.
  • Data is automatically partitioned so each server contains only a subset of the total data.
  • Server failure is handled transparently.
  • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, Avro and Java Serialization.
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system.
  • Each node is independent of other nodes with no central point of failure or coordination.
  • Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, the disk system, and the data replication factor.
  • Pluggable data placement strategies support things like distribution across data centers that are geographically far apart.
  • Combines in-memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast).
  • The storage layer is completely mockable. This makes development and unit testing easy, as both can be done against a throw-away in-memory storage system without the need for a real cluster or real storage system.
  • Unlike MySQL replication, both reads and writes scale horizontally.
  • Simple API: Data replication and placement are decided through a simple API that accommodates a wide range of application-specific strategies.
  • Transparent data partitioning: This allows for cluster expansion without rebalancing all data.
  • Supports hashtable semantics.
  • Embeddable.
  • Integrity: Atomicity, Consistency, Durability, Revision Control, Optimistic Locking model.
  • Distribution: Horizontally scalable, replication, symmetric replication, and sharding.
  • Compression.
  • TTL for entries.
  • Unicode support.
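
To make the versioning and optimistic locking items above more concrete, here is a minimal Java sketch of a versioned in-memory hash table. It is an illustration only and does not use Voldemort's client API: the class VersionedStoreSketch, the exception StaleVersionException, and the single-counter version number are all hypothetical names and simplifications chosen for this example. A write succeeds only if the caller presents the version it last read; otherwise it is rejected and the caller is expected to re-read and retry.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a versioned key-value store (not Voldemort's API). */
public class VersionedStoreSketch {

    /** A value paired with the version number it was written at. */
    public static final class Versioned {
        public final String value;
        public final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Thrown when a writer's expected version no longer matches the stored one. */
    public static class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private final Map<String, Versioned> data = new ConcurrentHashMap<>();

    /** Returns the current value and version for the key, if present. */
    public Optional<Versioned> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    /**
     * Writes the value only if expectedVersion matches the stored version
     * (0 means "the key does not exist yet"). Returns the new version.
     */
    public long put(String key, String value, long expectedVersion) {
        // compute() runs atomically per key; if the function throws,
        // the existing mapping is left unchanged.
        Versioned updated = data.compute(key, (k, current) -> {
            long currentVersion = (current == null) ? 0 : current.version;
            if (currentVersion != expectedVersion) {
                throw new StaleVersionException(
                    "expected v" + expectedVersion + " but store has v" + currentVersion);
            }
            return new Versioned(value, currentVersion + 1);
        });
        return updated.version;
    }

    public static void main(String[] args) {
        VersionedStoreSketch store = new VersionedStoreSketch();
        long v1 = store.put("member:42", "Alice", 0);     // first write
        long v2 = store.put("member:42", "Alice B.", v1); // update based on v1
        try {
            store.put("member:42", "stale write", v1);    // v1 is now out of date
        } catch (StaleVersionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(store.get("member:42").get().value + " at v" + v2);
    }
}
```

Because the sketch needs no cluster or disk, it also shows the kind of throw-away in-memory store that the mockable storage layer item describes for unit tests. A real distributed store must additionally reconcile versions written on different nodes, which a single counter cannot express.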

Support: GitHub
Developer: LinkedIn
License: Apache License Version 2.0

