Just about every webapp out there serves up data stored on a relational database. While RDBMSes are not particularly easy or pleasant to deal with, most programmers know how to use them, and popular frameworks provide all sorts of wrappers which aim to solve the impedance mismatch of SQL while preserving much of its querying flexibility. Aside from the need to map application data to relations, the use of relational databases has a number of other drawbacks. RDBMSes do not scale up easily, they have significant administrative overhead, and they are difficult to tune for performance.
In plain terms, configuring and properly setting up a replicated database server takes a lot of work. It requires maintaining disparate configuration files for different machines, and it requires an intimate understanding of various tuning parameters. It helps to have a DBA on staff to deal with these headaches. Once the infrastructure works, all database access requires great care. After tuning indices and queries for one usage pattern, a subtle change in requirements often leads to modified queries and a new usage pattern which slows the database to a crawl and requires an entirely new, time-consuming pass through the schema, the indices, and all queries to make things fast again. I spent enough time staring at Sybase’s showplan output after adding a feature or fixing a bug to have come to hate the process of optimizing queries. Even once everything starts to tick, it sometimes turns out that the data size and usage load has outgrown the capacity of a single server, and the schema must be denormalized, sharded across multiple servers, and once again optimized. In addition, applications must be modified to stop assuming that they can get away with querying just one server.
memcached is an example of a data storage mechanism which has no such overhead. To run, it just needs to know how much memory to use, and which network interface and port to listen on, and it just works (memcached -d -m 2048 -l 10.0.0.1 -p 6789, for example). Of course, memcached has no replication or redundancy features, nor does it do anything except simple key-value storage and lookup, so the comparison is unfair. However, quite a few application domains do not need all the querying power of a relational database, and want none of the performance or maintenance overhead which comes with it.
Event queues come to mind. Twitter seems to feel the same way: it recently released its event queue, Starling, as an open-source product. Although Starling persists data to disk, it does not itself have any redundancy or replication features, so the loss of a disk would result in the loss of all queues stored on that disk. High-availability tools, like Heartbeat and DRBD, can help mitigate the risk, but they again introduce administrative overhead.
The ideal data store would boast disk-based and memory-cached persistence, ACID transactions, a replication mechanism, the ability to store and retrieve data in slightly more complicated patterns than just a key-value lookup, and zero administrative overhead. It turns out that something very close to this ideal has existed for many years: Berkeley DB. Its core C library provides a flexible data storage library and a handful of primitive operations for operating on data stores. Its Java Edition provides a higher-level annotation based persistence mechanism whose usage resembles Hibernate and EJB3. (Bindings for languages other than the Oracle-supported C, C++, and Java unfortunately lag in terms of features, and none except C and C++ support the replication framework.)
I decided to try Berkeley DB, and immediately discovered two things: on one hand, throwing away SQL and its data definition language is scary, because reconfiguring schemas now requires actual code. On the other hand, throwing away SQL for queries felt like throwing away a crutch. Because Berkeley DB operates on a low level, I never had to wonder why a particular access operation runs slowly. If something performs a dreaded table scan, it does so not because SQL obscured an expensive operation — it does a table scan because I wrote the loop myself. I quickly learned about secondary databases, which work like indices on primary data, but provide more flexibility because they allow programmers to slice and dice both keys and values in arbitrary ways to produce new keys into the same data. The experience was eye-opening.
Oracle, which acquired Berkeley DB by buying Sleepycat Software, boasts that Berkeley DB has “an interface designed for programmers, not DBAs” (half-way down the Java Edition page). This seems true enough, as deploying a Berkeley DB application is trivial: just install the libraries, and make sure your code knows where to put the data stores. Stores cannot be placed on network storage, because typical network file systems (NFS, SMB) lack sufficiently rigorous semantics. This makes Berkeley DB an interesting tool for building applications which scale horizontally, where individual disposable nodes in the system replicate data out to other nodes. Google, famous for building its infrastructure on inexpensive hardware and achieving scale using cluster computing, chose Berkeley DB for storing accounts.
I must also mention two other fascinating storage systems. First, Franz sells a product called AllegroCache, a persistence mechanism for the Common Lisp Object System. For anyone enjoying the luxury of coding in Common Lisp, AllegroCache only seems to lack replication features. Second, the Mnesia database which comes with the Erlang/OTP distribution has two unique features: dynamic table fragmentation and dynamic replica management. Roughly speaking, Mnesia lets the programmer control, at runtime, the number of data fragments and how many replicas it should keep on the available nodes.
Leave a reply
You must be logged in to post a comment.