Wednesday, November 2, 2011

Database Scalability, Elasticity, and Autonomy in the Cloud


This paper investigates a number of issues associated with scaling databases and key-value stores on cloud computing infrastructure, and gives an overview of current research addressing these problems. Scalability is the major problem, since RDBMSs have traditionally not been designed to scale out horizontally much past a few machines. Key-value stores, on the other hand, scale very well, but providing multi-key transactions on them is difficult. The authors call this problem of multi-key atomicity “data fusion.” They have designed G-Store to provide transactional multi-key access guarantees over dynamic, non-overlapping groups of keys, using a key-ownership abstraction built on leader and follower keys. In contrast, “data fission” is the problem of sharding a database into relatively independent partitions. Migrating data between partitions is another challenge, for both shared-storage and shared-nothing DB architectures. For shared storage, the authors have designed an “Iterative Copy” technique that transfers the main-memory state of the partition to avoid warm-up time at the destination. For the shared-nothing architecture, the persistent image of the database must also be migrated, and it is usually much larger than the memory state copied by “Iterative Copy.” To accomplish this, Zephyr introduces a synchronized phase that allows both the source and the destination to execute transactions for the tenant while the data is migrated. It combines on-demand pull with asynchronous push of data, which minimizes the window of unavailability.
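To make the key-group idea concrete, here is a minimal single-process sketch in Python. It is not G-Store's implementation: the class and method names (`KeyValueStore`, `KeyGroup`, `create_group`, `transact`, `dissolve`) are hypothetical, and the real system is distributed, with ownership transferred between nodes. The sketch only illustrates the two properties described above: groups are dynamic and non-overlapping, and a multi-key update inside a group is applied all or nothing.

```python
class KeyGroupError(Exception):
    """Raised when a grouping rule is violated (hypothetical name)."""


class KeyValueStore:
    """Toy in-memory key-value store that tracks which group owns each key."""

    def __init__(self):
        self.data = {}
        self.owner = {}  # key -> KeyGroup that currently owns it

    def create_group(self, leader, followers):
        keys = [leader, *followers]
        # Groups must not overlap: refuse if any key already belongs to one.
        taken = [k for k in keys if k in self.owner]
        if taken:
            raise KeyGroupError(f"keys already grouped: {taken}")
        group = KeyGroup(self, leader, keys)
        for k in keys:
            self.owner[k] = group
        return group


class KeyGroup:
    """A set of keys whose ownership is held on behalf of the leader key."""

    def __init__(self, store, leader, keys):
        self.store = store
        self.leader = leader
        self.keys = set(keys)
        self.active = True

    def transact(self, updates):
        """Apply every update in the dict, or none of them."""
        if not self.active:
            raise KeyGroupError("group has been dissolved")
        # Validate before writing so a bad key leaves the store untouched.
        outside = [k for k in updates if k not in self.keys]
        if outside:
            raise KeyGroupError(f"keys outside the group: {outside}")
        self.store.data.update(updates)

    def dissolve(self):
        """Release ownership so the keys can join other groups later."""
        for k in self.keys:
            del self.store.owner[k]
        self.active = False
```

A caller would create a group with `store.create_group("a", ["b"])`, run multi-key updates through `group.transact({...})`, and call `group.dissolve()` when finished, after which the same keys are free to be regrouped.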

The problems this paper addresses are real. Scaling databases is difficult, and there don’t seem to be any great solutions to repartitioning and live migration in sight, though the techniques mentioned can reduce the impact. Transactions on multiple keys in a key-value store are also important. That said, the paper read like a laundry list of problems and solutions, and I found the overall structure somewhat incoherent. It doesn’t seem to introduce anything new and is just an overview of current research, so I doubt that the paper itself will be influential.
