Monday, September 19, 2011

"Cluster-Based Scalable Network Services"


This paper addresses the issue of scalability in network services, highlighting three fundamental advantages of using clusters: scalability, availability, and cost-effectiveness. Scalability means that adding hardware incrementally and linearly can keep service performance constant as load increases. Availability means that the service must be available 24x7, with failures masked by software. Cost-effectiveness means that the service must be economical to scale, hence the emphasis on commodity hardware. However, clusters also present challenges: administration, component vs. system replication (matching hardware to software scaling), partial failures, and the lack of shared state among cluster nodes.

The paper proposes a layered architecture based on separating content from services, using a service-programming model based on workers that perform Transformation, Aggregation, Caching, and Customization (TACC) of content on top of a Scalable Network Service (SNS) layer. It then introduces BASE (Basically Available, Soft State, Eventual Consistency), which tolerates partial failure in clusters with less complexity and cost than ACID. Many web services don't need the strong consistency or durability that ACID provides and instead place more value on high availability. By not requiring durability and consistency across partial failures, BASE reduces (or postpones) communication and disk activity, increasing availability.
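To make the soft-state idea concrete, here is a rough Python sketch (the names and numbers are mine, not the paper's): a cached answer carries a timestamp, may be lost or expire at any time, and is simply recomputed from the authoritative source rather than being persisted with ACID guarantees.

import time

class SoftStateCache:
    """BASE-style soft state: entries may be stale or lost; we just regenerate them."""
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self.entries = {}                  # key -> (value, time written); lost on a crash

    def get(self, key, recompute):
        entry = self.entries.get(key)
        if entry is not None:
            value, written = entry
            if time.time() - written < self.ttl:
                return value               # possibly stale, but good enough (BASE)
        value = recompute(key)             # regenerate instead of relying on durability
        self.entries[key] = (value, time.time())
        return value

cache = SoftStateCache(ttl_seconds=5.0)
print(cache.get("page", lambda k: "rendered " + k))

Losing the whole cache costs only recomputation, which is exactly the trade-off the paper argues most web workloads can afford.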

The SNS layer provides incremental/absolute scalability of worker nodes (which are simple and stateless), worker load balancing by the manager, front-end availability/fault tolerance, and system monitoring. On top of this, the TACC layer provides an API for transformation of data objects, aggregation (combining data from various objects), customization, and caching.
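A toy sketch of that split (hypothetical API, not the paper's): stateless workers expose a single transform step, and an SNS-style manager picks the least-loaded worker for each request.

class Worker:
    """Simple, stateless TACC-style worker: transforms one data object per call."""
    def __init__(self, name):
        self.name = name
        self.outstanding = 0               # load metric the manager balances on

    def transform(self, data):
        return data.upper()                # stand-in for distillation/transcoding

class Manager:
    """SNS-style manager: tracks worker load and dispatches to the least loaded."""
    def __init__(self, workers):
        self.workers = workers

    def dispatch(self, data):
        worker = min(self.workers, key=lambda w: w.outstanding)
        worker.outstanding += 1
        try:
            return worker.transform(data)
        finally:
            worker.outstanding -= 1

manager = Manager([Worker("w1"), Worker("w2"), Worker("w3")])
print(manager.dispatch("some web page content"))

Because the workers hold no hard state, the manager can restart or add them freely, which is what makes incremental scaling and fault masking cheap.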

The paper then looks at the implementation of two network services that use BASE (or BASE-like) techniques: TranSend, a scalable transformation and caching proxy for Berkeley dialup users, and HotBot, the Inktomi search engine. In TranSend, web content and other application data exploited BASE semantics: load-balancing data was allowed to be stale (with timeouts to recover from extreme cases), transformed data was cached as soft state, and approximate objects were used by the distillers (which transform and aggregate content). However, some data, such as user profiles, did require the guarantees of an ACID-compliant database.
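As a rough illustration of the stale-load-data-plus-timeout idea (the structure and constants are my guesses, not TranSend's code): the manager keeps the last load report per distiller, uses it even though it may be out of date, and distrusts reports older than a timeout so that a dead distiller eventually drops out of consideration.

import time

REPORT_TIMEOUT = 10.0                      # seconds before a load report is distrusted

load_reports = {}                          # distiller name -> (load, time of report)

def report_load(name, load):
    load_reports[name] = (load, time.time())

def pick_distiller():
    now = time.time()
    recent = {n: l for n, (l, t) in load_reports.items() if now - t < REPORT_TIMEOUT}
    if not recent:
        raise RuntimeError("no recent load reports; respawn distillers")
    # Reports within the window may still be stale; BASE accepts that.
    return min(recent, key=recent.get)

report_load("distiller-1", 3)
report_load("distiller-2", 7)
print(pick_distiller())                    # picks "distiller-1" from possibly stale data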

Some of the ideas in the paper are specific to the problems the authors were most concerned with at the time and have since become less relevant (e.g. caching and distillation of web content). Nevertheless, many of the general ideas presented in this paper have been very influential (or predictive of future trends). The layered architecture is very similar to (if narrower than) the WSCs described in the Google paper. Likewise, the use of commodity servers is a trend that has only grown in popularity, with ever lower-end machines being used. The BASE techniques, and the general shift away from ACID where immediate consistency is not strictly necessary, have also proven influential; they led quite logically to Brewer's CAP theorem and have left a significant impact on cloud computing.
