Wednesday, August 31, 2011

“The Datacenter as a Computer” (Chapters 1 & 2)


The first chapter introduces the idea of a warehouse-scale computer (WSC), a computing system set apart by the massive scale of its data, software infrastructure, and hardware platform. WSCs are distinct from traditional datacenters, which often host multiple unrelated applications on their servers; a WSC usually belongs entirely to one company and runs the same software and applications (or at least a small number of them) throughout.

WSCs are often composed of blocks of racks of cheap servers, with storage provided either by local disk drives managed by a distributed file system or by NAS devices at the block/cluster level. Given the commodity hardware, WSCs require software that can tolerate a high failure rate.

Software running on WSCs can be divided into layers: platform-level (e.g., kernel, OS), cluster-level (e.g., distributed file systems, MapReduce), and application-level (e.g., user-facing web services like Gmail). Compared to desktop software, Internet services exhibit more parallelism (at all levels), more workload churn (new versions are easy to deploy), greater platform homogeneity, and higher failure rates. This leads to the common use of replication, sharding, load balancing, health checking, integrity checks, application-specific compression, and eventual consistency.
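
To make those techniques a bit more concrete, here is a minimal sketch of my own (in Python; not code from the book) of how a cluster-level storage layer might combine sharding and replication: each key is hashed to a primary server, and copies go to the next few servers in the list. The server names and replication factor are purely illustrative assumptions.

    import hashlib

    SERVERS = [f"server-{i:02d}" for i in range(8)]   # hypothetical cluster nodes
    REPLICATION_FACTOR = 3                            # assumed; real systems vary

    def shard_for(key: str) -> int:
        """Map a key to a shard index using a stable hash."""
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(SERVERS)

    def replicas_for(key: str) -> list[str]:
        """Return the servers that should hold copies of this key."""
        start = shard_for(key)
        return [SERVERS[(start + i) % len(SERVERS)]
                for i in range(REPLICATION_FACTOR)]

    if __name__ == "__main__":
        for key in ("user:alice", "user:bob", "mail:12345"):
            print(key, "->", replicas_for(key))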

Is the problem real?

The challenges posed by WSCs are very significant: their sheer size makes testing, deploying, and managing applications difficult; energy and cooling become major concerns; and it is hard to program applications that take advantage of huge numbers of machines while dealing with the obstacles those machines present (e.g., a less homogeneous storage hierarchy, high fault rates, and performance variability).
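
As a small illustration of coping with faults and performance variability, here is a sketch of my own (in Python; the replica names, timeout, and `fetch_from` stand-in are all assumptions, not anything from the text): an application tries one replica and falls back to another when a request fails or comes back too slowly.

    import random
    import time

    REPLICAS = ["replica-a", "replica-b", "replica-c"]   # assumed replica set
    TIMEOUT_S = 0.05                                      # assumed per-attempt budget

    def fetch_from(replica: str, key: str) -> str:
        """Stand-in for a network call that is occasionally slow or failing."""
        if random.random() < 0.2:
            raise ConnectionError(f"{replica} unreachable")
        time.sleep(random.choice([0.01, 0.01, 0.01, 0.2]))   # occasional straggler
        return f"value-of-{key}@{replica}"

    def fetch_with_fallback(key: str) -> str:
        """Try replicas in turn, preferring the first sufficiently fast answer.

        For simplicity this waits out a slow reply instead of cancelling it;
        a real system would issue the retry concurrently.
        """
        slow_answer = None
        for replica in REPLICAS:
            start = time.monotonic()
            try:
                value = fetch_from(replica, key)
            except ConnectionError:
                continue                                  # fault: try the next copy
            if time.monotonic() - start <= TIMEOUT_S:
                return value                              # fast enough: use it
            slow_answer = value                           # straggler: keep as backup
        if slow_answer is not None:
            return slow_answer                            # better slow than nothing
        raise RuntimeError(f"all replicas failed for {key}")

    if __name__ == "__main__":
        print(fetch_with_fallback("user:alice"))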

What is the solution's main idea (nugget)?

The main idea here is that large-scale Internet services (and other demanding applications) require huge amounts of resources that are best served by dedicated datacenters. Since the servers in these datacenters all run just a few (often related) applications, the facilities can be regarded as Warehouse-Scale Computers (WSCs), which present unique challenges and advantages.

Why is solution different from previous work?

WSCs are becoming popular at large companies because of the need to process huge amounts of data, a need that grows with the number of users of these services and with the Internet in general. They are also enabled by the decreasing cost of commodity servers, which puts large-scale WSCs within the reach of companies that could not have afforded such systems before.

Does the paper (or do you) identify any fundamental/hard trade-offs?

Yes; one of the largest trade-offs is that building WSCs out of cheap, failure-prone hardware requires sophisticated and scalable monitoring services, since failures become very frequent when dealing with large numbers of servers and clusters. Likewise, the applications themselves need to tolerate faults, which adds complexity to the software.
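
For a flavor of what such a monitoring service does, here is a minimal health-checking sketch of my own (in Python; the `probe` function, fleet names, and failure threshold are illustrative assumptions): servers are probed periodically and removed from the serving pool only after repeated failures, so transient blips do not cause churn.

    import random
    from collections import defaultdict

    SERVERS = [f"server-{i:02d}" for i in range(8)]   # hypothetical fleet
    FAILURE_THRESHOLD = 3                             # assumed eviction policy

    failure_counts = defaultdict(int)

    def probe(server: str) -> bool:
        """Stand-in for a health-check RPC; a small fraction fail at any moment."""
        return random.random() > 0.05

    def run_health_check() -> list[str]:
        """Probe every server; evict only those that fail repeatedly."""
        healthy = []
        for server in SERVERS:
            if probe(server):
                failure_counts[server] = 0            # recovered: reset the count
                healthy.append(server)
            elif failure_counts[server] + 1 < FAILURE_THRESHOLD:
                failure_counts[server] += 1
                healthy.append(server)                # transient blip: keep serving
            else:
                failure_counts[server] += 1           # persistent failure: evict
        return healthy

    if __name__ == "__main__":
        pool = run_health_check()
        print(f"{len(pool)}/{len(SERVERS)} servers in the serving pool")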

Do you think the work will be influential in 10 years?

The relatively low cost of commodity hardware already makes WSCs available to many of the large Internet services, and continuing price declines suggest that soon many organizations will be able to afford arrays of servers, building systems that share many of the defining features of WSCs, including their scale, architecture, and fault behavior. This makes understanding WSCs relevant to the evolving face of computing. With the amount of data only increasing, it seems fair to expect that WSCs will play an increasingly important role in the computing infrastructure of many companies over the next ten years. Thus, the general idea of a WSC (irrespective of variation in implementation) should be even more relevant in 10 years than it is today.
