Wednesday, August 31, 2011

Above the Clouds: A Berkeley View of Cloud Computing


Cloud Computing denotes applications delivered as services over the Internet as well as the datacenter infrastructure that supports such services. Three core hardware-side aspects of cloud computing are the illusion of infinite on-demand computing resources, pay-as-you-go usage, and the absence of up-front costs. Cloud Computing can be used within a company (e.g. Google) as a "private cloud," or offered as a "public cloud" service to external users and businesses (e.g. AWS).

Is the problem real?

Absolutely. Cloud Computing is growing in popularity at a rapid pace and is making large-scale computation available as a pay-as-you-go service to a wide range of users. At the same time, the term itself is somewhat imprecise, and this paper tackles the very relevant problem of differentiating Cloud Computing from conventional computing.

What is the solution's main idea (nugget)?

The main idea of this paper is that Cloud Computing has been made possible by large-scale datacenters stocked with commodity servers at locations with low overhead costs (e.g. energy, bandwidth), thanks to the savings from economies of scale. It also argues that it is essential to identify the obstacles to adoption of Cloud Computing as well as the types of applications most likely to benefit from it.

Why is the solution different from previous work?

The idea of computing as a utility is not new, but the plummeting cost of commodity hardware puts the realization of Cloud Computing platforms within reach of many companies (and makes their services available to many end users and organizations), something that was not true before. The Internet has also created increased demand for such services, especially elastic computing, since workloads can vary greatly on little notice.

Another reason that Cloud Computing has become popular now may stem from the growing popularity of services on the Internet in general (e.g. PayPal instead of contractual payment-processing companies) and a broader move towards pay-as-you-go, utility-like services.

The types of applications enabled by Cloud Computing are not necessarily new, but a number of application types that previously could not benefit now do (or might), including mobile interactive applications, parallel batch processing, analytics, and compute-intensive desktop applications.

Does the paper (or do you) identify any fundamental/hard trade-offs?

The paper identifies ten obstacles to the continued growth of Cloud Computing:
- availability of service (particularly relevant given this year's high-profile AWS outages)
- data lock-in (proprietary code/APIs, etc.)
- data confidentiality & auditability
- data transfer bottlenecks (which make it particularly difficult, for example, for real-time stock applications to run in the cloud)
- performance unpredictability
- scalable storage
- bugs in large-scale distributed systems
- scaling quickly
- reputation fate sharing
- software licensing

However, for each of these obstacles, the article presents possible solutions that would allow growth.

At the same time, moving applications to the cloud is not necessarily the right choice for every application. As the article points out, cloud services (like Amazon EC2) can cost more per hour than an equivalent physical server, so an application with a fairly consistent workload (high utilization of all of its hardware) may not find it cost-effective to move to the cloud.

However, many applications have varying utilization, with peaks much higher than their average workload; for these it makes sense to pay for computing only as needed, rather than bearing the cost of overprovisioned hardware (or of underprovisioning and losing users). Thus, it is important to weigh the benefits of the elasticity and transference of risk provided by Cloud Computing against the cost trade-offs (and the obstacles above) before deciding whether CC provides an advantageous solution.
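To make the trade-off concrete, here is a back-of-the-envelope sketch in Python of the cloud-versus-owned-hardware comparison described above. The hourly prices and workload shapes are made-up placeholders for illustration, not figures from the paper.

```python
# Back-of-the-envelope comparison of cloud vs. owned hardware.
# All prices below are illustrative placeholders, not figures from the paper.

CLOUD_PRICE_PER_HOUR = 0.10   # pay-as-you-go rate for one server-equivalent
OWNED_COST_PER_HOUR = 0.04    # amortized purchase + power + admin per server

def monthly_cost_cloud(avg_demand_servers: float, hours: int = 720) -> float:
    """Cloud cost tracks actual demand (elasticity)."""
    return avg_demand_servers * hours * CLOUD_PRICE_PER_HOUR

def monthly_cost_owned(peak_demand_servers: float, hours: int = 720) -> float:
    """Owned hardware must be provisioned for the peak and is paid for 24/7."""
    return peak_demand_servers * hours * OWNED_COST_PER_HOUR

# A spiky workload: average demand of 10 servers, peaks of 100.
print(monthly_cost_cloud(avg_demand_servers=10))    # 720.0
print(monthly_cost_owned(peak_demand_servers=100))  # 2880.0

# A flat workload: average ~= peak of 100 servers.
print(monthly_cost_cloud(avg_demand_servers=100))   # 7200.0
print(monthly_cost_owned(peak_demand_servers=100))  # 2880.0
```

With the spiky workload, pay-as-you-go wins despite the higher hourly rate; with the flat, highly utilized workload, owning the hardware is cheaper, which is exactly the utilization argument above.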

Do you think the work will be influential in 10 years?

It seems quite certain that Cloud Computing will be important in 10 years, as the amounts of data and numbers of users of applications continue to grow. Likewise, falling hardware prices should put even more CC resources in the hands of users/organizations. This paper does a good job of defining Cloud Computing and identifying its strengths, weaknesses, and potential areas for improvement. It is very likely that this will influence the development of Cloud Computing (if it hasn’t already) and remain relevant over the next decade.

“The Datacenter as a Computer” (Chapters 1 & 2)


The first chapter introduces the idea of a warehouse-scale computer (WSC), a computing system set apart by the massive scale of its data, software infrastructure, and hardware platform. WSCs are distinct from traditional datacenters, which often host many unrelated applications on their servers; a WSC usually belongs entirely to one company and runs the same application (or at least a small number of related ones) throughout.

WSCs are typically built from racks of inexpensive commodity servers, with storage provided either by local disk drives managed by a distributed file system or by NAS devices at the rack/cluster level. Given the commodity hardware, WSCs require software that can tolerate a high failure rate.
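A quick arithmetic sketch in Python of why failure tolerance is unavoidable at this scale; the fleet size and per-server MTBF below are illustrative assumptions, not the book's figures:

```python
# Even very reliable servers fail constantly at warehouse scale.
# Fleet size and MTBF are illustrative assumptions.

SERVERS = 10_000
MTBF_YEARS = 30                 # assume each server fails once every 30 years
mtbf_days = MTBF_YEARS * 365

failures_per_day = SERVERS / mtbf_days
print(f"{failures_per_day:.2f} server failures/day")  # ~0.91: about one a day
```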

Software running on WSCs can be divided into layers: platform-level (e.g., kernel, OS), cluster-level (e.g., a distributed file system, MapReduce), and application-level (e.g., user-facing web services like Gmail). Compared to desktop software, Internet services exhibit more parallelism (at all levels), more workload churn (new versions are easy to deploy), greater platform homogeneity, and higher failure rates. This leads to common use of replication, sharding, load balancing, health checking, integrity checks, application-specific compression, and eventual consistency.
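As a small illustration of two of the techniques just listed, here is a minimal Python sketch of hash-based sharding combined with replication. The server names, replica count, and ring layout are arbitrary assumptions for the example, not anything prescribed by the chapter.

```python
import hashlib

# Sharding: spread keys across servers by a stable hash.
# Replication: store each key on several servers to survive failures.

SERVERS = [f"server-{i}" for i in range(8)]
REPLICAS = 3  # each key lives on 3 distinct servers

def shard(key: str) -> int:
    """Map a key to its home shard with a stable (non-random) hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(SERVERS)

def replica_set(key: str) -> list[str]:
    """Home shard plus the next REPLICAS-1 servers, wrapping around the ring."""
    home = shard(key)
    return [SERVERS[(home + i) % len(SERVERS)] for i in range(REPLICAS)]

print(replica_set("user:1234"))  # three consecutive servers on the ring
```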

Is the problem real?

The challenges posed by WSCs are very significant: their sheer size creates difficulties in testing, deploying, and managing applications; raises energy concerns; and makes it hard to program applications that take advantage of huge numbers of machines while coping with the obstacles those machines present (e.g., a less homogeneous storage hierarchy, high fault rates, performance variability).

What is the solution's main idea (nugget)?

The main idea here is that large-scale Internet services (and other demanding applications) require huge amounts of resources that are best served by dedicated datacenters. Since the servers in these datacenters all run just a few (often related) applications, the facilities can be called Warehouse-Scale Computers (WSCs), and they present unique challenges and advantages.

Why is the solution different from previous work?

WSCs are becoming popular at large companies due to the need to process huge amounts of data, a need that grows with the number of users of these services and with the Internet in general. They are also enabled by falling prices for inexpensive commodity servers, putting large-scale WSCs within reach of companies that could not have afforded such systems before.

Does the paper (or do you) identify any fundamental/hard trade-offs?

Yes, one of the largest trade-offs is the need for sophisticated and scalable monitoring services, since failures are frequent when dealing with large numbers of servers and clusters. Likewise, the applications need to tolerate faults, which pushes additional complexity into the application code itself.
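As a toy example of where that complexity ends up, here is a Python sketch of an application-level read that fails over across replicas rather than trusting a single server. fetch_from() is a hypothetical stand-in for a real RPC, and the failure rate is invented for the demo.

```python
import random

REPLICAS = ["replica-a", "replica-b", "replica-c"]

class ServerDown(Exception):
    pass

def fetch_from(server: str, key: str) -> str:
    """Hypothetical RPC stand-in; each call fails 30% of the time."""
    if random.random() < 0.3:
        raise ServerDown(server)
    return f"value-of-{key}@{server}"

def fault_tolerant_read(key: str) -> str:
    """Try each replica in turn; fail only if every replica is down."""
    errors = []
    for server in REPLICAS:
        try:
            return fetch_from(server, key)
        except ServerDown as err:
            errors.append(err)  # a real system would also report this to monitoring
    raise RuntimeError(f"all replicas failed: {errors}")

print(fault_tolerant_read("user:1234"))  # usually succeeds; ~3% chance all replicas fail
```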

Do you think the work will be influential in 10 years?

The relatively low cost of commodity hardware already makes WSCs available to many of the large Internet services, while the decreasing pricing trends suggest that soon many organizations will be able to afford arrays of servers, building systems that will share many of the defining features of WSCs, including their scale, architecture, and fault behavior. This makes understanding WSCs relevant to the evolving face of computing. With the amount of data only increasing, it seems very fair to expect that WSCs will play an increasingly important role in the computing infrastructure for many companies over the next ten years. Thus, the general idea of a WSC (irrespective of variation in the implementation) should be even more relevant in 10 years than it is today.