Wednesday, November 2, 2011

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center


Mesos is a thin layer that provides fine-grained resource sharing across diverse cluster computing frameworks. It lets, for example, Hadoop and MPI, or Hadoop and Spark, run on the same cluster without partitioning the cluster or allocating a set of VMs to each framework. Instead, Mesos is built around a scheduler that delegates scheduling control to the frameworks themselves using a “resource offer” abstraction. Mesos controls how many resources to offer each framework, and each framework determines which of the offered resources to accept and which tasks to run on them.
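To make the offer mechanism concrete, here is a minimal toy sketch in Python. It is not the Mesos API: the names `Offer`, `Framework`, `on_offer`, and `run_offer_round` are all made up for illustration, and the master's policy (offer everything that is free to each framework in list order) is a simplifying assumption rather than the allocation policy the paper describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Offer:
    """Free resources on one node (hypothetical shape, not the Mesos message)."""
    node: str
    cpus: int
    mem_gb: int


@dataclass
class Framework:
    """A toy framework scheduler with a queue of identical pending tasks."""
    name: str
    task_cpus: int
    task_mem_gb: int
    pending: int
    launched: List[str] = field(default_factory=list)

    def on_offer(self, offer: Offer) -> Tuple[int, int]:
        """Framework side: accept only what pending tasks need.

        Returns the (cpus, mem_gb) actually taken from the offer.
        """
        fit = min(offer.cpus // self.task_cpus, offer.mem_gb // self.task_mem_gb)
        n = min(fit, self.pending)
        self.pending -= n
        self.launched.extend([offer.node] * n)
        return n * self.task_cpus, n * self.task_mem_gb


def run_offer_round(free: Dict[str, Tuple[int, int]],
                    frameworks: List[Framework]) -> None:
    """Master side: offer each node's free resources to frameworks in turn.

    Assumption: a fixed framework order stands in for a real allocation policy.
    """
    for node, (cpus, mem_gb) in free.items():
        for fw in frameworks:
            if cpus == 0 or mem_gb == 0:
                break
            used_cpus, used_mem = fw.on_offer(Offer(node, cpus, mem_gb))
            cpus -= used_cpus
            mem_gb -= used_mem
        free[node] = (cpus, mem_gb)


if __name__ == "__main__":
    free = {"node1": (8, 16), "node2": (4, 8)}
    hadoop = Framework("hadoop", task_cpus=1, task_mem_gb=2, pending=6)
    mpi = Framework("mpi", task_cpus=4, task_mem_gb=4, pending=2)
    run_offer_round(free, [hadoop, mpi])
    print(hadoop.launched, mpi.launched, free)
```

In this toy run the Hadoop-like framework takes six task slots on node1 and declines the rest, and the MPI-like framework then launches one task on node2. The master never needs to understand either framework's placement logic; it only tracks what is free and who accepted what.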

The main nugget here was that pushing control of resources (or at least partial control) to the frameworks sharing the cluster provides an efficient way of sharing resources (since they take only what they need, and are offered only what is available), while keeping the scheduler simple. The paper identifies several trade-offs in this approach, mainly tied to the choice of decentralized vs. centralized scheduling. The model is vulnerable to fragmentation due to heterogeneous resource demands, doesn’t really work if there are inter-framework dependencies, and adds an extra level of framework complexity, since the frameworks need to interact with the scheduler explicitly. That said, the extra overhead of this interaction is not much different from interacting with a central scheduler.

Mesos differs from previous work in a number of ways. HPC and Grid schedulers are more targeted to monolithic jobs on specialized hardware, so they are typically centralized (unlike Mesos). In the cloud, EC2 and Eucalyptus use a VM allocation model, which is more coarse-grained than Mesos; it results in less efficient resource utilization and doesn’t allow the level of task placement that Mesos does.

It seems that Mesos could certainly be influential over the next decade, given the growth of private clouds and the fact that these clouds aren’t always utilized to their full extent, which leaves room to benefit from running multiple frameworks on the same cluster. That said, many public cloud deployments (e.g. EC2 instances) are fired up for short(er) periods of time to run a single framework (e.g. Hadoop) with relatively high utilization, and I wonder if Mesos is really necessary for many of these use cases.
