Mesos is a thin layer that
provides fine-grained resource sharing across diverse cluster computing
frameworks. It lets, for example, Hadoop and MPI, or Hadoop and Spark run on
the same cluster without having to partition the cluster or allocate a set of
VMs to each framework. Instead, Mesos is based on a scheduler that delegates
scheduling control to the frameworks themselves using a “resource offer”
abstraction. Mesos controls how many resources to offer each framework, and
each framework decides which of the offered resources to accept and which
tasks to run on them.
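To make the two-level split concrete, here is a minimal Python sketch of the resource-offer loop. Everything in it is hypothetical (the `Offer`, `Task`, and `Framework` classes, the `resource_offer` and `run_offer_round` functions, and the plain "offer each node to each framework in turn" policy). It is not the Mesos API, and the allocation policy is a stand-in for whatever allocation module Mesos actually plugs in. The master only decides *what to offer*; each framework decides *what to accept* and which tasks to launch.

```python
from dataclasses import dataclass, field

@dataclass
class Offer:
    node: str
    cpus: float
    mem_gb: float

@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float

@dataclass
class Framework:
    """Hypothetical framework holding a queue of pending tasks."""
    name: str
    pending: list = field(default_factory=list)
    launched: list = field(default_factory=list)

    def resource_offer(self, offer):
        """Framework side: pick pending tasks that fit in the offer.
        Returns (node, task) launches; leftover resources are implicitly declined."""
        cpus, mem = offer.cpus, offer.mem_gb
        accepted = []
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_gb <= mem:
                cpus -= task.cpus
                mem -= task.mem_gb
                self.pending.remove(task)
                accepted.append((offer.node, task))
        self.launched.extend(accepted)
        return accepted

def run_offer_round(free, frameworks):
    """Master side (assumed policy): offer each node's remaining free
    resources to each framework in order, subtracting whatever is accepted.
    `free` maps node -> (cpus, mem_gb) and is updated in place."""
    for node in list(free):
        for fw in frameworks:
            cpus, mem = free[node]
            if cpus <= 0 or mem <= 0:
                break  # nothing left on this node to offer
            for _, task in fw.resource_offer(Offer(node, cpus, mem)):
                cpus -= task.cpus
                mem -= task.mem_gb
            free[node] = (cpus, mem)

# Usage: two frameworks sharing two nodes without static partitioning.
free = {"node1": (4.0, 8.0), "node2": (2.0, 4.0)}
hadoop = Framework("hadoop", [Task("map-1", 1, 2), Task("map-2", 1, 2)])
mpi = Framework("mpi", [Task("rank-0", 2, 4)])
run_offer_round(free, [hadoop, mpi])
print(free)  # node1 fully used; node2 stays free for later offers
```

Note that the master never needs to know what a "map task" or an "MPI rank" is; that knowledge stays inside each framework, which is what keeps the central scheduler simple.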
The main nugget here was that
pushing control of resources (or at least partial control) to the frameworks
sharing the cluster provides an efficient way of sharing resources (since they
take only what they need, and are offered only what is available), while
keeping the scheduler simple. The paper identifies several trade-offs in this
approach, and these are mainly tied to the choice of decentralized vs.
centralized scheduling. This model is vulnerable to fragmentation due to
heterogeneous resource demands, doesn't really work if there are
inter-framework dependencies, and adds an extra level of framework complexity,
since the frameworks need to interact with the scheduler explicitly. That said,
the extra overhead of this interaction is not much different from interacting
with a central scheduler.
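The fragmentation point can be illustrated with a toy example. The sketch below is hypothetical (the `place_first_fit` and `fits_anywhere` helpers, the CPU-only resource model, and the first-fit placement are all assumptions for illustration, not how Mesos places tasks). One framework's 3-CPU tasks leave 1 CPU free on each of two 4-CPU nodes, so the cluster has 2 free CPUs in aggregate, yet a second framework needing 2 CPUs on one node receives no offer large enough.

```python
def place_first_fit(capacity, demands):
    """Place each CPU demand on the first node with enough room.
    Returns the leftover free CPUs per node (a new dict)."""
    free = dict(capacity)
    for d in demands:
        for node in free:
            if free[node] >= d:
                free[node] -= d
                break
    return free

def fits_anywhere(free, demand):
    """True if some single node's free CPUs can hold the demand."""
    return any(c >= demand for c in free.values())

# Framework A runs two 3-CPU tasks on two 4-CPU nodes.
free = place_first_fit({"n1": 4, "n2": 4}, [3, 3])
print(free, sum(free.values()))  # 1 CPU free per node, 2 in total
print(fits_anywhere(free, 2))    # a 2-CPU task from framework B does not fit
```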
Mesos differs from previous
work in a number of ways. HPC and Grid schedulers are more targeted to
monolithic jobs on specialized hardware, so they are typically centralized
(unlike Mesos). In the cloud, EC2 and Eucalyptus use a VM allocation model,
which is more coarse-grained than Mesos's model; this results in less efficient
resource utilization and does not allow the fine-grained task placement that
Mesos supports.
It seems that Mesos could
well be influential over the next decade, given the growth of private
clouds and the fact that these clouds aren't always utilized to their full
extent, which leaves room to benefit from running multiple frameworks on the
same cluster. That said, many public cloud deployments (e.g., EC2 instances)
are fired up for short(er) periods of time to run a single framework (e.g.,
Hadoop) at relatively high utilization, and I wonder if Mesos is really
necessary for many of these use cases.