Hadoop NextGen, the next
major revision to the Hadoop MapReduce framework, addresses several issues with
Hadoop’s scalability and design. Above all, the current version of Hadoop doesn’t
scale past 4,000 nodes, and its architecture makes it difficult
to fix this and other problems without a fundamental overhaul of the code base.
Hadoop NextGen divides the
responsibilities currently handled by the JobTracker between two separate
components: the ResourceManager, which globally assigns compute resources to applications,
and the per-application ApplicationMaster, which manages scheduling and
coordination (where an application is either a single MapReduce job or a DAG of MapReduce
jobs).
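To make the division of labor concrete, here is a minimal illustrative sketch of the split. This is not the actual YARN API; all class and method names below are invented for the example. The idea is that a central resource manager only decides how many containers each application gets, while each application's master schedules its own tasks within whatever it is granted:

```python
# Illustrative model of the NextGen split (not real Hadoop/YARN code; all
# names are invented). The ResourceManager only hands out generic containers;
# each application's ApplicationMaster does its own task scheduling.

class ResourceManager:
    """Global view of cluster capacity; decides only who gets how many containers."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, app_id, requested):
        granted = min(requested, self.free)
        self.free -= granted
        return granted  # knows nothing about maps, reduces, or job DAGs

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Per-application component; owns scheduling and coordination for one job."""

    def __init__(self, app_id, tasks):
        self.app_id = app_id
        self.pending = list(tasks)

    def run(self, rm):
        while self.pending:
            # Ask the central ResourceManager for containers...
            granted = rm.allocate(self.app_id, requested=len(self.pending))
            batch, self.pending = self.pending[:granted], self.pending[granted:]
            # ...and schedule this application's own tasks inside them.
            for task in batch:
                print(f"[{self.app_id}] running {task}")
            rm.release(granted)  # hand containers back when the wave finishes


if __name__ == "__main__":
    rm = ResourceManager(total_containers=4)
    ApplicationMaster("job-1", [f"map-{i}" for i in range(6)]).run(rm)
```

The point of the split is that the global component never has to track individual map and reduce tasks; that bookkeeping stays with each application's own master.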
Reading this blog post after
Mesos, it seems that NextGen takes an approach quite similar to Mesos,
though at a different level of abstraction. The ResourceManager acts much like
the Mesos master, coordinating the global assignment of compute resources to
applications (though without Mesos’ resource-offer model). At the same time,
NextGen sits at a lower level of abstraction: in principle, it could itself run
on Mesos, sharing resources with, say, Spark, MPI, and other frameworks. That
said, while Mesos could enable multiple Hadoop instances to run over the same
cluster, this would not fix the scalability issues inherent in the current
version, since any single Hadoop instance, and thus any given job, would still
be unable to scale past 4,000 nodes. Thus, NextGen still fills a need that Mesos
alone (or another resource scheduler) could not necessarily satisfy.
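For what it’s worth, the difference flagged in the parenthetical above comes down to who makes the placement decision. The sketch below is again purely illustrative (neither system’s real API): in the request model an application states its needs and the central manager decides what to grant, whereas in Mesos’ offer model the master offers resources to a framework’s scheduler, which accepts or declines them.

```python
# Toy contrast between the two allocation styles (not real APIs).

def request_based_allocation(cluster_free, app_request):
    """NextGen style: the application states what it needs,
    and the central resource manager decides how much to grant."""
    granted = min(app_request, cluster_free)
    return granted, cluster_free - granted


def offer_based_allocation(offer, framework_wants):
    """Mesos style: the master offers a slice of the cluster to a framework,
    whose own scheduler accepts what it can use and declines the rest."""
    accepted = min(offer, framework_wants)
    declined = offer - accepted  # declined resources return to the master's pool
    return accepted, declined


granted, remaining = request_based_allocation(cluster_free=10, app_request=6)
accepted, declined = offer_based_allocation(offer=4, framework_wants=6)
print(f"request model: granted {granted} containers, {remaining} left in cluster")
print(f"offer model:   accepted {accepted} of the offer, declined {declined}")
```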
As far as tradeoffs go,
certain parts of the NextGen framework are still centralized (e.g. the
ResourceManager), which could become a performance bottleneck in the future.
Despite this, NextGen appears very promising and makes many significant performance
and scalability improvements to the ever-popular Hadoop framework, which should
guarantee Hadoop’s continuing popularity (that is, if everyone doesn’t switch
to Spark first!).