CS 294 Blog: "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks"

Dryad addresses the problem of executing distributed data-parallel applications by modeling them as a directed acyclic graph (DAG): vertices represent computation, and edges represent communication/data transfer. The choice of this model is motivated by the desire to provide a simple(r) programming model for distributed applications that still offers reliability, efficiency, and scalability from multi-core computers up to clusters. The graph model gives developers far more control over data flow than other parallel programming environments (e.g. MapReduce or GPU shader languages) allow.
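To make the vertex/edge model concrete, here is a minimal single-process sketch. The `Graph` class, its `add_vertex`/`connect`/`run` methods, and the word-count vertices are hypothetical illustrations, not Dryad's actual API. Each vertex is an ordinary sequential function, each edge carries one vertex's output to a consumer, and `run` executes vertices in topological order (Kahn's algorithm) once all of their inputs are available.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List


class Graph:
    """Hypothetical toy model of a Dryad-style job graph (not Dryad's API)."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Callable[[list], object]] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)   # src -> consumers
        self.inputs: Dict[str, List[str]] = defaultdict(list)  # dst -> producers

    def add_vertex(self, name: str, fn: Callable[[list], object]) -> None:
        self.vertices[name] = fn

    def connect(self, src: str, dst: str) -> None:
        # An edge acts as a channel from src's output to one of dst's inputs.
        self.edges[src].append(dst)
        self.inputs[dst].append(src)

    def run(self, sources: Dict[str, object]) -> Dict[str, object]:
        # Kahn's algorithm: a vertex becomes ready once all its producers finished.
        indegree = {v: len(self.inputs[v]) for v in self.vertices}
        ready = deque(v for v, d in indegree.items() if d == 0)
        outputs: Dict[str, object] = {}
        while ready:
            v = ready.popleft()
            if self.inputs[v]:
                args = [outputs[u] for u in self.inputs[v]]
            else:
                args = [sources[v]]  # source vertices read an external partition
            outputs[v] = self.vertices[v](args)
            for w in self.edges[v]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
        if len(outputs) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return outputs


def tokenize(inputs: list) -> list:
    # Map-like vertex: one text partition in, (word, 1) pairs out.
    (text,) = inputs
    return [(w, 1) for w in text.lower().split()]


def count(inputs: list) -> dict:
    # Reduce-like vertex: merges the pairs arriving on every input channel.
    totals: Dict[str, int] = defaultdict(int)
    for channel in inputs:
        for word, n in channel:
            totals[word] += n
    return dict(totals)


def top(inputs: list) -> str:
    # Extra downstream stage: most frequent word (ties go to the alphabetically first).
    (totals,) = inputs
    return max(sorted(totals), key=totals.get)


g = Graph()
g.add_vertex("map0", tokenize)
g.add_vertex("map1", tokenize)
g.add_vertex("reduce", count)
g.add_vertex("top", top)
g.connect("map0", "reduce")
g.connect("map1", "reduce")
g.connect("reduce", "top")
out = g.run({"map0": "the cat sat", "map1": "the dog ran"})
print(out["reduce"]["the"])  # prints 2
print(out["top"])            # prints the
```

The two `tokenize` vertices feeding `count` form the fixed map-then-reduce shape; the extra `top` stage shows how the same graph can simply keep going, which in plain MapReduce would typically mean chaining a second job. Unlike this sketch, Dryad schedules vertices across machines in parallel and moves data over real channels rather than in-memory lists.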

The main tradeoff that I see with Dryad is that it seems to be considerably more complex than MapReduce, but the DAG approach also offers more flexibility. Programmers must first plan out their various subroutines and construct a data flow graph that corresponds to their program, whereas MapReduce only asks them to supply a map function and a reduce function. At the same time, this design makes Dryad more general-purpose than MapReduce, allowing a variety of services to be built on top of the system (like DryadLINQ, which can then provide more "friendly" development interfaces). Still, the main takeaway I had from the paper was that there is a large tradeoff between complexity and flexibility inherent in this and other parallel execution engines.

It remains to be seen whether Dryad will stay influential over the next decade. Microsoft seems to have followed through with a number of services built on top of Dryad (DryadLINQ, Scope, etc.). However, the fact that this is all enterprise-level / closed-source (?) makes it hard to envision these tools becoming popular with startups and the open source community. Performance-wise, the paper suggests that Dryad scales well, which looks promising. The flexibility vs. complexity tradeoff seems significant, but I think we will see systems like Dryad being used for applications that cannot be expressed well in MapReduce terms. Realistically, it seems likely that Hadoop might be extended to encompass some of the flexibility that Dryad provides.

Wednesday, September 28, 2011