Piccolo is a data-centric
programming model for writing parallel in-memory applications across many
machines. Computations are organized around a series of application kernel
functions, where each kernel is launched concurrently on several nodes. These
instances share a distributed, mutable state using a set of in-memory key-value
tables with entries that reside in the memory of the various nodes. The Piccolo
runtime uses a message system to read and modify table entries on the nodes.
This approach makes Piccolo appropriate for algorithms that benefit from access
to shared intermediate state, including machine learning and graph algorithms.
This is in stark contrast to MapReduce, whose data-flow programming model
doesn’t expose any shared state or provide access to intermediate state, and
this must be simulated (inefficiently) by joining multiple data streams.
Concurrent updates to the
same key can be handled using a programmer-defined accumulation function for
each table, which Piccolo will execute on runtime to combine the updates to the
individual key (alternatively, one of Piccolo’s built-in accumulation functions
can be used instead). Data locality can be specified via two different types of
policies: co-locate kernel execution with a certain table partition or
co-locate partitions of different tables. Machine failures are handled via a
global checkpoint/restore mechanism. Piccolo saves a consistent snapshot of all
shared table state (using the Chandy-Lamport algorithm), but users have to save
information to recover the position of their kernel execution (due to the
complexity of checkpointing executables).
There are trade-offs inherent
in various parts of Piccolo. Fault-tolerance is achieved through checkpointing,
which adds the extra overhead of logging all changes after a snapshot is taken.
Likewise, when confronted with the trade-off of providing full fault-tolerance
by snapshotting kernel execution state vs. complexity, the paper opted to let
users implement the functionality.
No comments:
Post a Comment