Monday, September 12, 2011

“Data-Level Parallelism in Vector, SIMD, and GPU Architectures”

This chapter from Hennessy & Patterson offers a detailed overview of single instruction, multiple data (SIMD) architectures, comparing two main types: graphics processing units (GPUs) and vector processors (e.g., the Cray).

GPUs are nearly ubiquitous thanks to their graphics applications, which greatly helps push their adoption for general-purpose computing. However, CUDA and OpenCL have significant learning curves. Not only is the jargon an issue, as the book points out (with helpful--and somewhat humorous--translation tables), but learning to program GPUs effectively is also non-trivial. They have several layers of memory with different strengths and weaknesses (latency, scope of sharing, capacity, etc.), and choosing the optimal combination can be difficult for the programmer. Transferring data between main CPU memory and the GPU's internal memory is also very costly and can bottleneck overall performance. The individual cores themselves are not very powerful, although speed can be improved by using faster (but less precise) math libraries and similar tricks. At the same time, the sheer number of cores available on a GPU provides an excellent opportunity for data-parallel applications to run faster.
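To make that programming model concrete, here is a minimal CUDA sketch (my own illustration, not from the book) of a SAXPY computation, y = a*x + y. The kernel name and sizes are arbitrary choices; the point is that each of the GPU's many lightweight cores handles one element, and that the explicit cudaMemcpy calls are exactly the costly CPU-to-GPU transfers described above:

    #include <cstdio>
    #include <cuda_runtime.h>

    // SAXPY kernel: each GPU thread computes one element of y = a*x + y,
    // illustrating the data-parallel, many-weak-cores model.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        // Host (CPU) buffers.
        float *hx = (float *)malloc(bytes);
        float *hy = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

        // Device (GPU) buffers: these explicit copies across the bus are
        // the expensive host-to-device transfers discussed above.
        float *dx, *dy;
        cudaMalloc(&dx, bytes);
        cudaMalloc(&dy, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

        // Copy the result back to host memory -- another costly transfer.
        cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);  // expect 4.0

        cudaFree(dx); cudaFree(dy);
        free(hx); free(hy);
        return 0;
    }

Even in this toy example, two of the three memory operations are bus transfers rather than computation, which is why data movement so often dominates GPU workloads.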

Vector processors, on the other hand, are somewhat more intuitive to program and are similarly effective at data parallelism. However, this extra programmer-friendliness requires more architectural complexity, making them more expensive. GPUs are already so widespread (due to their graphics applications) that it seems most likely they will be the dominant trend going forward. As for their application to cloud computing, it seems likely that some combination of CPUs and GPUs will make its way into the servers in WSCs. The large number of cores makes them well suited to MapReduce-style and other data-parallel applications, as the sketch below illustrates. Nevertheless, in addition to the complexity of programming, virtualization and scheduling issues remain obstacles to the widespread adoption of GPUs in cloud computing.
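As a rough illustration of the MapReduce-style fit (again my own sketch, with arbitrary names and sizes), here is a CUDA sum reduction: each block loads its slice of the input into fast on-chip shared memory (one of the memory layers mentioned earlier), combines it in a tree pattern, and emits one partial sum, with the host finishing the "reduce" step:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Block-level sum reduction: the "reduce" half of a MapReduce-style
    // computation. Each 256-thread block produces one partial sum.
    __global__ void blockSum(const float *in, float *out, int n) {
        __shared__ float buf[256];           // fast per-block shared memory
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        buf[tid] = (i < n) ? in[i] : 0.0f;   // "map": load one element
        __syncthreads();

        // Tree reduction within the block: 256 -> 128 -> ... -> 1.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) buf[tid] += buf[tid + stride];
            __syncthreads();
        }
        if (tid == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
    }

    int main() {
        const int n = 1 << 20, threads = 256;
        const int blocks = (n + threads - 1) / threads;
        float *h = (float *)malloc(n * sizeof(float));
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        float *dIn, *dPart;
        cudaMalloc(&dIn, n * sizeof(float));
        cudaMalloc(&dPart, blocks * sizeof(float));
        cudaMemcpy(dIn, h, n * sizeof(float), cudaMemcpyHostToDevice);

        blockSum<<<blocks, threads>>>(dIn, dPart, n);

        // Finish the reduction on the host: sum the per-block partials.
        float *hp = (float *)malloc(blocks * sizeof(float));
        cudaMemcpy(hp, dPart, blocks * sizeof(float), cudaMemcpyDeviceToHost);
        float total = 0.0f;
        for (int b = 0; b < blocks; ++b) total += hp[b];
        printf("sum = %f (expect %d)\n", total, n);

        cudaFree(dIn); cudaFree(dPart);
        free(h); free(hp);
        return 0;
    }

The shared-memory staging here is exactly the kind of memory-layer decision discussed above: the tree reduction would work on global memory too, but would be far slower.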
