C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++ (Developer Reference) by Ade Miller and Kate Gregory

September 25, 2012
Publisher: Microsoft Press
ISBN: 978-0735664739

From the Amazon website:

Capitalize on the faster GPU processors in today's computers with the C++ AMP code library--and bring massive parallelism to your project. With this practical book, experienced C++ developers will learn parallel programming fundamentals with C++ AMP through detailed examples, code snippets, and case studies. Learn the advantages of parallelism and get best practices for harnessing this technology in your applications.


Discover how to:

  • Gain greater code performance using graphics processing units (GPUs)
  • Choose accelerators that enable you to write code for GPUs
  • Apply thread tiles, tile barriers, and tile static memory
  • Debug C++ AMP code with Microsoft Visual Studio
  • Use profiling tools to track the performance of your code

Get code samples on the web at ampbook.codeplex.com


  • The C++ AMP programming model comprises a modern C++ STL-like template library and two extensions to the C++ language that are integrated into the Visual C++ 2012 compiler.
  • The book samples use the Parallel Patterns Library and the Asynchronous Agents Library.
  • Code samples include case studies, an n-body gravitational model, implementations of the reduce algorithm, and an application that cartoonizes images.
  • A heterogeneous supercomputer is a machine with a mix of CPU and GPU cores, whether on the same chip or not, or a cluster of machines offering such a mix.
  • The GPU's speed improvements are available only on tasks for which the GPU is designed, not on general-purpose tasks. The GPU works best on problems that are data-parallel.
  • GPUs have a massive number of threads performing sequential accesses. These threads are not all independent; they are arranged in groups. The groups are called warps on NVIDIA and wavefronts on AMD hardware.
  • Parallel processing is used in the following fields: scientific modeling and simulation, real-time control systems, financial simulation and prediction, gaming, and image processing.
  • Amdahl's Law: the sequential part of a program is the final determiner of the possible speedup. Choosing algorithms that minimize the nonparallelizable part of the running time is therefore very important for maximum improvement.
  • Choosing a data-parallel algorithm that uses GPGPU might result in more overall benefit than choosing a very fast and efficient algorithm that is highly sequential and cannot be parallelized.
  • Vectorization through SIMD is a technology for CPU parallelism. It applies the same instruction to every element of a vector.
  • Another technology for CPU parallelism is OpenMP - a cross-language and cross-platform API. Visual C++ supports OpenMP with a set of compiler directives.
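
The notes on Amdahl's Law and OpenMP above can be illustrated with a minimal C++ sketch. It is not taken from the book: the function names `amdahl_speedup`, `amdahl_limit`, and `scale_all` are hypothetical, and the formula used is the standard statement of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the running time and n is the number of workers. The loop uses the standard `#pragma omp parallel for` directive, which takes effect only when OpenMP is enabled in the compiler (for example `/openmp` in Visual C++ or `-fopenmp` in GCC and Clang); otherwise the pragma is ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Amdahl's Law (standard form, assumed here):
//   speedup = 1 / ((1 - p) + p / n)
// p = fraction of the running time that can be parallelized (0..1)
// n = number of workers (cores, threads) applied to the parallel part
double amdahl_speedup(double parallel_fraction, double workers) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1.0);
    const double sequential_fraction = 1.0 - parallel_fraction;
    return 1.0 / (sequential_fraction + parallel_fraction / workers);
}

// Upper bound on the speedup as the number of workers grows without limit.
// Only the sequential part remains, so the bound is 1 / (1 - p).
double amdahl_limit(double parallel_fraction) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    const double sequential_fraction = 1.0 - parallel_fraction;
    if (sequential_fraction == 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    return 1.0 / sequential_fraction;
}

// A data-parallel loop: every element is updated independently, so the
// iterations can be divided among CPU threads by OpenMP.
// The loop index is a signed int because older OpenMP versions require it.
void scale_all(std::vector<float>& data, float factor) {
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

For example, `amdahl_speedup(0.9, 10.0)` evaluates to about 5.26, and `amdahl_limit(0.9)` to about 10: with 10% of the time spent in sequential code, no number of cores can deliver more than a tenfold improvement. This is why shrinking the sequential share often matters more than adding hardware.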