
Parallel Computing

Here we introduce three basic parallel models: Shared Memory, Distributed Memory and Hybrid. Images are taken from the Lawrence Livermore National Lab's Parallel Computing Tutorial. Visit their site to learn more.

Shared Memory with Threads

  • A main program loads and acquires all of the system resources necessary to run; this is the "heavyweight" process.
  • It performs some serial work and then creates a number of "lightweight" threads, which are run concurrently by CPU cores.
  • Each thread has local data, but it also shares all of the resources of the main program, including its memory.
  • Threads communicate with each other through global memory (RAM, Random Access Memory). This requires synchronization operations to ensure that no more than one thread is updating the same memory address at any time.
  • Threads can come and go, but the main program remains present to provide the necessary shared resources until the application has completed.

Examples: POSIX Threads, OpenMP, CUDA threads for GPUs
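To make the model concrete, here is a minimal OpenMP sketch (illustrative, not from the LLNL tutorial; array name and size are arbitrary): the main process does some serial setup, then a team of threads sums a shared array, with a reduction clause providing the synchronization. Compile with, e.g., gcc -fopenmp.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];   /* shared memory, visible to all threads */
    double sum = 0.0;

    /* Serial work performed by the main ("heavyweight") process. */
    for (int i = 0; i < N; i++)
        a[i] = (double)i;

    /* Parallel region: "lightweight" threads each work on part of the
       shared array. The reduction clause is the synchronization that
       prevents two threads from updating `sum` at the same time. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```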

Distributed Memory with Tasks

  • A main program creates a set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.

  • Tasks exchange data by sending and receiving messages over a fast network (e.g., InfiniBand).

  • Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.

  • Synchronization operations are also required to prevent race conditions.

Example: Message Passing Interface (MPI)
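A minimal MPI sketch of this model (illustrative, not from the LLNL tutorial): rank 0 sends a value from its local memory to rank 1 with a matched send/receive pair, and a barrier synchronizes the ranks. Run with, e.g., mpirun -np 2 ./a.out.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* data local to rank 0 */
        /* The send must be matched by a receive on rank 1. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    /* A barrier is one way to synchronize ranks and avoid races. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```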

Hybrid Parallel

  • A hybrid model combines more than one of the previously described programming models.
  • A simple example is the combination of the message passing model (MPI) with the threads model (OpenMP).

    • Threads perform computationally intensive kernels using local, on-node data
    • Communication between processes on different nodes occurs over the network using MPI
  • This model maps well onto the most common hardware environment today: clusters of multi-/many-core machines.

  • Another example: MPI combined with CUDA on CPU-GPU (Graphics Processing Unit) systems

Hybrid OpenMP-MPI Parallel Model:

The image shows a hybrid OpenMP-MPI parallel model: within each node, CPU cores share an "OpenMP Memory" segment, while labeled MPI links connect the nodes, illustrating combined shared- and distributed-memory parallelism.
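A minimal hybrid MPI+OpenMP sketch along these lines (illustrative names and loop sizes): each MPI process computes a partial sum with an OpenMP thread team on its local data, then the processes combine results over the network with MPI_Reduce. Compile with, e.g., mpicc -fopenmp.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;

    /* Request a thread-compatible level of MPI thread support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;

    /* Compute-intensive kernel: OpenMP threads share this process's
       local, on-node memory. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (i + 1.0 + rank);

    /* Communication between processes happens via MPI. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```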

Hybrid CUDA-MPI Parallel Model:

A diagram illustrating the hybrid CUDA-MPI model: two computing nodes are connected via a network using MPI (Message Passing Interface). Each node consists of multiple CPUs with shared memory and a GPU attached via CUDA. The CPUs handle coordination and memory management, the GPUs accelerate the computation, and MPI communication between nodes enables distributed computing across machines.
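A minimal CUDA-MPI sketch in the same spirit (CUDA C rather than plain C; kernel, names, and sizes are illustrative): each rank copies local data to its node's GPU, launches a kernel, copies the result back, and the ranks combine results across nodes with MPI_Reduce. Build with nvcc plus the MPI compiler wrapper's include/link flags.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* GPU kernel: each CUDA thread scales one array element. */
__global__ void scale(double *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0;
}

int main(int argc, char **argv) {
    int rank, n = 1 << 20;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Host (CPU) memory is local to this rank. */
    double *h = (double *)malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) h[i] = rank + 1.0;

    /* Copy to the node's GPU, compute there, copy back. */
    double *d;
    cudaMalloc(&d, n * sizeof(double));
    cudaMemcpy(d, h, n * sizeof(double), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(double), cudaMemcpyDeviceToHost);

    /* All elements are identical here, so h[0] * n is the local sum;
       MPI moves partial results between nodes over the network. */
    double local = h[0] * n, global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) printf("global = %f\n", global);

    cudaFree(d);
    free(h);
    MPI_Finalize();
    return 0;
}
```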