Parallel Computing
Here we introduce three basic parallel programming models: Shared Memory, Distributed Memory, and Hybrid. The images are taken from the Lawrence Livermore National Laboratory's Parallel Computing Tutorial; visit their site to learn more.
Shared Memory with Threads
- A main program loads and acquires all of the resources necessary to run; this is the "heavy-weight" process.
- It performs some serial work and then creates a number of "light-weight" threads, which are run concurrently by the CPU cores.
- Each thread has local data, but it also shares the resources of the main program, including its memory.
- Threads communicate with each other through global memory (RAM, Random Access Memory). This requires synchronization operations to ensure that no more than one thread is updating the same RAM address at any time (see the sketch below).
- Threads can come and go, but the main program remains present to provide the necessary shared resources until the application has completed.
Examples: POSIX Threads, OpenMP, CUDA threads for GPUs
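To make the model concrete, here is a minimal OpenMP sketch in C. It is illustrative rather than from the LLNL tutorial: the file name and the toy summation are made up, and it assumes a compiler with OpenMP support (e.g., gcc -fopenmp). Each thread keeps a private partial sum as its local data, all threads read the shared array from the main program's memory, and a synchronization construct guards the update of the shared total:

```c
/* A minimal shared-memory sketch using OpenMP.
   Build with, e.g.: gcc -fopenmp omp_sum.c -o omp_sum (names illustrative) */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double data[N];        /* shared: visible to every thread */
    double total = 0.0;           /* shared: updates must be synchronized */

    for (int i = 0; i < N; i++)   /* serial work done by the main thread */
        data[i] = 1.0;

    /* Fork a team of light-weight threads; each gets a private
       partial sum (local data) but reads the shared array. */
    #pragma omp parallel
    {
        double partial = 0.0;     /* private: local to each thread */

        #pragma omp for
        for (int i = 0; i < N; i++)
            partial += data[i];

        /* Synchronization: only one thread at a time may update the
           shared accumulator, so no two threads write the same RAM
           address simultaneously. */
        #pragma omp critical
        total += partial;
    }   /* threads join; the main program continues serially */

    printf("sum = %f (max threads: %d)\n", total, omp_get_max_threads());
    return 0;
}
```

The critical section is the synchronization operation described in the list above; without it, concurrent updates to the shared total would be a race condition.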
Distributed Memory with Tasks
- A main program creates a set of tasks, each of which uses its own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.
- Tasks exchange data by sending and receiving messages over a fast network (e.g., InfiniBand).
- Data transfer usually requires cooperative operations performed by each process; for example, a send operation must have a matching receive operation (illustrated in the sketch below).
- Synchronization operations are also required to prevent race conditions.
Example: Message Passing Interface (MPI)
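Here is a minimal MPI sketch in C of the same ideas (again illustrative rather than from the tutorial; it assumes an MPI installation providing mpicc and mpirun, and the file name is made up). Two tasks each hold their own local copy of a value, and the transfer is cooperative: the send posted by task 0 is matched by a receive posted by task 1:

```c
/* A minimal distributed-memory sketch using MPI.
   Build/run with, e.g.: mpicc mpi_pair.c -o mpi_pair && mpirun -np 2 ./mpi_pair
   (names illustrative; run with at least 2 tasks) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this task's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of tasks */

    int value = -1;  /* each task has its own local copy in its own memory */

    if (rank == 0) {
        value = 42;
        /* Cooperative transfer: this send must be matched by a receive. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task %d of %d received %d\n", rank, size, value);
    }

    MPI_Finalize();
    return 0;
}
```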
Hybrid Parallel
- A hybrid model combines more than one of the previously described programming models.
- A simple example is the combination of the message passing model (MPI) with the threads model (OpenMP); see the sketch after this list.
- Threads perform computationally intensive kernels using local, on-node data.
- Communication between processes on different nodes occurs over the network using MPI.
- This model maps well onto the most common hardware environment today: clusters of multi-/many-core machines.
- Another example: MPI with CPU-GPU (Graphics Processing Unit) programming.
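Below is a minimal hybrid MPI+OpenMP sketch in C (illustrative; it assumes mpicc with OpenMP support, and the toy per-element work stands in for a real kernel). OpenMP threads do the compute-intensive work on local, on-node data, and MPI carries the inter-process communication, here a reduction of the per-process results:

```c
/* A minimal hybrid sketch: MPI between processes, OpenMP threads within each.
   Build/run with, e.g.: mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 2 ./hybrid
   (names illustrative) */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size, provided;
    /* Request an MPI library that tolerates threads inside the process. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Computationally intensive kernel on local, on-node data,
       parallelized across this node's cores with OpenMP threads. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++)
        local += 1.0;          /* stand-in for real per-element work */

    /* Communication between processes happens over the network via MPI:
       combine the per-process results into one global sum on rank 0. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f from %d MPI tasks\n", global, size);

    MPI_Finalize();
    return 0;
}
```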
Hybrid OpenMP-MPI parallel model (figure from the LLNL tutorial).
Hybrid CUDA-MPI parallel model (figure from the LLNL tutorial).