GPU Tools
NVIDIA's CUDA toolkit comes with a variety of software that you can use to monitor the status of GPUs and analyze the performance of GPU code. These tools can help you use the GPUs far more effectively.
Find Available GPUs
NVIDIA's System Management Interface (or `nvidia-smi`) is useful for seeing information about GPU utilization. This is particularly useful when using GPU development nodes, as the GPUs are shared between all users on the node. The `nvidia-smi` utility is available without loading a CUDA module.
Running `nvidia-smi` with no arguments will show a table of information about all the GPUs on that node and a table of all running processes. The GPUs are indexed with an integer. Once you identify one with low utilization, set the `CUDA_VISIBLE_DEVICES` environment variable to control which GPU(s) your application will use. For example,
```bash
export CUDA_VISIBLE_DEVICES=1
```
will make your application use the second GPU on the node, since GPU indices start from zero. You can also tell your application to use multiple GPUs; for example,
```bash
export CUDA_VISIBLE_DEVICES=0,1
```
Note

If you don't set `CUDA_VISIBLE_DEVICES`, your program will default to using the GPUs in order. If your program only uses one GPU, this will be device 0.
See this documentation for additional `nvidia-smi` options.
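From inside a program, the CUDA runtime only enumerates the devices listed in `CUDA_VISIBLE_DEVICES` and renumbers them starting from zero. A minimal sketch of checking this (a hypothetical standalone example, compiled with `nvcc`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Only the devices listed in CUDA_VISIBLE_DEVICES are visible here,
    // and they are renumbered starting from 0.
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Visible device %d: %s\n", i, prop.name);
    }
    return 0;
}
```

With `CUDA_VISIBLE_DEVICES=1`, this prints a single device 0, which is the node's second physical GPU.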
Multi-GPU Communication
If your software uses multiple GPUs, it should ideally make use of peer-to-peer communication. Peer-to-peer GPU communication eliminates the CPU as a data transfer "middleman." When data must first be sent to the CPU before passing on to the destination GPU, this adds to the overall time the transfer takes and may also delay the CPU from communicating additional data and instructions to the GPUs.
Nodes that support NVLink support peer-to-peer communication. Other multi-GPU nodes may still support peer-to-peer communication depending on their network topology; this can be checked with `nvidia-smi topo -m`. This command will produce a matrix showing the various connections between the GPUs.
Software will make use of peer-to-peer communication if it includes calls like `cudaDeviceEnablePeerAccess` or copies using `cudaMemcpyDeviceToDevice`.
See NVIDIA's `simpleP2P` or `mergeSort` CUDA samples for examples.
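As a rough sketch of the pattern those samples demonstrate (hypothetical code, not taken from the samples themselves), peer access is queried and enabled before copying directly between two devices:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) {
        printf("Need at least two visible GPUs.\n");
        return 1;
    }

    // Allocate a buffer on each of the first two visible GPUs.
    const size_t nbytes = 1 << 20;
    float *src, *dst;
    cudaSetDevice(0);
    cudaMalloc(&src, nbytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, nbytes);

    // Check whether GPU 0 can directly access GPU 1's memory.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // second argument is a reserved flag
    }

    // If peer access is enabled, this copy goes directly between the GPUs
    // instead of staging through host memory.
    cudaMemcpyPeer(dst, 1, src, 0, nbytes);

    cudaSetDevice(0);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```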
Debugging Tools
There are three suggested options for debugging CUDA code on the HPCC:
* If you're already familiar with GDB, CUDA-GDB is available through the CUDA modules. Documentation is available from NVIDIA.
* If you use VS Code for writing software, NVIDIA has the Nsight Visual Studio Code extension.
* ICER pays for the TotalView debugger. This debugger is best launched from the command line through an Interactive Desktop OnDemand app. There is documentation for both its modern and classic interfaces. Some advanced tools may only be available in the classic interface.
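Whichever debugger you use, source-level debugging of device code generally requires compiling with debugging information: with `nvcc`, `-g` adds host-side debug symbols and `-G` adds device-side debug symbols (and disables most device code optimization, so expect debug builds to run slower).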
Profiling Tools
NVIDIA's Nsight Systems and Nsight Compute are profiling and performance analysis tools that can be used to identify performance bottlenecks and opportunities for optimization. They are the modern replacements for `nvprof` and the NVIDIA Visual Profiler.
Each version of the CUDA toolkit ships with different versions of Nsight Systems and Compute as laid out in the table below. You should use a CUDA version that is compatible with your desired GPU.
| CUDA Toolkit Version | Nsight Systems Version | Nsight Compute Version |
| --- | --- | --- |
| 12.6.0 | 2024.4.2 | 2024.3.0 |
| 12.4.0 | 2023.4.4 | 2024.1.0 |
| 12.3.0 | 2023.3.3 | 2023.3.0 |
| 12.1.1 | 2023.1.2 | 2023.1.1 |
| 11.7.0 | 2022.1.3 | 2022.2.0 |
Nsight Compute can be used to profile CUDA kernels (functions that run on GPUs). On the other hand, Nsight Systems analyzes everything involved in running your code: CPU parallelization, CPU-GPU communication, network communications, OS interactions, and more. If you want a holistic picture of how your software is running, Nsight Systems is likely the tool you want to use. If you want to look closely at the details of GPU performance, you might consider Nsight Compute instead.
After loading one of the CUDA modules (e.g., `module load CUDA/12.6.0`), Nsight Systems can be run on the command line with the `nsys` command and Compute with the `ncu` command.
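Nsight Systems timelines are easier to navigate if you mark phases of your code with NVTX ranges. Below is a minimal sketch, assuming the header-only NVTX v3 API that ships with recent CUDA toolkits:

```cuda
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    // The named range appears as a labeled span on the Nsight Systems timeline.
    nvtxRangePushA("scale-kernel");
    scale<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();
    nvtxRangePop();

    cudaFree(x);
    return 0;
}
```

A profile can then be collected with, e.g., `nsys profile ./a.out` and opened in the Nsight Systems GUI (check `nsys --help` for the options available in your version).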
See NVIDIA's documentation for Systems and Compute. There are also videos for Systems and tutorials for both Systems and Compute.
Python Support
Nsight Systems and Compute can be used with Python, including support for popular data science and machine learning libraries like Dask and PyTorch. If you use JupyterLab, NVIDIA has created an extension that allows you to use Systems and Compute within JupyterLab.
To use the extension, it's best to have your desired CUDA module loaded alongside your JupyterLab instance. This way the path to the Nsight executables is available. Options for launching a JupyterLab server with a CUDA module loaded include:
- loading CUDA inside JupyterLab by installing the `jupyterlmod` extension as in this Lab Notebook
- launching from a terminal in an Interactive Desktop OnDemand App after loading the CUDA module
- connecting to an existing server in VS Code when that server is launched after the CUDA module has been loaded