The HPCC's GPU Resources

The HPCC offers several generations of GPUs, as noted on the general Cluster Resources page. More information about these devices is provided in the table below. The cluster types associated with each GPU correspond to those listed in the Cluster Resources table.

| GPU  | Cluster Type | Number per Node | GPU Memory | Architecture | Compute Capability | Connection Type | NVLink |
|------|--------------|-----------------|------------|--------------|--------------------|-----------------|--------|
| a100 | amd21*       | 4               | 81920 MB   | Ampere       | 8.0                | SXM             | Yes    |
| a100 | intel21      | 4               | 40960 MB   | Ampere       | 8.0                | PCIe            | No     |
| v100 | amd20        | 4               | 32768 MB   | Volta        | 7.0                | PCIe            | Mixed  |
| v100 | intel18      | 8               | 32768 MB   | Volta        | 7.0                | PCIe            | Yes    |
| k80  | intel16      | 8               | 12206 MB   | Kepler       | 3.7                | PCIe            | No     |
| k20  | intel14      | 2               | 4743 MB    | Kepler       | 3.5                | PCIe            | No     |

*The amd21 cluster contains some nodes that belong to the Data Machine.

Architecture & Compute Capability

Currently, all of the HPCC's GPUs are manufactured by NVIDIA and are designed following several different architectures. Knowing a GPU's architecture makes it easier to look up its technical specifications. A GPU's architecture is abbreviated in its name; for example, the V100 GPUs follow the Volta architecture.

Each GPU architecture and model supports a certain compute capability (CC): a set of features that applications can leverage when executing on that GPU. Newer GPUs offer more advanced features and therefore support a newer version of NVIDIA's compute capabilities. An explanation of the features available at each compute capability can be found both in the CUDA documentation (CC > 5.0 only) and in a table compiled on Wikipedia.
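
A quick way to see the compute capability of the GPU(s) assigned to your job is to query the device properties with the standard CUDA runtime API. The sketch below is a minimal example; the file name and output format are just placeholders.

```
// query_cc.cu -- print each visible GPU's name and compute capability.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major and prop.minor together form the compute capability,
        // e.g. 7.0 for a V100 or 8.0 for an A100.
        printf("GPU %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```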

Developers may use CUDA to program our GPUs in their software applications. See our page on Compiling for GPUs for more information on which versions of CUDA may be used with each of the HPCC's GPUs and their respective compute capabilities.
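
As an illustration only, nvcc's -arch option selects the compute capability to build for; the sm_70 and sm_80 values below follow the table above, but see the Compiling for GPUs page for the exact modules and flags recommended on the HPCC. The kernel itself is a trivial placeholder.

```
// hello.cu -- trivial kernel used to illustrate targeting a compute capability.
// Example builds (illustrative; exact setup is covered on Compiling for GPUs):
//   nvcc -arch=sm_70 hello.cu -o hello   # V100 (compute capability 7.0)
//   nvcc -arch=sm_80 hello.cu -o hello   # A100 (compute capability 8.0)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello()
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main()
{
    hello<<<1, 4>>>();          // launch one block of four threads
    cudaDeviceSynchronize();    // wait so the kernel's printf output appears
    return 0;
}
```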

Connection Type

Most of the HPCC's GPUs communicate with the CPUs of their host node via the PCIe (Peripheral Component Interconnect Express) bus. This bus is the primary channel by which data and instructions are transferred to and from the GPU, so its speed can affect the performance of GPU applications that transfer large amounts of data. In contrast, the A100 GPUs in the amd21 cluster are connected using SXM (Server PCI Express Module) sockets, which offer higher connection speeds. Research the specifications of the particular GPU you plan to use to learn more about its bus bandwidth.
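
To get a rough feel for the transfer speed on a particular node, you can time a large host-to-device copy. The sketch below is a quick estimate using standard CUDA runtime calls and pinned host memory, not a rigorous benchmark; the buffer size and file name are arbitrary choices.

```
// bandwidth.cu -- rough host-to-device transfer rate on the current GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1ULL << 30;   // 1 GiB test buffer
    void *host = nullptr, *device = nullptr;
    cudaMallocHost(&host, bytes);      // pinned host memory for a fairer measurement
    cudaMalloc(&device, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.1f GB/s\n", (bytes / 1.0e9) / (ms / 1.0e3));

    cudaFree(device);
    cudaFreeHost(host);
    return 0;
}
```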

While PCIe and SXM refer to the connection between the CPU and GPU, some of the HPCC's V100 and A100 GPUs are also connected to each other using NVIDIA's NVLink technology. NVLink allows GPUs to share data with each other directly. Without NVLink, transferring data from one GPU to another would require that the data first pass through the CPU. Using the CPU as a data transfer "middleman" adds to the overall time the transfer takes and may also delay the CPU from communicating additional data and instructions to the GPUs. If you plan to use multiple GPUs for your job, consider requesting resources that support NVLink as indicated in the table above.
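
This difference shows up in CUDA's peer-to-peer API: when two GPUs can access each other directly (which NVLink enables, though peer access may also be available over PCIe depending on the node's topology), a device-to-device copy does not need to be staged through host memory. A minimal sketch, assuming at least two GPUs are visible to your job:

```
// p2p.cu -- check peer access between GPU 0 and GPU 1 and copy a buffer GPU-to-GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);   // can device 0 access device 1's memory?
    printf("Peer access 0 -> 1: %s\n", canAccess ? "yes" : "no");

    if (canAccess) {
        const size_t bytes = 1 << 20;
        void *buf0 = nullptr, *buf1 = nullptr;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);        // second argument (flags) must be 0
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        // With peer access enabled, this copy goes directly between the GPUs;
        // otherwise the runtime stages the data through host memory.
        cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
    }
    return 0;
}
```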

Some of the amd20 nodes support NVLink while others do not. You can check whether a given node supports NVLink by requesting a job on that node and connecting to it. Specific nodes can be requested with the -w or --nodelist option; see the list of job specifications for more details. Then, once connected, run nvidia-smi nvlink -s to check the status of the node's NVLink connections.