The HPCC's GPU Resources
The HPCC offers several generations of GPUs as noted in the general Cluster Resources page. More information about these devices is provided in the table below. The cluster types that each GPU is associated with correspond to the cluster types listed in the Cluster Resources table.
GPU | Cluster Type | Number per Node | GPU Memory | Architecture | Compute Capability | Connection Type | NVLink |
---|---|---|---|---|---|---|---|
a100 | amd21* | 4 | 81920 MB | Ampere | 8.0 | SXM | Yes |
a100 | intel21 | 4 | 40960 MB | Ampere | 8.0 | PCIe | No |
v100 | amd20 | 4 | 32768 MB | Volta | 7.0 | PCIe | Mixed |
v100 | intel18 | 8 | 32768 MB | Volta | 7.0 | PCIe | Yes |
k80 | intel16 | 8 | 12206 MB | Kepler | 3.7 | PCIe | No |
k20 | intel14 | 2 | 4743 MB | Kepler | 3.5 | PCIe | No |
*The amd21 cluster contains some nodes that belong to the Data Machine.
Architecture & Compute Capability
Currently, all of the HPCC's GPUs are manufactured by NVIDIA and span several architectures. Knowing a GPU's architecture aids in researching its technical specifications. A GPU's architecture is abbreviated in its name; for example, the V100 GPUs follow the Volta architecture.
Each architecture and model of GPU supports a certain compute capability (CC): a set of features that applications can leverage when executing on that GPU. Newer GPUs offer more advanced features and therefore adhere to a newer version of NVIDIA's compute capabilities. An explanation of the features available at each compute capability can be found both in the CUDA documentation (CC > 5.0 only) and compiled on Wikipedia.
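As a quick way to see these properties on whichever node a job lands on, the short CUDA sketch below queries each visible GPU and prints its name, memory, and compute capability, which should line up with the table above. The file name is only an illustrative choice.

```cuda
// device_query.cu -- print each visible GPU's name, memory, and compute capability.
// Illustrative sketch; values reported should match the table above for the node
// the job was scheduled on.
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("GPU %d: %s, %.0f MB, compute capability %d.%d\n",
               d, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0),
               prop.major, prop.minor);
    }
    return 0;
}
```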
Developers may use the CUDA programming language to leverage our GPUs in their software applications. See our page on Compiling for GPUs for more information on which versions of CUDA may be used for each of the HPCC's GPUs and their respective compute capabilities.
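For a concrete, minimal illustration of targeting a compute capability, the sketch below pairs a trivial CUDA kernel with the kind of `nvcc` architecture flag that selects a CC from the table above. The file name and flag value are examples rather than HPCC-specific requirements; the Compiling for GPUs page remains the authoritative reference for which CUDA versions to load.

```cuda
// minimal_kernel.cu -- a trivial CUDA kernel used to illustrate compiling for a
// specific compute capability. For example, to target the V100s (CC 7.0):
//     nvcc -arch=sm_70 minimal_kernel.cu -o minimal_kernel
// (sm_80 would correspond to the A100s; see the table above. File name and flags
// are illustrative assumptions, not required conventions.)
#include <cstdio>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;   // each thread scales one element
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));   // unified memory keeps the example short
    for (int i = 0; i < n; i++) x[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f (expected 2.0)\n", x[0]);
    cudaFree(x);
    return 0;
}
```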
Connection Type
Most of the HPCC's GPUs communicate with the CPUs of their host node via the PCIe (Peripheral Component Interconnect Express) bus. This bus is the primary channel by which data and instructions are transferred to and from the GPU. As such, the speed of this bus can affect the speed of GPU applications where large amounts of data transfer are a concern. In contrast, the A100 GPUs associated with the amd21 cluster are connected using SXM (Server PCI Express Module) sockets, which offer higher connection speeds. Research the specifications of the particular GPU you plan to use to learn more about its bus bandwidth.
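To get a rough sense of what the host-to-GPU connection delivers in practice, the hedged sketch below times one large host-to-device copy with CUDA events and reports an approximate bandwidth. The transfer size is an arbitrary illustrative choice, and the number is only a ballpark figure, not a formal benchmark.

```cuda
// bandwidth_probe.cu -- rough host-to-device bandwidth estimate using CUDA events.
// Illustrative probe only; results vary with pinned vs. pageable memory, transfer
// size, and other activity on the node.
#include <cstdio>

int main() {
    const size_t bytes = 256UL * 1024 * 1024;   // 256 MiB transfer (arbitrary choice)
    float *host, *device;
    cudaMallocHost(&host, bytes);               // pinned host memory for a cleaner measurement
    cudaMalloc(&device, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(device);
    cudaFreeHost(host);
    return 0;
}
```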
NVLink
While PCIe and SXM refer to the connection between the CPU and GPU, some of the HPCC's V100 and A100 GPUs are also connected to each other using NVIDIA's NVLink technology. NVLink allows GPUs to directly share data with each other. Without NVLink, transferring data from one GPU to another would require that the data first pass through the CPU. Using the CPU as a data transfer "middleman" adds to the overall time the transfer takes and may also delay the CPU from communicating additional data and instructions to the GPUs. If you plan to use multiple GPUs for your job, consider requesting resources that support NVLink as indicated in the table above.
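As a sketch of what this direct GPU-to-GPU sharing looks like in code, the example below uses the standard CUDA peer-to-peer API to check whether GPUs 0 and 1 can reach each other's memory directly and, if they can, performs a device-to-device copy without staging through the host. It assumes the job was allocated at least two GPUs; note that peer access may also be reported over PCIe on some configurations.

```cuda
// p2p_check.cu -- check whether GPU 0 and GPU 1 can share data directly
// (e.g. over NVLink) and, if so, copy a buffer device-to-device without
// staging through the host. Assumes at least two visible GPUs in the job.
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) { printf("Need at least 2 GPUs.\n"); return 1; }

    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("Peer access 0->1: %d, 1->0: %d\n", canAccess01, canAccess10);

    if (canAccess01 && canAccess10) {
        const size_t bytes = 64UL * 1024 * 1024;
        float *buf0, *buf1;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   // allow device 0 to reach device 1's memory
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMalloc(&buf1, bytes);

        // Direct GPU-to-GPU copy; on NVLink-connected pairs this avoids the
        // CPU "middleman" path described above.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();
        printf("Peer-to-peer copy completed.\n");

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
    }
    return 0;
}
```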
Some of the amd20 nodes support NVLink while others do not. You can check whether or not a given node supports NVLink by requesting a job on that node and connecting to it. Specific nodes can be requested with the `-w` or `--nodelist` option; see the list of job specifications for more. Then, once connected, run `nvidia-smi nvlink -s` to check the status of the node's NVLink connection.