Warning
This is a Lab Notebook that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
cuQuantum Installation and Usage
This lab notebook discusses the installation and usage of the cuQuantum software development kit (SDK), and in particular, the cuTensorNet library. It is mostly oriented towards students in the fall 2024 section of CMSE 890-001 using the Data Machine, but will hopefully be useful to a general audience.
Usage instructions
We recommend using a Conda environment with cuTensorNet installed, accessed through a Jupyter notebook. To do so, visit ICER's OnDemand portal and click the Jupyter app.
Enter resource request
In the settings, enter the time, cores, and memory you would like.
When using a GPU slice on the Data Machine as recommended below, it is best to use 4 as the "Number of cores per task" and 18GB as the "Amount of memory". This ensures that a single node can be split equally 28 ways. However, if you need more resources, please feel free to ask for them with the understanding that your job may take longer to queue.
Setup Conda environment
Under "Jupyter Location", choose "Conda Environment using Miniforge3 module". You now have two options:
Use a preinstalled Conda environment
Note
This section applies only to students in the fall 2024 section of CMSE 890-001.
If you are in the fall 2024 section of CMSE 890-001, in the "Conda Environment name or path" field, use /mnt/research/CMSE890_FS24_S001/envs/cuquantum. Otherwise, follow the Use your own Conda environment instructions.
Use your own Conda environment
Follow the setup instructions below. In the "Conda Environment name or path" field, use cuquantum (or whatever you named your Conda environment).
Run using a GPU on the Data Machine
Note
This section only applies to users who have Data Machine access.
Click the "Advanced Options" checkbox. In the "Number of GPUs" field, enter a100_slice. This reserves a slice of the Data Machine A100 GPUs with 10GB of GPU memory. In the "SLURM Account" field, enter data-machine.
Launch Jupyter
Press the "Launch" button at the bottom and wait for the job to queue and then for Jupyter to start up. This can take a couple of minutes. When Jupyter is ready, click the "Connect to Jupyter" button.
Running a sample program
NVIDIA provides an example of using the Python API for cuTensorNet on their technical blog. You can create a new Jupyter notebook, copy and paste this code into a cell, and run the cell. You should see a FLOP count for the optimized contraction path similar to the one shown in the blog post.
For more examples, see NVIDIA's cuQuantum Python documentation.
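As a rough sketch of the shape of the cuQuantum Python API (this assumes the cuquantum-python package is installed and a GPU is available; `Network` and `contract_path` are part of cuQuantum Python's high-level API, though details may vary between releases):

```python
import numpy as np
from cuquantum import Network  # provided by the cuquantum-python package

# Small random operands for a toy tensor-network contraction
a = np.random.rand(8, 4)
b = np.random.rand(4, 8)

# Build the network, find an optimized contraction path, then contract on the GPU
with Network("ij,jk->ik", a, b) as tn:
    path, info = tn.contract_path()
    print(f"Optimized FLOP count: {info.opt_cost}")
    result = tn.contract()
```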
Compiling a sample CUDA code
The Conda environment also includes libraries to compile C++-based CUDA code for similar types of calculations with much more flexibility (and complexity). To use these libraries, you need the nvcc compiler, which comes with the CUDA module on the HPCC. For more details about compiling CUDA programs, please see our documentation on Compiling for GPUs.
These commands are run from the command line. You can open one in your JupyterLab instance by opening a new tab and clicking "Terminal". Alternatively, you can submit an interactive job from an SSH session if you prefer.
In this example, we'll use the cuTensorNet contraction example provided by NVIDIA, which performs calculations similar to the Python code referenced above. It can be downloaded to the HPCC with a command like
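A sketch of the download, assuming the sample still lives at this path in the NVIDIA/cuQuantum GitHub repository (check the repository if it has moved between releases):

```shell
# Fetch NVIDIA's cuTensorNet contraction sample; the path within the
# NVIDIA/cuQuantum repository may differ between releases
wget https://raw.githubusercontent.com/NVIDIA/cuQuantum/main/samples/cutensornet/tensornet_example.cu
```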
To compile this code, you first need to make sure CUDA is loaded and your conda environment is activated.
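A typical setup sequence looks like the following (the module names and CUDA version are illustrative; check `module spider CUDA` for what is available on the HPCC):

```shell
module purge
module load Miniforge3
module load CUDA/12.3.0   # example version; any CUDA 12 module should work
conda activate cuquantum
```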
To compile the code, you need to point the nvcc compiler to the headers and libraries using the -I and -L flags. These are stored in ${CUQUANTUM_ROOT}/include and ${CUQUANTUM_ROOT}/lib respectively. This example uses the cutensornet and cutensor libraries, brought in using the -l flag. Thus, you can compile the example with the command
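Putting those flags together, the compile command might look like this (assuming the sample file is named `tensornet_example.cu` as in the download step):

```shell
# -I: headers, -L: libraries, -l: link against cutensornet and cutensor
nvcc tensornet_example.cu \
    -I${CUQUANTUM_ROOT}/include \
    -L${CUQUANTUM_ROOT}/lib \
    -lcutensornet -lcutensor \
    -o tensornet_example
```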
You can then run the executable with
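Assuming the compile step above produced an executable named `tensornet_example`:

```shell
./tensornet_example
```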
Make sure CUDA is loaded and libraries are on your LD_LIBRARY_PATH
To run your code, it is important that the executable can find the libraries it needs at runtime. Make sure to always run your code after loading the CUDA module and activating your Conda environment with
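For example (the CUDA version is illustrative; use the same module you compiled with):

```shell
module purge
module load Miniforge3
module load CUDA/12.3.0
conda activate cuquantum
```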
In particular, the libraries you used to compile need to be added to the LD_LIBRARY_PATH environment variable, which is done by setting the extra environment variables in your Conda environment setup. Optionally, if you skipped this step, you can also set your LD_LIBRARY_PATH manually with
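One possible command, assuming CUQUANTUM_ROOT points at your Conda environment as described in the setup appendix:

```shell
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${LD_LIBRARY_PATH}
```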
You will need to do this every time you start a new shell before running your code.
Appendix: Further reading
After experimenting with the samples above, you may be interested in the following:
- Requesting multiple GPUs and chaining them together with MPI. Note that you can request multiple Data Machine GPU slices using a100_slice:n where n is the number of slices you would like to use.
- Using whole GPUs in the Data Machine instead of slices (subject to availability).
- Exploring the different types of Python bindings provided by the cuQuantum Python package.
- Compile and run more cuTensorNet examples
- Run more cuTensorNet Python examples
Appendix: Setup instructions
Use these instructions to set up your own Conda environment. You can make any customizations you like to better fit your workflow.
Login
The first step is to log in and make sure that you are on a development node with a GPU. This tutorial uses dev-amd20-v100 because it has a newer GPU, more in line with the GPUs found on the Data Machine. Most importantly, both are compatible with CUDA 12, whereas the k20 and k80 GPUs are not.
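A typical login sequence looks like the following (replace `<netid>` with your MSU NetID; the gateway hostname is per ICER's documentation):

```shell
ssh <netid>@hpcc.msu.edu
ssh dev-amd20-v100
```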
Install packages
The next step is to get access to Conda. Using the Miniforge3 module, create a new environment and install the required packages from NVIDIA.
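A sketch of the environment creation (the package name is the one published on conda-forge at the time of writing; the Python version is an example):

```shell
module purge
module load Miniforge3
conda create --name cuquantum python=3.11 -y
conda activate cuquantum
# cuquantum-python pulls in the cuQuantum SDK libraries (cutensornet, cutensor, ...)
conda install -c conda-forge cuquantum-python -y
```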
Set some extra environment variables
To make the most use of cuQuantum for CUDA compilation and native MPI support in cuTensorNet, you can set a few variables when you activate your Conda environment.
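One way to do this is with a Conda activation hook, a script that runs each time the environment is activated. A sketch, assuming the cuQuantum libraries live inside the Conda prefix:

```shell
# Create an activation hook that runs on every `conda activate cuquantum`
mkdir -p ${CONDA_PREFIX}/etc/conda/activate.d
cat > ${CONDA_PREFIX}/etc/conda/activate.d/cuquantum.sh << 'EOF'
# Root of the cuQuantum installation inside this Conda environment
export CUQUANTUM_ROOT=${CONDA_PREFIX}
export CUTENSOR_ROOT=${CONDA_PREFIX}
# Make the cutensornet/cutensor shared libraries findable at runtime
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${LD_LIBRARY_PATH}
# Optional: point cuTensorNet at its MPI plugin for native MPI support
export CUTENSORNET_COMM_LIB=${CUQUANTUM_ROOT}/lib/libcutensornet_distributed_interface_mpi.so
EOF
```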