Warning
This is a Lab Notebook that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Using PyTorch (and friends) on OnDemand
This documentation shows two techniques for accessing PyTorch in a Jupyter notebook on the HPCC and is oriented towards shared, class usage. The first assumes that there is no shared "research" space that all participants can access, and is ready to use as-is. The second requires a shared research space (using $RESEARCH_SPACE as a placeholder) that participants are added to, and requires a one-time setup (see the Appendix).
Option 1: Using the HPCC module system
Start your OnDemand session
- Log into OnDemand.
- Click the "Interactive Apps" dropdown and choose "Jupyter".
- Select the desired "Number of hours", "Number of cores per task", and "Amount of memory". The following are suggested values to mimic a laptop (or to use a small slice of the Data Machine if that is desired):
    - Number of hours: 1
    - Number of cores per task: 4
    - Amount of memory: 18GB
- Leave the JupyterLab box checked.
- Choose the "Default" location for "Jupyter Location".
(Optional) Using the Data Machine
If you would like to use the Data Machine (optionally, with a GPU), follow these steps:
- Click the "Advanced Options" checkbox.
- (Optional) Under "Number of GPUs", enter a100_slice to get a Data Machine GPU slice.
- Under "SLURM Account", enter data-machine.
Click "Launch" at the bottom. Your request will queue, and when ready, the "job card" will change to show a button that says "Connect to Jupyter". Click this button to access your Jupyter session.
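Once connected, you can confirm the resources your job actually received. A minimal sketch using standard SLURM environment variables (the variable names are standard SLURM, but they are unset outside of a SLURM job, and your site's configuration may differ):

```python
import os

# SLURM publishes a job's allocation through environment variables.
# Outside of a SLURM job these are unset, so provide a fallback.
cores = os.environ.get("SLURM_CPUS_PER_TASK", "not set")
memory = os.environ.get("SLURM_MEM_PER_NODE", "not set")

print(f"Cores per task: {cores}")
print(f"Memory per node: {memory}")
```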
Load additional software modules
Do this before opening a Jupyter notebook
Make sure to follow these steps before you open a Jupyter notebook. Otherwise, they will not take effect until you restart the notebook's kernel.
When a session starts, you will have access to our "Default" set of software modules, which provides Python, a few helpful packages (like SciPy and NumPy), and all of their dependencies. However, to access additional Python packages, you will need to load the corresponding modules yourself.
On the left-hand side, you will see five tabs. Click the lowest one, which looks like a cube with the center removed. This will show you the software modules on the HPCC.
Note that the default version of CUDA, CUDA/12.3.0, conflicts with the version used later, so please unload it first by finding it in the "LOADED MODULES" section and clicking "Unload".
The non-default modules that need to be loaded are:
matplotlib/3.7.2-gfbf-2023a
scikit-learn/1.3.1-gfbf-2023a
torchvision/0.16.0-foss-2023a-CUDA-12.1.1
Enter the first few characters of each in the "Filter available modules..." box at the top, select the correct version, and choose "Load".
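If you prefer a terminal, the same changes can be sketched with Lmod commands. Note that commands run in a terminal affect only that terminal session; use the module panel described above (before starting a kernel) for notebook use:

```shell
# Unload the default CUDA, which conflicts with the torchvision module
module unload CUDA/12.3.0

# Load the non-default modules listed above
module load matplotlib/3.7.2-gfbf-2023a
module load scikit-learn/1.3.1-gfbf-2023a
module load torchvision/0.16.0-foss-2023a-CUDA-12.1.1
```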
Torchvision warning
When loading the torchvision module, you will see a warning message. Please be aware of this warning, but note that PyTorch works normally in nearly all circumstances seen by ICER.
Open your Jupyter notebook
Going back to the left-hand side, click the top tab that looks like a file. Navigate to where your Jupyter notebook is stored. Double click to open, and begin running your code!
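As a quick sanity check in the first notebook cell, you can verify that the packages provided by the loaded modules are importable. This sketch only checks availability without importing anything heavy (note that scikit-learn's import name is "sklearn"):

```python
import importlib.util

# Packages expected after loading the modules listed above
expected = ["torch", "torchvision", "matplotlib", "sklearn"]

# find_spec returns None when a package cannot be found
missing = [name for name in expected
           if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All expected packages are available")
```

If anything is reported missing, restart the kernel after loading the modules and run the cell again.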
Option 2: Using a Conda environment
Recall that $RESEARCH_SPACE is a placeholder here for the location of the research space containing the Conda environment. Please replace it with the desired location.
Start your OnDemand session
- Log into OnDemand.
- Click the "Interactive Apps" dropdown and choose "Jupyter".
- Select the desired "Number of hours", "Number of cores per task", and "Amount of memory". The following are suggested values to mimic a laptop (or to use a small slice of the Data Machine if that is desired):
    - Number of hours: 1
    - Number of cores per task: 4
    - Amount of memory: 18GB
- Leave the JupyterLab box checked.
- Choose the "Conda Environment using Miniforge3 module" location for "Jupyter Location".
- In the "Conda Environment name or path" box, enter $RESEARCH_SPACE/envs/pytorch.
(Optional) Using the Data Machine
If you would like to use the Data Machine (optionally, with a GPU), follow these steps:
- Click the "Advanced Options" checkbox.
- (Optional) Under "Number of GPUs", enter a100_slice to get a Data Machine GPU slice.
- Under "SLURM Account", enter data-machine.
Click "Launch" at the bottom. Your request will queue, and when ready, the "job card" will change to show a button that says "Connect to Jupyter". Click this button to access your Jupyter session.
Open your Jupyter notebook
Navigate to where your Jupyter notebook is stored. Double click to open, and begin running your code!
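To confirm that the notebook kernel is running from the Conda environment rather than a system Python, a minimal check (the interpreter path should point inside the environment, e.g., under $RESEARCH_SPACE/envs/pytorch):

```python
import sys

# Print the path of the Python interpreter backing this kernel;
# for a Conda environment it sits under the environment's bin directory
print(sys.executable)
```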
Appendix: Conda environment setup instructions
Make sure you are on the v100 development node, for example (the exact node name below is an example; check ICER's current list of development nodes):

```shell
ssh dev-amd20-v100
```
Then, get access to the conda command using the Miniforge3 module (a sketch; purging first clears any conflicting modules):

```shell
module purge
module load Miniforge3
```
Then create the Conda environment in a convenient location (in this case, a research space):

```shell
conda create --prefix $RESEARCH_SPACE/envs/pytorch
```
Then activate the environment and install packages (the package list below is an example chosen to match the modules used in Option 1; adjust it for your class):

```shell
conda activate $RESEARCH_SPACE/envs/pytorch
conda install -y pytorch torchvision
conda install -y matplotlib scikit-learn jupyterlab
```
It is highly recommended to also install the jupyter-lmod plugin to interface with the module system from inside Jupyter (e.g., to change the version of CUDA):

```shell
pip install jupyterlmod
```