Warning

This is a Lab Notebook that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.

Using PyTorch (and friends) on OnDemand

This documentation shows two techniques for accessing PyTorch in a Jupyter notebook on the HPCC and is oriented toward shared classroom use. The first assumes that there is no shared "research" space that all participants can access, and it is ready to go as is. The second requires a shared research space (using $RESEARCH_SPACE as a placeholder) that participants are added to, and it requires a one-time setup (see the Appendix).

Option 1: Using the HPCC module system

Start your OnDemand session

Log into OnDemand.
Click the "Interactive Apps" dropdown and choose "Jupyter".
Select the desired "Number of hours", "Number of cores per task", and "Amount of memory". The following are suggested values to mimic a laptop (or to use a small slice of the Data Machine if that is desired):
- Number of hours: 1
- Number of cores per task: 4
- Amount of memory: 18GB
Leave the JupyterLab box checked.
Choose the "Default" location for "Jupyter Location".

(Optional) Using the Data Machine

If you would like to use the Data Machine (optionally, with a GPU), follow these steps:

Click the "Advanced Options" checkbox.
(Optional) Under "Number of GPUs", enter a100_slice to get a Data Machine GPU slice.
Under "SLURM Account" enter data-machine.

Click "Launch" at the bottom. Your request will queue, and when ready, the "job card" will change to show a button that says "Connect to Jupyter". Click this button to access your Jupyter session.
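Once the session is running, it can be useful to confirm from inside a notebook that the job received the resources requested in the form. The following is a minimal sketch, not an official ICER tool. It assumes the session runs inside a Slurm job, so that Slurm's standard output variables are present; any of them may be unset depending on how the job was submitted. The function name describe_session is hypothetical.

```python
import os


def describe_session(env=None):
    """Summarise the Slurm allocation visible to this process."""
    env = os.environ if env is None else env
    # These variables are set by Slurm inside a job and are absent elsewhere,
    # in which case the corresponding entry is None.
    summary = {
        "job_id": env.get("SLURM_JOB_ID"),
        "cpus_per_task": env.get("SLURM_CPUS_PER_TASK"),
        "mem_per_node_mb": env.get("SLURM_MEM_PER_NODE"),
        "account": env.get("SLURM_JOB_ACCOUNT"),
        "visible_gpus": env.get("CUDA_VISIBLE_DEVICES"),
    }
    # Number of cores this process is actually allowed to run on (Linux only).
    if hasattr(os, "sched_getaffinity"):
        summary["usable_cores"] = len(os.sched_getaffinity(0))
    else:
        summary["usable_cores"] = os.cpu_count()
    return summary


for key, value in describe_session().items():
    print(f"{key}: {value}")
```

If the Data Machine options were used, the account entry would be expected to read data-machine, and visible_gpus should be non-empty when a GPU slice was requested.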

Load additional software modules

Do this before opening a Jupyter notebook

Make sure to follow these steps before you open a Jupyter notebook. Otherwise, they will not take effect until you restart the notebook's kernel.

When a session starts, you will have access to our "Default" set of software modules, which provide Python and a few helpful packages (like SciPy and NumPy) along with their dependencies. To access additional Python packages, you will need to load them yourself.

On the left-hand side, you will see five tabs. Click the lowest one, which looks like a cube with the center removed. This will show you the software modules currently loaded in your session and those available on the HPCC.

Note that the default version of CUDA (CUDA/12.3.0) conflicts with the version used later. Please unload it first by finding it in the "LOADED MODULES" section and clicking "Unload".

The non-default modules that need to be loaded are:

matplotlib/3.7.2-gfbf-2023a
scikit-learn/1.3.1-gfbf-2023a
torchvision/0.16.0-foss-2023a-CUDA-12.1.1

Enter the first few characters of each in the "Filter available modules..." box at the top, select the correct version, and choose "Load".
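After loading the modules, a quick check from a notebook cell can confirm that the three modules above are present and that the conflicting CUDA module is gone. This is a sketch under one assumption: that the notebook kernel inherits the LOADEDMODULES environment variable, the colon-separated list that the module system maintains. The helper name check_modules is hypothetical.

```python
import os

# Module names copied from the list above.
REQUIRED = [
    "matplotlib/3.7.2-gfbf-2023a",
    "scikit-learn/1.3.1-gfbf-2023a",
    "torchvision/0.16.0-foss-2023a-CUDA-12.1.1",
]
# The default CUDA module that should have been unloaded first.
CONFLICTING = ["CUDA/12.3.0"]


def check_modules(loaded_modules):
    """Return (missing, conflicts) for a colon-separated module list."""
    loaded = [m for m in loaded_modules.split(":") if m]
    missing = [m for m in REQUIRED if m not in loaded]
    conflicts = [m for m in CONFLICTING if m in loaded]
    return missing, conflicts


missing, conflicts = check_modules(os.environ.get("LOADEDMODULES", ""))
print("Missing modules:", missing or "none")
print("Conflicting modules still loaded:", conflicts or "none")
```

If a kernel was already running when the modules were loaded, it may report them as missing until the kernel is restarted.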

Torchvision warning

When loading the torchvision module, you will see the following warning:

WARNING: This installation of PyTorch fails a very small percentage of the official test suite. While ICER investigates please use this module at your own risk. Or consider installing your own copy: https://docs.icer.msu.edu/Installing_pytorch_using_anaconda/

Please be aware of this warning, but note that PyTorch works normally in nearly all circumstances seen by ICER.

Open your Jupyter notebook

Going back to the left-hand side, click the top tab that looks like a file. Navigate to where your Jupyter notebook is stored. Double click to open, and begin running your code!
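As a first cell in the notebook, it may help to list which of the expected packages the kernel can actually see. The sketch below uses only the standard library (importlib.metadata, available in Python 3.8 and later). The package list is an assumption based on the modules named above, and the distribution names are assumed to match those published on PyPI.

```python
from importlib import metadata

# Distribution names assumed to match what the loaded modules provide.
PACKAGES = ["numpy", "scipy", "matplotlib", "scikit-learn", "torch", "torchvision"]


def installed_versions(names):
    """Map each distribution name to its version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15s} {version or 'NOT FOUND'}")
```

A package reported as NOT FOUND usually means the corresponding module was loaded after the kernel started, in which case restarting the kernel should pick it up.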

Option 2: Using a Conda environment

Recall that $RESEARCH_SPACE is a placeholder for the location of the research space containing the Conda environment. Replace it with the actual location wherever it appears.

Start your OnDemand session

Log into OnDemand.
Click the "Interactive Apps" dropdown and choose "Jupyter".
Select the desired "Number of hours", "Number of cores per task", and "Amount of memory". The following are suggested values to mimic a laptop (or to use a small slice of the Data Machine if that is desired):
- Number of hours: 1
- Number of cores per task: 4
- Amount of memory: 18GB
Leave the JupyterLab box checked.
Choose the "Conda Environment using Miniforge3 module" location for "Jupyter Location".
In the "Conda Environment name or path" box, use $RESEARCH_SPACE/envs/pytorch.

(Optional) Using the Data Machine

If you would like to use the Data Machine (optionally, with a GPU), follow these steps:

Click the "Advanced Options" checkbox.
(Optional) Under "Number of GPUs", enter a100_slice to get a Data Machine GPU slice.
Under "SLURM Account" enter data-machine.

Click "Launch" at the bottom. Your request will queue, and when ready, the "job card" will change to show a button that says "Connect to Jupyter". Click this button to access your Jupyter session.

Open your Jupyter notebook

Navigate to where your Jupyter notebook is stored. Double click to open, and begin running your code!
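To confirm that the notebook kernel is running from the shared Conda environment rather than some other Python installation, the interpreter prefix can be compared with the environment path. This is a minimal sketch. It assumes that either a RESEARCH_SPACE environment variable is set in the session or that the placeholder in the string is replaced by hand; the function name running_from is hypothetical.

```python
import os
import sys


def running_from(env_path, prefix=None):
    """Return True if the interpreter prefix is the given environment directory."""
    prefix = sys.prefix if prefix is None else prefix
    # expandvars leaves $RESEARCH_SPACE untouched when the variable is unset,
    # in which case the comparison simply fails.
    expected = os.path.realpath(os.path.expandvars(env_path))
    return os.path.realpath(prefix) == expected


# Placeholder path from this document; replace it with the real location.
expected_env = "$RESEARCH_SPACE/envs/pytorch"
print("Interpreter prefix:", sys.prefix)
print("Matches expected environment:", running_from(expected_env))
```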

Appendix: Conda environment setup instructions

Make sure you are on the v100 development node:

ssh dev-amd20-v100

Then, get access to the conda command using the Miniforge3 module:

module purge
module load Miniforge3

Then create the Conda environment in a convenient location (in this case, a research space):

conda create -p $RESEARCH_SPACE/envs/pytorch

Then activate the environment and install the packages:

conda activate $RESEARCH_SPACE/envs/pytorch
conda install matplotlib pandas scikit-learn pytorch torchvision torchaudio pytorch-cuda=12.1 -c conda-forge -c pytorch -c nvidia
conda install jupyter  # jupyter prefers to be installed separately

It is highly recommended to also install the jupyter-lmod plugin to interface with the module system from inside Jupyter (e.g., to change the version of CUDA):

python -m pip install jupyterlmod
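Once the environment is built, a short smoke test can confirm that PyTorch imports and can run a computation on a GPU. The sketch below is one possible check, not part of the official setup. It can be run with python on the development node while the environment is active, or in a notebook cell. The function name gpu_smoke_test is hypothetical, and the function reports a message instead of failing when PyTorch or a GPU is unavailable.

```python
def gpu_smoke_test():
    """Return a short status string describing whether PyTorch can use a GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not importable in this environment"

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} imported, but no CUDA device is visible"

    device = torch.device("cuda")
    # Multiply two small random matrices on the GPU to confirm kernels run.
    a = torch.rand(256, 256, device=device)
    b = torch.rand(256, 256, device=device)
    checksum = (a @ b).sum().item()
    name = torch.cuda.get_device_name(0)
    return (
        f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}) "
        f"ran on {name}; checksum {checksum:.1f}"
    )


print(gpu_smoke_test())
```

With the install command above, the reported CUDA version would be expected to be 12.1. A different value may indicate that a CPU-only or mismatched build was resolved.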