Warning
This is a Lab Notebook, which describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Common Machine Learning Tools (TensorFlow, Keras, scikit-learn, PyTorch) on OnDemand
This lab notebook discusses the installation and usage of common machine learning tools (TensorFlow, Keras, scikit-learn, and PyTorch) in Jupyter Notebooks through the HPCC's OnDemand interface. It is mostly oriented towards students in the fall 2024 section of CMSE 492-001 or CMSE 802-001 using the Data Machine, but will hopefully be useful to a general audience.
Usage instructions
We recommend running these tools in a Jupyter notebook backed by a Conda environment that includes them. To do so, visit ICER's OnDemand portal and click the Jupyter app.
Enter resource request
In the settings, enter the time, cores, and memory you would like.
When using a GPU slice on the Data Machine as recommended below, it is best to use 4 as the "Number of cores per task" and 18GB as the "Amount of memory". These values allow a single node to be split equally 28 ways. However, if you need more resources, please feel free to request them, with the understanding that your job may take longer to queue.
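The reasoning behind these numbers is simple multiplication. The short Python sketch below totals the recommended per-job request across 28 jobs; `node_totals` is a hypothetical helper that only restates the figures on this page and does not query Slurm or the node's actual hardware.

```python
# Hypothetical helper: totals consumed when a node is shared by identical jobs.
# The defaults (4 cores, 18 GB, 28 jobs) are the recommendations on this page;
# they are not read from Slurm or from the node itself.


def node_totals(cores_per_job: int = 4, mem_gb_per_job: int = 18, jobs_per_node: int = 28) -> dict:
    """Return the total cores and memory (GB) used when every job slot on a node is filled."""
    return {
        "cores": cores_per_job * jobs_per_node,
        "memory_gb": mem_gb_per_job * jobs_per_node,
    }


if __name__ == "__main__":
    totals = node_totals()
    print(f"{totals['cores']} cores, {totals['memory_gb']} GB")  # prints "112 cores, 504 GB"
```

For comparison, `node_totals(8, 36, 14)` returns the same totals, which illustrates that doubling the per-job request halves how many such jobs fit on one node.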
Setup Conda environment
Under "Jupyter Location", choose "Conda Environment using Miniforge3 module". You now have two options:
Use a preinstalled Conda environment
Note
This section applies only to students in the fall 2024 section of CMSE 492-001 or CMSE 802-001.
If you are in the fall 2024 section of CMSE 492-001, in the "Conda Environment name or path" field, use /mnt/research/CMSE_492_FS24_S001/envs/ml. If you are in the fall 2024 section of CMSE 802-001, in the "Conda Environment name or path" field, use /mnt/research/CMSE_802_FS24_S001/envs/ml. Otherwise, follow the "Use your own Conda environment" instructions.
Use your own Conda environment
Follow the setup instructions in the appendix below. In the "Conda Environment name or path" field, use ml (or whatever you named your Conda environment).
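Once Jupyter is running, one way to confirm that the notebook kernel is using the environment you entered is to print the interpreter's location from a notebook cell. This is a sketch; the helper name `kernel_environment` is hypothetical, and the exact paths printed depend on which environment you chose.

```python
import os
import sys


def kernel_environment() -> dict:
    """Report where the Python interpreter running this kernel lives."""
    return {
        # Root directory of the active environment (for a named env, typically ends in envs/<name>)
        "prefix": sys.prefix,
        # Full path of the Python executable backing this kernel
        "executable": sys.executable,
        # Set by `conda activate`; may be unset depending on how the kernel was launched
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in kernel_environment().items():
        print(f"{key}: {value}")
```

For the class environments, `prefix` should match the path you entered; for your own environment, it should end in the environment's name, such as `ml`.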
Run using a GPU on the Data Machine
Note
This section only applies to users who have Data Machine access.
Click the "Advanced Options" checkbox. In the "Number of GPUs" field, enter a100_slice. This reserves a slice of one of the Data Machine's A100 GPUs with 10GB of GPU memory. In the "SLURM Account" field, enter data-machine.
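After Jupyter starts, you can check from a notebook cell whether the GPU slice is actually visible. The sketch below assumes PyTorch and/or TensorFlow are installed in the selected environment and skips whichever is missing; `gpu_report` is a hypothetical name, and the memory figure it reports is only expected to be in the neighborhood of the 10GB slice, not an exact match.

```python
import importlib.util
import os


def gpu_report() -> dict:
    """Summarize which GPUs the notebook can see, using whichever frameworks are installed."""
    # May be None outside a GPU job
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}

    # PyTorch: only imported if it is installed in this environment
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
        if report["torch_cuda_available"]:
            props = torch.cuda.get_device_properties(0)
            report["torch_device"] = props.name
            # total_memory is in bytes; convert to GiB for readability
            report["torch_total_memory_gib"] = round(props.total_memory / 1024**3, 1)

    # TensorFlow: list the GPUs it detects (an empty list means CPU only)
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        report["tf_gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]

    return report


if __name__ == "__main__":
    print(gpu_report())
```

If `torch_cuda_available` is `False` or `tf_gpus` is empty, check that a100_slice and data-machine were entered in the Advanced Options.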
Launch Jupyter
Press the "Launch" button at the bottom, then wait for the job to get through the queue and for Jupyter to start up. This can take a couple of minutes. When Jupyter is ready, click the "Connect to Jupyter" button.
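After connecting, a first notebook cell like the one below can confirm that the four tools are installed in the selected environment. It reads package metadata instead of importing each library, so it runs quickly; the function name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (scikit-learn is imported as `sklearn`)
PACKAGES = ["tensorflow", "keras", "scikit-learn", "torch"]


def installed_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:>12}: {ver or 'NOT INSTALLED'}")
```

A `NOT INSTALLED` entry usually means the kernel is running from a different environment than the one you intended.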
Appendix: Setup instructions
Use these instructions to set up your own Conda environment. You can customize it to better fit your workflow.
Login
The first step is to log in and make sure that you are on a development node with a GPU. This tutorial uses dev-amd20-v100 because its V100 GPU is newer and more in line with the A100 GPUs found on the Data Machine. Most importantly, both the V100 and the A100 are compatible with CUDA 12, whereas the K20 and K80 GPUs are not.
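To confirm that the node you landed on has a GPU, you can query `nvidia-smi`. The Python sketch below wraps that call; it assumes some `python3` is available on the development node, and `visible_gpus` is a hypothetical helper. Running `nvidia-smi` directly in the shell shows the same information.

```python
import shutil
import socket
import subprocess


def visible_gpus() -> list:
    """Return one 'name, memory' line per GPU reported by nvidia-smi, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        # nvidia-smi is not on PATH; treat this as "no GPU available"
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # The tool exists but could not talk to a GPU driver
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print("Host:", socket.gethostname())
    gpus = visible_gpus()
    print("GPUs:", gpus if gpus else "none found")
```

On dev-amd20-v100 this should list at least one V100; an empty list means you are on a node without an NVIDIA GPU, or the driver tools are not on your PATH.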
Install packages
The next step is to get access to Conda. Using the Miniforge3 module, create a new environment and install the required packages from NVIDIA. Note that even though all of the packages are installed using pip, we still create a Conda environment, as this works best with OnDemand.
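Because the point of building on dev-amd20-v100 is CUDA 12 compatibility, a quick post-install check is to ask each framework which CUDA version it was built against. This sketch assumes it is run with the new environment active (for example, `python` after `conda activate ml`, if the environment is named ml as above); `cuda_builds` and `all_cuda12` are hypothetical helper names.

```python
import importlib.util


def cuda_builds() -> dict:
    """Report the CUDA version each installed framework was built against (None if CPU-only)."""
    builds = {}
    if importlib.util.find_spec("torch") is not None:
        import torch

        # A version string such as "12.1" for a CUDA build; None for a CPU-only build
        builds["torch"] = torch.version.cuda
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        # get_build_info() may omit "cuda_version" on CPU-only builds, hence .get()
        builds["tensorflow"] = tf.sysconfig.get_build_info().get("cuda_version")
    return builds


def all_cuda12(builds: dict) -> bool:
    """True only if at least one framework was found and every one reports a CUDA 12.x build."""
    return bool(builds) and all(v is not None and str(v).startswith("12") for v in builds.values())


if __name__ == "__main__":
    builds = cuda_builds()
    print(builds)
    print("All CUDA 12 builds:", all_cuda12(builds))
```

A `None` or an `11.x` value suggests that framework was installed as a CPU-only or older CUDA build and will not use the GPU as expected.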