Installing TensorFlow

TensorFlow is best installed using pip. To keep the installation isolated from other packages you may be using, install it inside either a Conda environment or a Python virtual environment.

Installing TensorFlow in a Conda environment

We assume that you've already installed Conda locally (i.e. in your home directory on the HPCC), or are using Conda from the Miniforge3 module on the HPCC.

First off, run ssh dev-amd20-v100 to log into a GPU dev-node.

Note

If you are not familiar with basic conda commands (e.g., conda create/activate/install/deactivate), check out the conda cheatsheet.

Once logged in, run the installation script below in your terminal to complete the GPU-based TensorFlow (TF) installation in your conda environment.

# README
# - Below we use Conda from the HPCC's Miniforge3 module.
# - If you are using a local Miniforge installation instead, make sure it is installed in /mnt/home/user123/miniforge3/ (replace user123 with your own username).
# - Additionally, you'll need to load conda first by following the "Using Conda" tutorial: https://docs.icer.msu.edu/Using_conda/

module purge
module load Miniforge3
conda create --name tf_Apr2025 python=3.10 pip
conda activate tf_Apr2025
python -m pip install 'tensorflow[and-cuda]'
conda deactivate

Now we'll run a few simple one-liner commands to verify the installation. Again, you'll need to have the Miniforge3 module loaded before running these commands.

conda activate tf_Apr2025
python -c "import tensorflow as tf; print(tf.__version__)" # check TF version
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # verify GPU devices
python -c "import tensorrt; print(tensorrt.__version__); assert tensorrt.Builder(tensorrt.Logger())" # test TensorRT installation
conda deactivate
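
The one-liners above can also be collected into a small script, which is convenient to rerun later (for example, inside a batch job). The following is a minimal sketch rather than an official HPCC tool: the function name `check_tensorflow` is hypothetical, and the script assumes only that it is run with the `python` from the activated environment. It uses the same `tf.__version__` and `tf.config.list_physical_devices('GPU')` calls as the commands above, and it returns an "empty" report instead of crashing when TensorFlow cannot be imported.

```python
import importlib.util


def check_tensorflow():
    """Return a small report on the TensorFlow install visible to this interpreter."""
    report = {"installed": False, "version": None, "gpus": []}

    # find_spec returns None when the package cannot be found in this environment.
    if importlib.util.find_spec("tensorflow") is None:
        return report

    # Imported lazily so the script still runs when TensorFlow is missing.
    import tensorflow as tf

    report["installed"] = True
    report["version"] = tf.__version__
    # Same call as the one-liner above; an empty list means no GPU is visible.
    report["gpus"] = [gpu.name for gpu in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    report = check_tensorflow()
    if not report["installed"]:
        print("TensorFlow is not importable; is the environment activated?")
    else:
        print("TensorFlow version:", report["version"])
        print("GPUs visible:", report["gpus"] or "none")
```

If the script reports that TensorFlow is not importable, the most likely cause is that the environment was not activated before running it.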

Note

When running these TensorFlow tests on a dev-node, you may run into a CUDA_ERROR_OUT_OF_MEMORY error because other users are occupying the GPU's memory. To avoid this, run the tests in an interactive session on a GPU compute node, where you will be allocated an entire GPU. When starting the interactive session, you may add the constraint --constraint=[amd20|amd22|amd24] to ensure you're running on newer GPUs that support the newer CUDA versions.
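
As a complementary measure (not a replacement for the interactive session described above), you can ask TensorFlow to allocate GPU memory on demand. By default TensorFlow tries to reserve nearly all of the memory on every visible GPU, which makes it more likely to collide with other users on a shared dev-node. The sketch below uses TensorFlow's `tf.config.experimental.set_memory_growth`; the wrapper name `enable_memory_growth` is hypothetical, and this may reduce out-of-memory errors but cannot help if other processes already hold most of the GPU's memory.

```python
import importlib.util


def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return how many GPUs were configured."""
    # Nothing to configure if TensorFlow is not importable in this environment.
    if importlib.util.find_spec("tensorflow") is None:
        return 0

    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is first used; TensorFlow raises
        # RuntimeError if the device has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for on-demand memory:", enable_memory_growth())
```

Call the function at the very top of your own script, before building any model or tensor, so that the setting takes effect.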

Installing TensorFlow in a Python virtual environment

Again, we'll first need to run ssh dev-amd20-v100 to log into a GPU dev-node.

Once logged into a GPU dev-node, run the installation script below in your terminal to complete the GPU-based TF installation in your Python virtual environment.

module purge
module load Python/3.11.3-GCCcore-12.3.0
python -m venv tf_Apr2025
source tf_Apr2025/bin/activate
python -m pip install 'tensorflow[and-cuda]'
deactivate

Now we'll run a few simple one-liner commands to verify the installation. Again, you'll need to have the Python/3.11.3-GCCcore-12.3.0 module loaded before running these commands.

source tf_Apr2025/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)" # check TF version
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # verify GPU devices
deactivate
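
Both sets of verification commands assume that the right environment is active when `python` runs. If an import fails unexpectedly, a quick check of which environment the interpreter belongs to can save some debugging time. The sketch below is one possible way to do that; the function name `active_environment` is hypothetical. It relies on two standard behaviors: inside a virtual environment `sys.prefix` differs from `sys.base_prefix`, and `conda activate` exports the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def active_environment():
    """Describe the isolated environment, if any, that this interpreter runs in."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the Python installation it was created from.
    if sys.prefix != sys.base_prefix:
        return ("venv", sys.prefix)

    # conda activate exports CONDA_DEFAULT_ENV with the environment's name.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return ("conda", conda_env)

    return ("none", sys.prefix)


if __name__ == "__main__":
    kind, where = active_environment()
    print(f"Environment type: {kind} ({where})")
```

With the environments created in this guide, you would expect the second value to end in or equal `tf_Apr2025`; a result of `none` suggests the activation step was skipped.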