
Warning

Installing a fully functional TensorFlow requires specific steps. As such, the instructions and recommendations on this page may differ slightly from other pages in ICER's documentation, but they have been fully tested as of March 2023. For more general Conda and Python usage, please see our page on Using Conda.

Install TensorFlow using conda

In this tutorial, we will first install Anaconda in our home directory, then install TensorFlow (TF) in a newly created conda environment, and finally run a few TF commands to verify the installation.

Installing Anaconda in your home directory

A full guide to downloading Anaconda and installing it in your home directory is here. Following that guide, we show below a sequence of commands that will download and configure conda in your home directory on the HPCC (say /mnt/home/user123/).

# Install anaconda3 in /mnt/home/user123/ (replace user123 with your HPCC account name)
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
source /mnt/home/user123/anaconda3/bin/activate
conda init
conda config --set auto_activate_base false

Notes

  • We recommend downloading Anaconda 3, which corresponds to Python 3.

  • In step 8 of the guide, Anaconda recommends entering "yes". However, we recommend entering "no" so that your ~/.bashrc is not modified. After that, you will need to run source /mnt/home/user123/anaconda3/bin/activate and conda init as shown above.

  • The last command above disables automatic activation of the base environment. This step is necessary.

  • By default, Anaconda will be installed in /mnt/home/user123/anaconda3/. You can specify an alternate installation path during the interactive installation.

  • The link after wget above can be replaced with a more recent installer script from https://repo.anaconda.com/archive/

  • If you encounter any errors, first check your quota by running quota and make sure your home directory has enough space. If you need to re-install, always fully delete the previously installed Anaconda before repeating the steps above (see the sketch below).
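For example, a clean re-install might start with something like the following sketch (the paths assume the default installation location used in this guide; adjust them if you installed elsewhere):

# Check your home-directory quota and current usage
quota

# Fully remove the previous Anaconda installation before re-installing
rm -rf /mnt/home/user123/anaconda3

# Optionally remove conda's hidden configuration files as well (assumes the default locations)
rm -rf /mnt/home/user123/.conda /mnt/home/user123/.condarc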

Installing TF in a conda environment

After you've successfully installed Anaconda in your home directory, you can follow the commands below to install TF and troubleshoot some common errors. After your initial login, run ssh dev-amd20-v100 to log into our GPU dev-node.

Warning

Installing TensorFlow while on dev-amd20-v100 will restrict you to amd20 nodes with GPUs. You must specify amd20 as a constraint when submitting a batch job or starting an OnDemand session.
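For reference, a batch script would typically request this with lines like the following sketch (the GPU type and count are placeholders, not part of this guide; adjust them to your job):

#SBATCH --constraint=amd20   # restrict the job to amd20 nodes
#SBATCH --gpus=v100:1        # request one GPU (the type string may differ for other node types)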

If you are not familiar with basic conda commands (e.g., conda create/activate/install/deactivate), check out this conda cheatsheet. After creating a new conda environment (named tf_gpu_Feb2023 below) and activating it, the environment variable $CONDA_PREFIX will point to /mnt/home/user123/anaconda3/envs/tf_gpu_Feb2023.
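Once the environment has been created and activated (as shown in the next block), a quick sanity check is to confirm which environment is actually in use:

echo $CONDA_PREFIX   # expect /mnt/home/user123/anaconda3/envs/tf_gpu_Feb2023
which python         # expect a path inside the tf_gpu_Feb2023 environment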

At minimum, you only need to modify the first line below (export PATH=...) so that the path points to the bin folder of your Anaconda installation. The rest of the commands can be copied directly and run in your terminal.

export PATH=/mnt/home/user123/anaconda3/bin:$PATH
conda create --name tf_gpu_Feb2023 python=3.9
conda activate tf_gpu_Feb2023

conda install -c conda-forge cudnn=8.1.0 --yes
conda install -c nvidia cuda-nvcc --yes 

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
pip install --upgrade pip
pip install tensorflow==2.11.0 # compatible with CuDNN v8.1.0
pip install nvidia-pyindex
pip install nvidia-tensorrt

# To fix the error of "Could not load dynamic library 'libnvinfer.so.7'". The trick is to create a symlink.
cd $CONDA_PREFIX/lib/python3.9/site-packages/tensorrt_libs
ln -s libnvinfer.so.8 libnvinfer.so.7
ln -s libnvinfer_plugin.so.8 libnvinfer_plugin.so.7

# To fix the error of "Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice." 
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice

conda deactivate

Verifying the installation using simple commands

Now we'll run a few one-liners right from the shell command line to test the installation. Again, you need to be logged onto dev-amd20-v100, the GPU dev-node. If no errors pop up when executing these commands, you should be all set.

Note

You'll need to run the first four lines every time you want to start using TensorFlow. This includes any SLURM scripts you write to launch TF jobs (see the example batch script after the test commands below).

export PATH=/mnt/home/user123/anaconda3/bin:$PATH
conda activate tf_gpu_Feb2023

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/:/lib64/:$CONDA_PREFIX/lib/:$CONDA_PREFIX/lib/python3.9/site-packages/tensorrt_libs
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib

# Simple one-liner test commands
python3 -c "import tensorflow as tf; print (tf.__version__)" # check TF version
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" # verify GPU devices
python3 -c "import tensorrt; print(tensorrt.__version__); assert tensorrt.Builder(tensorrt.Logger())" # test TensorRT installation

conda deactivate

More complicated testing code can be found in our TensorFlow model training examples.
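As mentioned in the note above, the same four setup lines also belong in any SLURM script that launches TF jobs. A minimal batch script might look like the following sketch (the resource requests and the script name train.py are placeholders, not part of this guide; adjust them to your own job):

#!/bin/bash
#SBATCH --job-name=tf_test
#SBATCH --constraint=amd20        # required because TF was installed on an amd20 GPU node
#SBATCH --gpus=v100:1             # request one GPU (the type string is a placeholder)
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# Same environment setup as in the interactive test above
export PATH=/mnt/home/user123/anaconda3/bin:$PATH
conda activate tf_gpu_Feb2023
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/:/lib64/:$CONDA_PREFIX/lib/:$CONDA_PREFIX/lib/python3.9/site-packages/tensorrt_libs
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib

# Run your TensorFlow script (train.py is a placeholder)
python3 train.py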

Using TensorFlow in an OnDemand Jupyter notebook

If you would like to use TensorFlow from an Open OnDemand Jupyter notebook, you'll first need to install Jupyter.

export PATH=/mnt/home/user123/anaconda3/bin:$PATH
conda activate tf_gpu_Feb2023

conda install jupyter

Then, you need to edit the Jupyter kernel specification file (kernel.json) so that the LD_LIBRARY_PATH and XLA_FLAGS environment variables are set the same way as above. First, we'll make a backup of this file as demonstrated below.

cd $CONDA_PREFIX/share/jupyter/kernels/python3/

cp kernel.json kernel.json.bak

With your favorite text editor, open kernel.json. Look for the following pattern at the end of the file:

 }
}
and replace it with the following (note the comma added after the first curly bracket!):
 },
 "env": {
  "XLA_FLAGS":"--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib",
  "LD_LIBRARY_PATH":"$LD_LIBRARY_PATH:/lib/:/lib64/:$CONDA_PREFIX/lib/:$CONDA_PREFIX/lib/python3.9/site-packages/tensorrt_libs"
 }
}

Note

If you open a notebook and get a message about no kernel being available, make sure you added the comma after the first curly bracket.
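One quick way to catch a malformed kernel.json (for example, a missing comma) is to run it through Python's built-in JSON parser; this is only a convenience check and not part of the kernel setup:

cd $CONDA_PREFIX/share/jupyter/kernels/python3/
python3 -m json.tool kernel.json   # prints the file if it is valid JSON, or an error pointing at the problem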

When you request a Jupyter notebook through OnDemand, make sure to do the following:

  • Request more than the minimum amount of memory (on the order of GB)
  • Select "Launch Jupyter Notebook using the Anaconda installation in my home directory"
  • Enter the full path to your Anaconda installation; e.g., /mnt/home/user123/anaconda3
  • Enter the name of your TF Conda environment; e.g., tf_gpu_Feb2023
  • Select "Advanced Options"
  • Set the node type to amd20
  • Request 1-4 GPUs

Even if you have requested less than 4 hours of wall time, your job may spend more time in the queue than you may be used to. This is normal given the specific resources you have requested.

You can test that TensorFlow will run in your notebook by running the following:

import tensorflow as tf
import tensorrt

print("TF Version:", tf.__version__)
print("GPUs:\n", tf.config.list_physical_devices('GPU'))
print("TensorRT Version:", tensorrt.__version__)

assert tensorrt.Builder(tensorrt.Logger())