While the official TF installation guide provides a one-shot method to install TF with pip install tensorflow[and-cuda], without the need to manually pre-install packages such as CUDA and cuDNN, that method has issues registering cuDNN, cuFFT, cuBLAS, and TensorRT. To avoid these library-related issues, we will instead install these packages manually inside a conda environment. To learn more, you can read this blog, which compares the results of the two methods.
First off, run ssh dev-amd20-v100 to log into our GPU dev-node.
Note
If you are not familiar with basic conda commands (e.g., conda create/activate/install/deactivate), check out the conda cheatsheet. After creating a new conda environment (namely tf_Jul2024 below) and activating it, the environment variable $CONDA_PREFIX will be set to /mnt/home/user123/miniforge3/envs/tf_Jul2024/.
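For example, once the environment has been created in the installation step below, you can confirm where it lives with a quick check (a minimal sketch; the path assumes the miniforge location and user123 account used throughout this page):

```bash
conda activate tf_Jul2024
echo $CONDA_PREFIX   # expected: /mnt/home/user123/miniforge3/envs/tf_Jul2024
conda deactivate
```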
Once logged in, run the installation script below in your terminal to complete the GPU-based TF installation in your conda environment.
```bash
# README
# - Below we assume your miniforge is installed in /mnt/home/user123/miniforge3/; change user123 to your real account
# - You need to load conda first, by following the "Using Conda" tutorial https://docs.icer.msu.edu/Using_conda/
conda create -n tf_Jul2024 python=3.10
conda activate tf_Jul2024
conda install -c conda-forge cudatoolkit=11.8 cudnn=8.8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib
python -m pip install tensorrt==8.5.3.1
TENSORRT_PATH=$(dirname $(python -c "import tensorrt; print(tensorrt.__file__)"))
echo $TENSORRT_PATH  # output is used for composing the next command
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/home/user123/miniforge3/envs/tf_Jul2024/lib/python3.10/site-packages/tensorrt
python -m pip install tensorflow==2.13
conda deactivate
```
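Note that the export LD_LIBRARY_PATH lines above only affect the current shell session. If you prefer not to re-export them every time, one option is a conda activation hook: any script placed in $CONDA_PREFIX/etc/conda/activate.d/ is sourced automatically whenever the environment is activated. Below is a minimal sketch of that approach (not part of the required steps); the file name env_vars.sh is arbitrary, and the commands assume the tf_Jul2024 environment is currently active:

```bash
# Run these while tf_Jul2024 is active, so $CONDA_PREFIX points at the environment
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh << 'EOF'
# Sourced on every "conda activate tf_Jul2024":
# add the env libraries and the pip-installed TensorRT libraries to the loader path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib:$CONDA_PREFIX/lib/python3.10/site-packages/tensorrt
EOF
```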
Now we'll run a few simple one-liner commands to verify the installation.
```bash
conda activate tf_Jul2024
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib:/mnt/home/user123/miniforge3/envs/tf_Jul2024/lib/python3.10/site-packages/tensorrt
python3 -c "import tensorflow as tf; print(tf.__version__)"  # check TF version
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # verify GPU devices
python3 -c "import tensorrt; print(tensorrt.__version__); assert tensorrt.Builder(tensorrt.Logger())"  # test TensorRT installation
conda deactivate
```
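Beyond listing devices, you may also want a quick smoke test that actually runs an operation on the GPU. The sketch below is optional and not part of the official verification steps; run it after activating the environment and setting LD_LIBRARY_PATH as in the block above:

```bash
python3 << 'EOF'
import tensorflow as tf

# Place a small matrix multiplication on the first GPU and report where it ran
with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)
print("matmul ran on:", c.device)
EOF
```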