Python on HPC
Python is popular because it makes a great first impression: i) clean, clear syntax, ii) multi-paradigm, iii) interpreted, iv) duck typing and garbage collection, and, most of all, v) instant productivity. It also keeps up with users' needs, offering i) flexible, full-featured data structures, ii) extensive standard libraries, iii) reusable open-source packages, iv) package management tools, and v) good unit testing frameworks.
In exchange for this ease of use, Python is one of the slower programming languages, primarily because it is interpreted and because the standard interpreter allows only a single thread to execute Python code in its memory space at a time (the global interpreter lock). Python is typically 30 to 300 times slower than equivalent C or Fortran code, although the gap depends heavily on the workload. However, Python has a powerful and enthusiastic open-source community that continuously improves its capabilities.
In this tutorial, we want to
- explain what the MSU HPCC is doing to support Python users.
- provide guidance to help users improve Python performance on the HPCC.
- point out tools that support developers of Python on the HPCC.
We assume that
- you know and use Python, or
- you know and use the HPCC and are curious about using Python in your own HPCC work.
How to use Python on the HPCC?
Python applications usually use packages and modules that require specific versions of libraries. This means one installed application may conflict with another because both use the same library but need different versions of it. It is difficult to meet the requirements of every application with one global Python installation. To resolve this issue,
- users can create an isolated virtual environment with a particular version of Python on our system in a self-contained directory in their home or research space.
- users can install their own version of Python through Anaconda in their home or research space. This gives users full control over their preferred versions of Python and packages.
Python with virtual environments
More details on how to use virtual environments can be found on this page
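As a rough sketch of what creating a virtual environment involves, the standard-library venv module can build one from within Python, which is the same mechanism used by running python -m venv on the command line. The directory name my_env and the temporary location are placeholders chosen for this illustration, not HPCC conventions; in practice the environment would live in your home or research space.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create an isolated virtual environment in env_dir and return its Path."""
    env_dir = Path(env_dir)
    # with_pip=False keeps this sketch fast and offline. In real use you would
    # normally pass with_pip=True so packages can be installed into the env.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Hypothetical location: a temporary directory stands in for a directory
# in your home or research space.
demo_root = Path(tempfile.mkdtemp())
env_path = create_env(demo_root / "my_env")

# pyvenv.cfg records which base interpreter the environment was created from.
print((env_path / "pyvenv.cfg").read_text())
```

Once an environment like this exists, it is activated from the shell, and packages installed afterwards stay inside that directory instead of affecting other projects.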
Python with Conda
To use Python on the HPCC, you have to load a Python module. A few helpful module commands are
module avail Python,
module spider Python, and
module load Python.
More information about our module system can be found here
For Jupyter notebook users, we have the Open OnDemand platform. To connect to HPCC OnDemand, visit https://ondemand.hpcc.msu.edu. After logging in, choose Interactive Apps, select Jupyter Notebook, and request the resources you need. Your Jupyter Notebook will start when the requested resources are ready.
Can my Python code be faster?
You are now ready to use Python on the HPCC, so let's look at a few tips for making your Python code faster.
Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays. NumPy performs these array operations in compiled code, which usually reduces the running time considerably compared with an equivalent Python loop. NumPy offers various operations on vectors and matrices, such as the dot product, cross product, and matrix multiplication. See the NumPy array documentation for more information.
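To make the idea concrete, here is a small sketch that computes the same dot product twice, once with an explicit Python loop and once with NumPy. The function names and the array size are chosen only for this illustration.

```python
import numpy as np


def dot_loop(x, y):
    """Dot product with an explicit Python loop (slow for large arrays)."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_vectorized(x, y):
    """Same dot product, delegated to NumPy's compiled implementation."""
    return float(np.dot(x, y))


rng = np.random.default_rng(seed=0)
x = rng.random(100_000)
y = rng.random(100_000)

loop_result = dot_loop(x, y)
vec_result = dot_vectorized(x, y)

# The two results agree up to floating-point rounding.
print(np.isclose(loop_result, vec_result))
```

Timing the two functions (for example with the standard-library timeit module) on your own data is a simple way to see how much the vectorized form helps.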
Numba compiles Python code just in time. It is enabled with a few decorators and requires little modification of existing code. In addition, Numba offers automatic parallelization that is very easy to use: you just need to add the one line
@njit(parallel=True). More information can be found here.
Numba also supports NVIDIA CUDA, and it is easy to use (at least much easier than CUDA programming in most other languages).
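A minimal sketch of the @njit(parallel=True) decorator mentioned above, applied to a simple reduction. The function name and data are illustrative only, and the try/except fallback is an assumption added so the sketch still runs (serially and uncompiled) on a machine where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback for illustration only: without Numba the function runs as
    # ordinary, serial Python code.
    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

    prange = range


@njit(parallel=True)
def sum_of_squares(values):
    """Sum the squares of a 1-D float array."""
    total = 0.0
    # prange marks the loop as parallel; Numba treats `total +=` as a
    # reduction across iterations.
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(100_000, dtype=np.float64)
result = sum_of_squares(data)  # the first call also triggers compilation
print(result)
```

Because compilation happens on the first call, the speedup is most visible when the function is called repeatedly or on large arrays.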
Use Threaded Libraries
Packages like NumPy and SciPy are already built with multithreading support through optimized BLAS/LAPACK libraries such as MKL. In general, it is a plausible guess that most common solvers are already available from Python. In addition, many major parallel libraries and packages already have Python bindings, such as PyTrilinos, petsc4py, Elemental, and SLEPc. So, don't try to reinvent the wheel: if your algorithm is not new, it has probably already been implemented for high performance.
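As an illustration of relying on an existing library rather than writing a solver by hand, the sketch below solves a dense linear system with numpy.linalg.solve, which calls LAPACK underneath. The matrix is random test data made up for this example. Whether the call actually runs on several threads depends on the BLAS/LAPACK library your NumPy was built against; the thread count is commonly controlled through environment variables such as OMP_NUM_THREADS.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500

# Random test system; adding n to the diagonal makes the matrix strictly
# diagonally dominant, so it is well conditioned.
a = rng.random((n, n)) + n * np.eye(n)
b = rng.random(n)

# One library call replaces a hand-written Gaussian elimination.
x = np.linalg.solve(a, b)

# Check the solution by computing the residual norm ||a x - b||.
residual = np.linalg.norm(a @ x - b)
print(residual)
```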
Python has a package for MPI, mpi4py, which is
- a Pythonic wrapping of the system's native MPI.
- a provider of almost all MPI-1 and MPI-2 features and the common MPI-3 features.
- very well maintained.
- distributed with major Python distributions.
- portable and scalable.
- dependent only on NumPy, Cython (build only), and MPI libraries.
More information can be found here.
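The following sketch shows the typical shape of an mpi4py program: each rank computes part of a sum, and the partial results are combined on rank 0 with a reduction. The helper names and the problem size are made up for this example, and the serial fallback is an assumption added so the script also runs where mpi4py is not installed.

```python
try:
    from mpi4py import MPI
except ImportError:
    MPI = None  # serial fallback, for illustration only


def local_indices(rank, size, n):
    """Indices handled by this rank: rank, rank + size, rank + 2*size, ..."""
    return range(rank, n, size)


def partial_sum(rank, size, n):
    """Sum of i*i over the indices assigned to this rank."""
    return sum(i * i for i in local_indices(rank, size, n))


def main(n=1000):
    if MPI is None:
        rank, size = 0, 1
        total = partial_sum(rank, size, n)
    else:
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()
        # Combine the partial sums; only rank 0 receives the result,
        # the other ranks get None.
        total = comm.reduce(partial_sum(rank, size, n), op=MPI.SUM, root=0)
    if rank == 0:
        print(f"sum of squares below {n}: {total} (computed on {size} rank(s))")
    return total


if __name__ == "__main__":
    main()
```

A script like this is started through an MPI launcher, for example mpirun -n 4 python script.py, where script.py is a placeholder file name; the exact launcher and options depend on the cluster and its scheduler.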
Other Python Resources
The following are a few Python resource links.
- HPCC wiki Python page
- Python video on YouTube