Python on HPC

Python is popular because it makes a great first impression: it offers i) clean, clear syntax, ii) multi-paradigm programming, iii) interpreted execution, iv) duck typing and garbage collection, and, most of all, v) instant productivity. It also keeps up with users' needs, with i) flexible, full-featured data structures, ii) an extensive standard library, iii) reusable open-source packages, iv) package management tools, and v) good unit testing frameworks.

In exchange for this user-friendliness and ease of use, Python is one of the slower programming languages, primarily because it is interpreted and because the standard interpreter (CPython) uses a global interpreter lock that allows only one thread to execute Python code at a time. Pure Python code is typically 30 to 300 times slower than C or Fortran. However, Python has a powerful and enthusiastic open-source community that continuously improves its capabilities.

On this page, we want to

explain what the MSU HPCC is doing to support Python users.
provide guidance to help users improve Python performance on the HPCC.
point out tools that support developers of Python on the HPCC.

We assume that

you know and use Python, or
you know and use the HPCC and are curious about using Python in your own HPCC work.

How do I use Python on the HPCC?

Python applications usually depend on packages and modules that require specific versions of libraries. As a result, one installed application may conflict with another when both use the same library but need different versions. It is difficult to meet the requirements of every application with one global Python installation. To resolve this issue,

users can create an isolated virtual environment with a particular version of Python on our system in a self-contained directory of their home or research space.
users can install their own version of Python through Conda in their home or research space. This gives users full control over their preferred versions of Python and packages.

To get started using Python on the HPCC, you first have to load a Python module. A few helpful module commands are module avail Python, module spider Python, and module load Python. More information on our module system can be found here.
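After loading a module or activating an environment, it can be useful to confirm which interpreter a script is actually running under. The following is a minimal sketch that uses only the standard library. It assumes that a virtual environment is detectable by comparing sys.prefix with sys.base_prefix, and that Conda sets the CONDA_DEFAULT_ENV environment variable when an environment is activated. The variable names in_venv and conda_env are illustrative only.

```python
import os
import sys

# Inside a venv/virtualenv, sys.prefix points at the environment directory,
# while sys.base_prefix still points at the base Python installation.
in_venv = sys.prefix != sys.base_prefix

# Conda sets CONDA_DEFAULT_ENV on activation; this is None if it is not set.
conda_env = os.environ.get("CONDA_DEFAULT_ENV")

print("Interpreter :", sys.executable)
print("Version     :", sys.version.split()[0])
print("In a venv   :", in_venv)
print("Conda env   :", conda_env)
```

Running this at the top of a job script's Python program is one way to record in the job output which Python installation was used.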

Python with virtual environments

More details on how to use virtual environments can be found on this page.

Python with Conda

More details on how to use Conda can be found on this page.

Jupyter Notebook

For Jupyter Notebook users, we have the Open OnDemand platform. To connect to HPCC OnDemand, visit https://ondemand.hpcc.msu.edu. After logging in, choose Interactive Apps, select Jupyter Notebook, and request the resources you need. Your Jupyter Notebook will start when the requested resources are ready.

A screenshot of the OnDemand Dashboard showing the Interactive Apps menu open.

A screenshot of the OnDemand Jupyter Notebook app settings with 1 hour, 1 core, and 10GB memory requested.

Can my Python code be faster?

You are now ready to use Python on the HPCC. Next, let's go over a few tips to make your Python code faster.

Vectorization

Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays. NumPy carries out these operations in compiled code, which can greatly reduce running time. NumPy offers various vector operations, such as the dot product, cross product, and matrix multiplication. See the NumPy array documentation for more information.
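As a small illustration of the idea, the sketch below computes a dot product twice: once with an explicit Python loop and once with a single NumPy call. The function name dot_loop and the sample arrays are made up for this example. On arrays this small there is no measurable difference, but on large arrays the vectorized form is usually much faster.

```python
import numpy as np


def dot_loop(x, y):
    """Dot product with an explicit Python loop (the slow pattern)."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 6.0, 7.0, 8.0])

loop_result = dot_loop(x, y)   # one interpreted iteration per element
vec_result = np.dot(x, y)      # one call; the loop runs in compiled code

# Elementwise arithmetic is vectorized as well, with no loop required.
combined = 2.0 * x + y

print(loop_result, vec_result, combined)
```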

Numba

Numba compiles Python code just in time. In most cases you only add a decorator and leave the rest of the code largely unchanged. Numba also offers automatic parallelization, which is very easy to use: you just need to add the one-line decorator @njit(parallel=True). More information can be found here. Numba also supports NVIDIA CUDA, and writing GPU code with it is easy (or at least much easier than in other programming languages).
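The following is a minimal sketch of the @njit(parallel=True) decorator applied to a simple reduction. The function sum_of_squares and the test data are hypothetical. The try/except fallback is not part of Numba; it is an assumption added here so that the sketch still runs (without any speedup) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumed fallback for this sketch only: a no-op decorator and a plain
    # range, so the code runs unmodified without Numba.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

    prange = range


@njit(parallel=True)
def sum_of_squares(values):
    total = 0.0
    # prange marks the loop as safe to split across threads; Numba treats
    # the += on a scalar as a parallel reduction.
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1000, dtype=np.float64)
result = sum_of_squares(data)  # the first call includes compilation time
print(result)
```

Because compilation happens on the first call, timing comparisons are more meaningful from the second call onward.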

Use Threaded Libraries

Packages like NumPy and SciPy are already built with multithreading support through BLAS/LAPACK implementations such as MKL. In general, it is a plausible guess that most common solvers have already been implemented and are available from Python. In addition, many major parallel libraries already have Python bindings, such as PyTrilinos, petsc4py, Elemental, and slepc4py (for SLEPc). So, don't try to reinvent the wheel. If it is not new, it is probably already implemented for high performance.
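As a sketch of what relying on a threaded library looks like in practice, the example below solves a dense linear system with np.linalg.solve, which calls LAPACK, instead of a hand-written elimination loop. It assumes that the underlying BLAS/LAPACK build honors the OMP_NUM_THREADS environment variable, which is common but depends on how NumPy was built. The matrix size and the thread count of 4 are arbitrary choices for illustration.

```python
import os

# Assumption: the BLAS/LAPACK backend reads OMP_NUM_THREADS. It has to be
# set before NumPy is imported; setdefault keeps any value set by the job.
os.environ.setdefault("OMP_NUM_THREADS", "4")

import numpy as np

n = 200
rng = np.random.default_rng(0)

# A random matrix shifted by n on the diagonal, so it is well conditioned.
a = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# The LAPACK solver does the work; no Python-level loops are involved.
x = np.linalg.solve(a, b)

residual = np.linalg.norm(a @ x - b)
print("residual:", residual)
```

In a batch job, the thread count would normally be matched to the number of cores requested for the job.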

MPI

Python has a package for MPI, mpi4py.

It is

a Pythonic wrapper around the system's native MPI.
a provider of almost all MPI-1 and MPI-2 features and the common MPI-3 features.
very well maintained.
distributed with major Python distributions.
portable and scalable.
dependent only on NumPy, Cython (build only), and MPI libraries.

More information can be found here.
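A minimal mpi4py sketch is shown below: each rank sums a strided share of the integers below n, and allreduce combines the partial sums. The variable names and the problem itself are made up for illustration. The ImportError fallback to a single process is an assumption added so that the sketch also runs where mpi4py is not available; it is not something mpi4py provides.

```python
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's index
    size = comm.Get_size()   # total number of processes
except ImportError:
    # Assumed fallback for this sketch only: behave like a single process.
    MPI = None
    comm, rank, size = None, 0, 1

n = 1000

# Each rank handles every size-th integer, starting at its own rank.
local_sum = sum(range(rank, n, size))

if comm is not None:
    # Combine the partial sums; every rank receives the total.
    total = comm.allreduce(local_sum, op=MPI.SUM)
else:
    total = local_sum

if rank == 0:
    print("total:", total)
```

A script like this is typically started through the MPI launcher available on the system (for example mpirun or srun) with the desired number of processes. Started as a plain Python script, it runs as a single rank.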

Other Python Resources

The following are a few Python resource links.