
Warning

This is a Lab Notebook entry that describes how to solve a specific problem at a specific point in time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.

Singularity Overlays (2022-09-24)

Singularity is a versatile tool that gives researchers more flexibility in installing software and running their workflows on the HPCC. Most workflows don't need Singularity, but it can be extremely helpful for solving certain weirdly difficult problems. Common reasons researchers use Singularity on the HPCC include:

  • Installing software that needs a special/different base operating system.
  • Installing software that requires administrator privileges (aka root, su and/or sudo).
  • Installing complex dependency trees (like Python and R).
  • Using existing software inside a pre-built virtual machine.
  • Working with lots of tiny files on the HPCC filesystems, which are designed for smaller numbers of big files.
  • Building workflows that can easily move between different resources.

NOTE This overview is specific to the High Performance Computing Center (HPCC) at Michigan State University (MSU). For a complete tutorial see the Singularity documentation. This overview assumes that you have an HPCC account and know how to navigate to and use a development node.

The remainder of this overview will walk you through the first steps of using Singularity. However, if you want to skip the details, just try running the following three powertool commands. The first two create a read/writable overlay and install Miniforge; the third starts the overlay inside a CentOS Singularity image. You only need to run the first two commands once to build the overlay file (with conda), then you can just use the third command anytime you want to start the overlay:

overlay_build
overlay_install_conda
overlay_start

To exit Singularity, just type exit.

Step 1: Get a singularity image

As a starting point we need a singularity image, also known as a container or virtual machine. You can think of a singularity image as a "software hard drive" that contains an entire operating system in a file. There are three main ways to get these images:

  1. Use one of the Singularity images already on the HPCC.
  2. Download an image from one of the many online libraries.
  3. Build your own image.

If you don't know which one of the above to use, I recommend that you pick number 1 and just use the singularity image we already have on the system.

1. Use one of the Singularity images already on the HPCC

For this introduction, we can keep things simple and just use one of the Singularity images already on the HPCC. This image runs CentOS 7 Linux and is a good starting point. Use the following command to start singularity in a "shell" using the provided image:

singularity shell --env TERM=vt100 /opt/software/CentOS.container/7.4/bin/centos

Once you run this command you should see the "Singularity" prompt which will look something like the following:

Singularity>

You did it! You are running a different operating system (OS) than the base operating system. All of the main HPCC folders are still accessible from this "container" (e.g., /mnt/home, /mnt/research, /mnt/scratch, etc.), so things shouldn't look much different than before (except for the different prompt, and you no longer have access to some of the base HPCC software).

At this point, if you know what you need, you should be able to use files in your home directory, and they will compile/run using the Singularity OS instead of the base OS.
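For example, you can double-check which OS you are running from inside the Singularity shell:

cat /etc/redhat-release

Inside the container this should report a CentOS 7 release rather than the base operating system.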

NOTE: You can just install software in your /mnt/home/$USER and/or /mnt/research folders. The software you install will probably only work from "inside" this singularity image. However, you will also be able to see and manipulate the files from within your standard HPC account. This is fine for many researchers but I recommend you jump down to "Step 3: Overlays" to make Singularity even more flexible.
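For example, a typical from-source install into your home directory might look something like the following sketch (the tool name and URL are placeholders, not a real package):

# download and unpack the (hypothetical) source code
wget https://example.com/mytool-1.0.tar.gz
tar -xzf mytool-1.0.tar.gz
cd mytool-1.0

# install into your home directory instead of the system directories
./configure --prefix=/mnt/home/$USER/software
make && make install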

2. Download an image from one of the many online libraries

Many people publish Singularity images and post them on public "libraries" for easy installation. Here is a list of online libraries you can browse (this section of the tutorial may need more work; not all of these may work on the HPCC):

Sylabs Library

Link to Browse Sylabs
example:

singularity pull alpine.sif library://alpine:latest

singularity shell alpine.sif

Docker Hub

Link to Browse Docker Hub
example:

singularity pull tensorflow.sif docker://tensorflow/tensorflow:latest

singularity shell tensorflow.sif

Singularity Hub (aka shub)

Link to Browse Singularity Hub
example:

singularity pull shub_image.sif shub://vsoch/singularity-images

singularity shell shub_image.sif

3. Build your own image

This one is more complex and outside the scope of this overview. However, if you are interested, I recommend trying the build command with a Docker image, since Docker is fairly easy to install on your personal computer. See the Singularity documentation for details on how to use Docker to make a Singularity image.
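As a minimal sketch, pulling and building from a Docker Hub image looks like the following (the ubuntu:20.04 tag is just an example):

# build a local .sif image from a Docker Hub image
singularity build ubuntu.sif docker://ubuntu:20.04

# test the new image with a shell
singularity shell ubuntu.sif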

Step 2: Running commands in Singularity

In Step 1 we showed you how to start a Singularity "shell". You can also just "execute" a command inside the Singularity image and return the results. The basic syntax is:

singularity exec /opt/software/CentOS.container/7.4/bin/centos <<COMMAND>>

Where you replace <<COMMAND>> with whatever command you need to run. This option becomes very helpful when you want to run Singularity inside a submission script (see Step 4 below).

For example, the df -hT command reports filesystem disk space usage, so running df -hT inside a Singularity image gives a different result than running it outside. You can test this using the following commands:

df -hT

singularity exec /opt/software/CentOS.container/7.4/bin/centos df -hT

Step 3: Overlays

One problem we often encounter on the HPCC is "lots-of-small-files" (hundreds of files where each one is < 50MB). The filesystem is optimized for large files, and lots of small files end up "clogging" things up, which can slow things down for everyone. One useful trick with Singularity is that you can make a single large file called an "overlay" which can be attached to a Singularity session. An overlay acts as a "filesystem inside a single file" where you can store lots of small files. From your point of view, you can have as many small files as you want accessible from the Singularity image (within reasonable limits). From the HPCC's point of view, however, all of those small files act as a single file and don't clog things up.

This technique is really helpful if you are using complex software installs, such as large Python, R, or conda environments. It can also be helpful if your research data consists of lots of small files.

Make your overlay file

Making an overlay is not hard, but it takes multiple steps. For details on how to make an overlay, we recommend viewing the Singularity overlay documentation.
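If you would rather build the overlay by hand, a minimal sketch based on the Singularity overlay documentation looks roughly like this (the 500MB size is just an example, and the mkfs options may vary with your e2fsprogs version):

# create upper/work directories so the overlay is writable without root
mkdir -p overlay/upper overlay/work

# create a 500MB empty file and format it as an ext3 filesystem
dd if=/dev/zero of=overlay.img bs=1M count=500
mkfs.ext3 -F -d overlay overlay.img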

Fortunately the HPCC has a "powertool" that can make a basic overlay for you. All you need to do is run the following command:

overlay_build

This overlay can be applied to a singularity image using the --overlay option as follows:

singularity shell --overlay overlay.img --env TERM=vt100 /opt/software/CentOS.container/7.4/bin/centos

If you have an overlay called overlay.img in your current directory you can use the following powertool shortcut to run it inside the CentOS image:

overlay_start

You can also view the amount of file space available on an overlay (using the df -hT command we used above) with the following powertool:

overlay_size
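This is roughly equivalent to running df -hT from inside the image with the overlay attached:

singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos df -hT /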

Writing to your overlay

Once you are in the Singularity shell, you can write to the overlay as if you were adding files to the root directory (/). For example, running the following commands from inside your Singularity image should install Miniforge3:

wget https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh

bash Miniforge3-24.3.0-0-Linux-x86_64.sh -b -p /miniforge3/

Since we install Miniforge3 a lot, there is yet another powertool that will do this installation for you. Just run the following command:

overlay_install_conda

Once Miniforge is installed in the /miniforge3/ directory, you need to add /miniforge3/bin to your path and initialize conda with the following commands:

export PATH=/miniforge3/bin:$PATH

conda init

Or, just use the powertool from before and it will automatically add /miniforge3/bin to your path:

overlay_start

At this point you can use pip and conda installs as much as you like. These generate hundreds of small files, but that doesn't matter because everything will be stored in the overlay.img file as one big file.
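For example, from inside the overlay session (the package names here are just examples):

conda install -y numpy
pip install requests

All of the resulting files land inside overlay.img instead of spreading across the HPCC filesystem.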

To exit Singularity, just type exit. To start your overlay image again, just type overlay_start.

Step 4: Submitting Jobs

Once we have our image and our conda overlay working on a development node, we can execute a script inside the Singularity image "in batch mode" using the exec option from above. For example, this command uses our overlay (with Miniforge installed) to run a python3 script called "mypython.py" that is stored in my home directory on the HPCC:

singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py

Once the above is running on a development node, we can submit it as a job to the HPCC using the following submission script:

#!/bin/bash
#SBATCH --walltime=04:00:00
#SBATCH --mem=5gb
#SBATCH -c 1
#SBATCH -N 1

singularity exec --overlay overlay.img --env PATH=/miniforge3/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sysbin/ /opt/software/CentOS.container/7.4/bin/centos python3 /mnt/home/$USER/mypython.py

Again, we have a powertool to help clean this up for common workflows. Using the overlay_exec command, you can simplify the above submission script as follows:

#!/bin/bash
#SBATCH --walltime=04:00:00
#SBATCH --mem=5gb
#SBATCH -c 1
#SBATCH -N 1

overlay_exec python3 /mnt/home/$USER/mypython.py

Job Arrays

If you need multiple jobs running the same software (such as in a job array), you can't have them all writing to the same overlay file. To resolve this issue, there are two steps:

  • embedding the overlay into the Singularity image
  • and loading the image as temporarily writable.

As a result, the changes you make to the system files (like /miniforge3/) will not persist after you are finished. You should make sure that any changes you want to keep (like important output files) are stored in a /mnt/ directory that's shared with the HPCC, like /mnt/home/$USER or a /mnt/research space.

Embedding the overlay

First, we will embed the overlay into the image file. This links the image with a copy of the overlay, so that any time the image is used, the overlay copy is brought along without needing to be specified.

To do this, you first need a copy of the image file that you want to use with your overlay. In the case above where you are using the CentOS container that is already on the HPCC, you can create a copy called centos7.sif in your current working directory with:

singularity build centos7.sif /opt/software/CentOS.container/7.4/bin/centos

Now, you should create an overlay or use the one created in the steps above:

overlay_build

Note

After this step, many of the powertools like overlay_exec and overlay_start will not work correctly since they automatically use the /opt/software/CentOS.container/7.4/bin/centos image. In most cases, you can specify arguments to the powertools to use a desired overlay file or image, but this will not work properly with the embedded overlays described below.

To embed the overlay into the container, you can use the (not very user-friendly) command:

singularity sif add --datatype 4 --partfs 2 --parttype 4 --partarch 2 --groupid 1 centos7.sif overlay.img

If you are using a different container or a different overlay, make sure to change the filenames at the end.

Nothing looks any different, but your centos7.sif image now has a copy of the overlay embedded into it. Anything that was installed in the overlay (e.g., a Miniforge installation created using overlay_install_conda) is now available in the image.
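If you want to double-check, you can list the data objects in the image; after the sif add step, you should see an additional partition entry for the overlay:

singularity sif list centos7.sif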

Note

The original overlay.img overlay is now entirely disconnected from the one embedded into the centos7.sif image. Any changes made to the overlay file will not be reflected in the image. Likewise, any changes to the overlay embedded in the image will not affect the original overlay file.

However, you can always run your image with the original overlay file to ignore the embedded one:

singularity shell --overlay overlay.img centos7.sif

Running the embedded overlay

You can now run your image like normal:

singularity shell centos7.sif

You will see any changes that were in the original overlay.img. But if you try to make a change to a system file, you will get an error:

Singularity> mkdir /hello
mkdir: cannot create directory ‘/hello’: Read-only file system

To be able to make changes, you need to start the image with the --writable flag:

singularity shell --writable centos7.sif
Singularity> mkdir /hello
Singularity> ls -d /hello
/hello
Singularity> exit

If you exit the container and restart it, you will still see the /hello directory:

singularity shell centos7.sif
Singularity> ls -d /hello
/hello

However, this means that you still cannot use this container in multiple jobs at once since they will all try to get full access to the embedded overlay.

Running the embedded overlay in multiple jobs

To fix the problem in the previous section, you need to load the overlay as temporarily writable. This mounts the filesystem in a temporary way that lets multiple jobs use it at once:

singularity shell --writable-tmpfs centos7.sif
Singularity> mkdir /hello2
Singularity> ls -d /hello2
/hello2
Singularity> exit

Any changes you make are discarded when the container exits, so make sure your important files are somewhere accessible on the HPCC, like your home or research space. For example, if you exit and restart the container, the /hello2 directory is gone:

singularity shell --writable-tmpfs centos7.sif
Singularity> ls -d /hello2
ls: cannot access /hello2: No such file or directory

In a submission script with a job array, this might look something like:

#!/bin/bash
#SBATCH --walltime=04:00:00
#SBATCH --mem=5gb
#SBATCH --array=1-10
#SBATCH -c 1
#SBATCH -N 1

singularity exec --writable-tmpfs centos7.sif python3 /mnt/home/$USER/mypython.py $SLURM_ARRAY_TASK_ID

In this case, /mnt/home/$USER/mypython.py should use $SLURM_ARRAY_TASK_ID to do some analysis and write the output somewhere like /mnt/home/$USER/results so it will remain after the temporary filesystem is erased.
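As a quick sanity check before submitting, you can mimic a single array task from a development node (the results path and output format here are just examples):

# pretend to be array task 1 and write a result file to persistent storage
mkdir -p /mnt/home/$USER/results
SLURM_ARRAY_TASK_ID=1 singularity exec --writable-tmpfs centos7.sif \
    bash -c 'echo "task $SLURM_ARRAY_TASK_ID done" > /mnt/home/$USER/results/out_$SLURM_ARRAY_TASK_ID.txt'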

This overview of singularity was initially written by Dirk Colbry. Please contact the ICER User Support Team if you need any help getting your workflow up and running.

link to ICER User Support Team online contact form