Singularity Overlays
Note
This overview is specific to the High Performance Computing Center (HPCC) at Michigan State University (MSU). For a complete tutorial see the Singularity documentation. This overview assumes that you have an HPCC account and know how to navigate to and use a development node.
This how-to guide will walk you through the first steps in using Singularity. However, if you want to skip the details, try running the following three powertools commands. The first two will create a read/writable overlay and install miniforge. The third one will start this overlay inside a CentOS Singularity image. You will only need to run the first two commands once to build the overlay file (with conda), then you can just use the third command anytime you want to start the overlay:
```bash
overlay_build
overlay_install_conda
overlay_start
```
To exit Singularity just type `exit`.
Step 1: Get a Singularity image
As a starting point we need a Singularity image, also known as a container or virtual machine. You can think of a Singularity image as a "software hard drive" that contains an entire operating system in a file. There are three main ways to get these images:
- Use one of the Singularity images already on the HPCC.
- Download an image from one of the many online libraries.
- Build your own image.
If you don't know which one of the above to use, we recommend that you pick number 1 and use the Singularity image we already have on the system.
1. Use one of the Singularity images already on the HPCC
For this introduction, we can keep things simple and just use one of the Singularity images already on the HPCC. This image runs CentOS 7 Linux and is a good starting point. Use the following command to start Singularity in a "shell" using the provided image:
```bash
singularity shell /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif
```
Once you run this command you should see the "Singularity" prompt which will look something like the following:
```
Singularity>
```
You did it! You are running a different operating system (OS) than the base operating system installed on the HPCC. All of the main HPCC folders are still accessible from this "container" (e.g. `/mnt/home`, `/mnt/research`, `/mnt/scratch`, etc.), so it shouldn't look much different than before (except the prompt has changed and you'll no longer have access to some of the base HPCC software).
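For example, you can verify from inside the shell that you are running the container's OS rather than the base OS (the exact output depends on the image, but for this image it should report CentOS 7):

```bash
# Inside the Singularity shell: report the container's operating system
cat /etc/os-release
# Your HPCC files are still visible from inside the container
ls /mnt/home/$USER
```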
At this point, if you know what you need, you should be able to use files in your home directory, and they will compile/run using the Singularity OS instead of the base OS.
Note
While inside the Singularity image, you can still install software in your `/mnt/home/$USER` and/or `/mnt/research` folders. The software you install will probably only work from "inside" this Singularity image; however, you will also be able to see and manipulate the files from "outside" the image with your standard HPCC account. This is fine for many researchers, but we recommend you jump down to "Step 3: Overlays" to make Singularity even more flexible.
2. Download an image from one of the many online libraries
See the Singularity Introduction for more information.
3. Build your own image
See the Singularity Advanced Topics for more information.
Step 2: Running commands in Singularity
In Step 1 we showed you how to start a Singularity "shell". You can also just "execute" a command inside the Singularity image and return the results. For example:
```bash
singularity exec /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif <<COMMAND>>
```
where you replace `<<COMMAND>>` with whatever command you need to run. This option will become very helpful when you want to run Singularity inside a submission script; see "Step 4" below.
For example, the `df -hT` command reports file system disk space usage, so running `df -hT` gives a different result inside a Singularity image than outside it. This is because inside the image, `df` can only "see" the Singularity image storage space. You can test this using the following commands:
```bash
df -hT
# compare with the same command run inside the image
singularity exec /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif df -hT
```
Step 3: Overlays
One problem we often encounter on the HPCC is "lots-of-small-files" (hundreds of files where each one is < 50MB). The filesystem is optimized for large files, so lots of small files end up "clogging" it, which can slow the system down for everyone. One useful trick of Singularity is that you can make a single large file called an "overlay" and attach it to a Singularity session. You can use an overlay as a "filesystem inside a single file": it lets you store lots of small individual files inside a single overlay file. From the user's point of view, you can have as many small files as you want accessible from the Singularity image (within reasonable limits). From the HPCC's point of view, however, these small files act as a single file, which is easier on the filesystem.
This technique is really helpful if you are using complex software installs, such as lots of Python, R, or Conda packages. It can also be helpful if your research data consists of lots of small files.
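As a quick illustration of both points of view (this assumes you have already built an `overlay.img` and installed miniforge into it, as described below):

```bash
# From the HPCC's point of view, the overlay is one large file
ls -lh overlay.img
# From inside a session using the overlay, its contents appear as
# ordinary directories and files
singularity exec --overlay overlay.img \
    /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif \
    ls /miniforge3
```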
Make your overlay file
Making an overlay is not hard but takes multiple steps. For details on how to make an overlay we recommend viewing the Singularity overlay documentation.
Fortunately, the HPCC has a powertool that can make a basic overlay for you. All you need to do is run the following command:
```bash
overlay_build
```
This will create an empty overlay without an associated Singularity image.
This overlay can be applied to a Singularity image using the `--overlay` option as follows:
```bash
singularity shell --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif
```
If you have an overlay called `overlay.img` in your current working directory, you can use the following powertool shortcut to run it inside the CentOS image:
```bash
overlay_start
```
or, if you have a custom overlay name:
```bash
overlay_start myoverlay.img
```
You can also view the amount of filespace available on an overlay (using the `df -hT` command we used above) with the following powertool:
```bash
overlay_df
```
Writing to your overlay
Once you are in the Singularity shell, you can write to the overlay as if you were adding files to the root directory (`/`). Installing to the root directory is not common practice, so double-check any software you are using! For example, running the following commands from inside your Singularity image should allow you to install miniforge:
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p /miniforge3
rm Miniforge3-Linux-x86_64.sh
```
Since we install miniforge a lot, there is yet another powertool that will do this installation for you. Just run the following command outside the Singularity image:
```bash
overlay_install_conda
```
Once miniforge is installed in the `/miniforge3/` directory, you need to add the folder `/miniforge3/` to the path with the following command inside the Singularity image:
```bash
export PATH=/miniforge3/bin:$PATH
# confirm that conda now resolves to the overlay install
which conda
```
Or, use the powertool from before and it will automatically add `/miniforge3/` to your path from outside the Singularity image:
```bash
overlay_install_conda
```
At this point you can use `pip` and `conda` installs within the Singularity image as much as you like. All of your installed files will be stored within the overlay. These installs generate hundreds of small files, but it doesn't matter because everything will be stored in the `overlay.img` file as one big file.
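For example, a few typical installs might look like the following (the package names here are just illustrations):

```bash
# Inside the Singularity shell, with /miniforge3/bin on your PATH
conda install -y numpy                  # installed files are written into the overlay
pip install requests                    # pip packages land in the overlay as well
conda create -y -n myenv python=3.10    # entire environments work too
```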
To exit Singularity, type `exit` from within the container. To start your overlay image again, type `overlay_start` from the HPCC.
Step 4: Submitting Jobs
Once we have our image and our Conda overlay working on a development node, we can execute a script inside the Singularity image "in batch mode" using the `exec` command from above. For example, the command below uses the overlay in which we installed miniforge to run a Python script called `mypython.py`. This `mypython.py` file is stored in our home directory on the HPCC; it does not have to be part of the overlay for us to execute it using the Python environment installed inside the overlay.
```bash
singularity exec --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif /miniforge3/bin/python mypython.py
```
Once the above has been tested on a development node we can submit this as a job to the HPCC using the following submission script as an example. Remember to tweak your resource requests as necessary for your work.
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1

cd /mnt/home/$USER
singularity exec --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif /miniforge3/bin/python mypython.py
```
Again, we have a powertool to help clean this up for common workflows. Using the `overlay_exec` command, you can simplify the above submission script as follows:
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1

cd /mnt/home/$USER
overlay_exec "python mypython.py"
```
Job Arrays
If you need to have multiple jobs running the same software (such as in a job array), they will not all be able to write to the same overlay file. To resolve this issue, there are two steps:
- embedding a copy of the overlay into the Singularity image
- and loading the Singularity image itself as temporarily writable.
As a result, the changes you make to the system files (like `/miniforge3/`) will not persist after you are finished. You should make sure that any changes you want to keep (like important output files) are stored in a `/mnt/` directory that's shared with the HPCC, like `/mnt/home/$USER` or a `/mnt/research` space.
Embedding the overlay
First, we will embed the overlay into the image file. This links the image with a copy of the overlay so that any time the image is used the overlay copy will be brought along automatically without needing to be specified.
To do this, you first need a copy of the image file that you want to use with your overlay. In the case above where you are using the CentOS container that is already on the HPCC, you can create a copy called `centos7.sif` in your current working directory with
```bash
cp /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif centos7.sif
```
Now, you should use the overlay created in the steps above or create a new overlay with
```bash
overlay_build
```
Warning
After this step, many of the powertools like `overlay_exec` and `overlay_start` will not work correctly, since they automatically use the `/mnt/research/common-data/Container_images/icer-centos7-img_latest.sif` image. In most cases, you can specify arguments to the powertools to use a desired overlay file or image, but this will not work properly with the embedded overlays described below.
To embed the overlay into the container, you can use the (not very user-friendly) command:
```bash
singularity sif add --datatype 4 --partfs 2 --parttype 4 --partarch 2 --groupid 1 centos7.sif overlay.img
```
If you are using a different container or a different overlay, make sure to change the filenames at the end.
Nothing looks any different, but your `centos7.sif` image will have a copy of the overlay embedded into it. Anything that was installed in the overlay (e.g., a miniforge installation created using `overlay_install_conda`) is now available in the image.
Note
The original `overlay.img` overlay is now entirely disconnected from the one embedded into the `centos7.sif` image. Any changes made to `overlay.img` will not be reflected in `centos7.sif`. Likewise, any changes to the overlay embedded in the image `centos7.sif` will not affect the original `overlay.img` file.
However, you can always run your image with the original overlay file to ignore the embedded one:
```bash
singularity shell --overlay overlay.img centos7.sif
```
Running the embedded overlay
You can now run your image like normal:
```bash
singularity shell centos7.sif
```
You will see any existing files that were in the original `overlay.img`. But if you try to make a change to a system file, you will get an error:
```
Singularity> mkdir /hello
mkdir: cannot create directory '/hello': Read-only file system
```
To be able to make changes, you need to start the image with the `--writable` flag:
```
singularity shell --writable centos7.sif
Singularity> mkdir /hello
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...
Singularity> exit
```
If you exit the container and restart it, you will still see the `/hello` directory:
```
singularity shell centos7.sif
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...
```
However, this means that you still cannot use this container in multiple jobs at once since they will all try to get full access to the embedded overlay at the same time.
Running the embedded overlay in multiple jobs
To fix the problem in the previous section, you need to load the overlay as temporarily read-writable. This loads the filesystem in a way that lets multiple jobs use it at once:
```
singularity shell --writable-tmpfs centos7.sif
Singularity> mkdir /hello2
Singularity> ls /
bin  boot  dev  environment  etc  hello  hello2  home  ...
Singularity> exit
```
Warning
Any changes you make are discarded, so make sure your important files are somewhere accessible on the HPCC like your home or research space.
```
singularity shell --writable-tmpfs centos7.sif
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...   # /hello2 is gone
```
In a script with a job array, this might look something like
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1
#SBATCH --array=0-9

cd /mnt/home/$USER
singularity exec --writable-tmpfs centos7.sif /miniforge3/bin/python /mnt/home/$USER/mypython.py
```
In this case, `/mnt/home/$USER/mypython.py` should use the `$SLURM_ARRAY_TASK_ID` environment variable to do some analysis and write the output to a location on the HPCC filesystem like `/mnt/home/$USER/results` so it will remain after Singularity's temporary filesystem is erased.
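As a rough sketch of that pattern (the `results` directory, file names, and the `--task-id`/`--output` arguments here are hypothetical; your own script would define its equivalents), each array task can key its output off its task ID:

```bash
# Hypothetical example: each array task writes one output file to a
# persistent location on the HPCC filesystem
mkdir -p /mnt/home/$USER/results
python /mnt/home/$USER/mypython.py \
    --task-id "$SLURM_ARRAY_TASK_ID" \
    --output "/mnt/home/$USER/results/task_${SLURM_ARRAY_TASK_ID}.txt"
```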