Singularity Overlays
Note
This overview is specific to the High Performance Computing Center (HPCC) at Michigan State University (MSU). For a complete tutorial see the Singularity documentation. This overview assumes that you have an HPCC account and know how to navigate to and use a development node.
This how-to guide will walk you through the first steps in using Singularity. However, if you want to skip the details, try running the following three powertools commands. The first two will create a read/writable overlay and install miniforge. The third one will start this overlay inside a CentOS Singularity image. You will only need to run the first two commands once to build the overlay file (with conda), then you can just use the third command anytime you want to start the overlay:
```bash
overlay_build
overlay_install_conda
overlay_start
```
To exit Singularity just type `exit`.
Step 1: Get a Singularity image
As a starting point we need a Singularity image, also known as a container or virtual machine. You can think of a Singularity image as a "software hard drive" that contains an entire operating system in a file. There are three main ways to get these images:
- Use one of the Singularity images already on the HPCC.
- Download an image from one of the many online libraries.
- Build your own image.
If you don't know which one of the above to use, we recommend that you pick number 1 and use the Singularity image we already have on the system.
1. Use one of the Singularity images already on the HPCC
For this introduction, we can keep things simple and just use one of the Singularity images already on the HPCC. This image runs CentOS 7 Linux and is a good starting point. Use the following command to start Singularity in a "shell" using the provided image:
```bash
singularity shell /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif
```
Once you run this command you should see the "Singularity" prompt which will look something like the following:
```
Singularity>
```
You did it! You are running a different operating system (OS) than the base operating system installed on the HPCC. All of the main HPCC folders are still accessible from this "container" (e.g. `/mnt/home`, `/mnt/research`, `/mnt/scratch`, etc.), so it shouldn't look much different than before (except the prompt has changed and you'll no longer have access to some of the base HPCC software).
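For example, you can verify from inside the shell that you are running the container's OS rather than the base OS (the exact output depends on the image, but for this image it should report CentOS 7):

```bash
# Inside the Singularity shell: report the container's operating system
cat /etc/os-release
# Your HPCC files are still visible from inside the container
ls /mnt/home/$USER
```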
At this point, if you know what you need, you should be able to use files in your home directory, and they will compile/run using the Singularity OS instead of the base OS.
Note
While inside the Singularity image, you can still install software in your `/mnt/home/$USER` and/or `/mnt/research` folders. The software you install will probably only work from "inside" this Singularity image; however, you will also be able to see and manipulate the files from "outside" the image with your standard HPCC account. This is fine for many researchers, but we recommend you jump down to "Step 3: Overlays" to make Singularity even more flexible.
2. Download an image from one of the many online libraries
See the Singularity Introduction for more information.
3. Build your own image
See the Singularity Advanced Topics for more information.
Step 2: Running commands in Singularity
In Step 1 we showed you how to start a Singularity "shell". You can also just "execute" a command inside the Singularity image and return the results. For example:
```bash
singularity exec /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif <<COMMAND>>
```
where you replace `<<COMMAND>>` with whatever command you need to run. This option will become very helpful when you want to run Singularity inside a submission script; see "Step 4" below.
For example, the `df -hT` command reports file system disk space usage, so running `df -hT` gives a different result inside a Singularity image than outside it. This is because inside the image, `df` can only "see" the Singularity image storage space. You can test this using the following commands:
```bash
df -hT
# compare with the same command run inside the image
singularity exec /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif df -hT
```
Step 3: Overlays
One problem we often encounter on the HPCC is "lots-of-small-files" (hundreds of files where each one is < 50MB). The filesystem is optimized for large files, so lots of small files end up "clogging" it, which can slow the system down for everyone. One useful trick of Singularity is that you can make a single large file called an "overlay" and attach it to a Singularity session. You can use an overlay as a "filesystem inside a single file": it lets you store lots of small individual files inside a single overlay file. From the user's point of view, you can have as many small files as you want accessible from the Singularity image (within reasonable limits). From the HPCC's point of view, however, these small files act as a single file, which is easier on the filesystem.
This technique is really helpful if you are using complex software installs, such as lots of Python, R, or Conda packages. It can also be helpful if your research data consists of lots of small files.
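As a quick illustration of both points of view (this assumes you have already built an `overlay.img` and installed miniforge into it, as described below):

```bash
# From the HPCC's point of view, the overlay is one large file
ls -lh overlay.img
# From inside a session using the overlay, its contents appear as
# ordinary directories and files
singularity exec --overlay overlay.img \
    /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif \
    ls /miniforge3
```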
Make your overlay file
Making an overlay is not hard but takes multiple steps. For details on how to make an overlay we recommend viewing the Singularity overlay documentation.
Fortunately, the HPCC has a powertool that can make a basic overlay for you. All you need to do is run the following command:
```bash
overlay_build
```
This will create an empty overlay without an associated Singularity image.
This overlay can be applied to a Singularity image using the `--overlay` option as follows:
```bash
singularity shell --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif
```
If you have an overlay called `overlay.img` in your current working directory, you can use the following powertool shortcut to run it inside the CentOS image:
```bash
overlay_start
```
or, if you have a custom overlay name:
```bash
overlay_start myoverlay.img
```
You can also view the amount of filespace available on an overlay (using the `df -hT` command we used above) with the following powertool:
```bash
overlay_df
```
Writing to your overlay
Once you are in the Singularity shell, you can write to the overlay as if you were adding files to the root directory (`/`). Installing to the root directory is not common practice, so double-check any software you are using! For example, running the following commands from inside your Singularity image should allow you to install miniforge:
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p /miniforge3
rm Miniforge3-Linux-x86_64.sh
```
Since we install miniforge a lot, there is yet another powertool that will do this installation for you. Just run the following command outside the Singularity image:
```bash
overlay_install_conda
```
Once miniforge is installed in the `/miniforge3/` directory, you need to add the folder `/miniforge3/` to the path with the following command inside the Singularity image:
```bash
export PATH=/miniforge3/bin:$PATH
# confirm that conda now resolves to the overlay install
which conda
```
Or, use the powertool from before and it will automatically add `/miniforge3/` to your path from outside the Singularity image:
```bash
overlay_install_conda
```
At this point you can use `pip` and `conda` installs within the Singularity image as much as you like. All of your installed files will be stored within the overlay. These installs generate hundreds of small files, but it doesn't matter because everything will be stored in the `overlay.img` file as one big file.
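For example, a few typical installs might look like the following (the package names here are just illustrations):

```bash
# Inside the Singularity shell, with /miniforge3/bin on your PATH
conda install -y numpy                  # installed files are written into the overlay
pip install requests                    # pip packages land in the overlay as well
conda create -y -n myenv python=3.10    # entire environments work too
```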
To exit Singularity, type `exit` from within the container. To start your overlay image again, type `overlay_start` from the HPCC.
Step 4: Submitting Jobs
Once we have our image and our Conda overlay working on a development node, we can execute a script inside the Singularity image "in batch mode" using the `exec` command from above. For example, the command below uses the overlay in which we installed miniforge to run a Python script called `mypython.py`. This `mypython.py` file is stored in our home directory on the HPCC; it does not have to be part of the overlay for us to execute it using the Python environment installed inside the overlay.
```bash
singularity exec --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif /miniforge3/bin/python mypython.py
```
Once the above has been tested on a development node we can submit this as a job to the HPCC using the following submission script as an example. Remember to tweak your resource requests as necessary for your work.
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1

cd /mnt/home/$USER
singularity exec --overlay overlay.img /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif /miniforge3/bin/python mypython.py
```
Again, we have a powertool to help clean this up for common workflows. Using the `overlay_exec` command, you can simplify the above submission script as follows:
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1

cd /mnt/home/$USER
overlay_exec "python mypython.py"
```
Job Arrays
If you need to have multiple jobs running the same software (such as in a job array), they will not all be able to write to the same overlay file. To resolve this issue, there are two steps:
- embedding a copy of the overlay into the Singularity image
- and loading the Singularity image itself as temporarily writable.
As a result, the changes you make to the system files (like `/miniforge3/`) will not persist after you are finished. You should make sure that any changes you want to keep (like important output files) are stored in a `/mnt/` directory that's shared with the HPCC, like `/mnt/home/$USER` or a `/mnt/research` space.
Embedding the overlay
First, we will embed the overlay into the image file. This links the image with a copy of the overlay so that any time the image is used the overlay copy will be brought along automatically without needing to be specified.
To do this, you first need a copy of the image file that you want to use with your overlay. In the case above where you are using the CentOS container that is already on the HPCC, you can create a copy called `centos7.sif` in your current working directory with
```bash
cp /mnt/research/common-data/Container_images/icer-centos7-img_latest.sif centos7.sif
```
Now, you should use the overlay created in the steps above or create a new overlay with
```bash
overlay_build
```
Warning
After this step, many of the powertools like `overlay_exec` and `overlay_start` will not work correctly, since they automatically use the `/mnt/research/common-data/Container_images/icer-centos7-img_latest.sif` image. In most cases, you can specify arguments to the powertools to use a desired overlay file or image, but this will not work properly with the embedded overlays described below.
To embed the overlay into the container, you can use the (not very user-friendly) command:
```bash
singularity sif add --datatype 4 --partfs 2 --parttype 4 --partarch 2 --groupid 1 centos7.sif overlay.img
```
If you are using a different container or a different overlay, make sure to change the filenames at the end.
Nothing looks any different, but your `centos7.sif` image will have a copy of the overlay embedded into it. Anything that was installed in the overlay (e.g., a miniforge installation created using `overlay_install_conda`) is now available in the image.
Note
The original `overlay.img` overlay is now entirely disconnected from the one embedded into the `centos7.sif` image. Any changes made to `overlay.img` will not be reflected in `centos7.sif`. Likewise, any changes to the overlay embedded in the image `centos7.sif` will not affect the original `overlay.img` file.
However, you can always run your image with the original overlay file to ignore the embedded one:
```bash
singularity shell --overlay overlay.img centos7.sif
```
Running the embedded overlay
You can now run your image like normal:
```bash
singularity shell centos7.sif
```
You will see any existing files that were in the original `overlay.img`. But if you try to make a change to a system file, you will get an error:
```
Singularity> mkdir /hello
mkdir: cannot create directory '/hello': Read-only file system
```
To be able to make changes, you need to start the image with the `--writable` flag:
```
singularity shell --writable centos7.sif
Singularity> mkdir /hello
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...
Singularity> exit
```
If you exit the container and restart it, you will still see the `/hello` directory:
```
singularity shell centos7.sif
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...
```
However, this means that you still cannot use this container in multiple jobs at once since they will all try to get full access to the embedded overlay at the same time.
Running the embedded overlay in multiple jobs
To fix the problem in the previous section, you need to load the overlay as temporarily read-writable. This loads the filesystem in a way that lets multiple jobs use it at once:
```
singularity shell --writable-tmpfs centos7.sif
Singularity> mkdir /hello2
Singularity> ls /
bin  boot  dev  environment  etc  hello  hello2  home  ...
Singularity> exit
```
Warning
Any changes you make are discarded, so make sure your important files are somewhere accessible on the HPCC like your home or research space.
```
singularity shell --writable-tmpfs centos7.sif
Singularity> ls /
bin  boot  dev  environment  etc  hello  home  ...   # /hello2 is gone
```
In a script with a job array, this might look something like
```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1
#SBATCH --array=0-9

cd /mnt/home/$USER
singularity exec --writable-tmpfs centos7.sif /miniforge3/bin/python /mnt/home/$USER/mypython.py
```
In this case, `/mnt/home/$USER/mypython.py` should use the `$SLURM_ARRAY_TASK_ID` environment variable to do some analysis and write the output to a location on the HPCC filesystem like `/mnt/home/$USER/results` so it will remain after Singularity's temporary filesystem is erased.
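As a rough sketch of that pattern (the `results` directory, file names, and the `--task-id`/`--output` arguments here are hypothetical; your own script would define its equivalents), each array task can key its output off its task ID:

```bash
# Hypothetical example: each array task writes one output file to a
# persistent location on the HPCC filesystem
mkdir -p /mnt/home/$USER/results
python /mnt/home/$USER/mypython.py \
    --task-id "$SLURM_ARRAY_TASK_ID" \
    --output "/mnt/home/$USER/results/task_${SLURM_ARRAY_TASK_ID}.txt"
```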