This is a Lab Notebook entry that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information and other details.
Singularity Overlays (2022-09-24)
Singularity is a versatile tool that gives researchers more flexibility installing software and running their workflows on the HPC. Most workflows don't need Singularity, but it can be extremely helpful for solving certain weirdly difficult problems. Some common examples of researchers using Singularity on the HPC include:
- Installing software that needs a special/different base operating system.
- Installing software that requires administrator privileges (aka root, su and/or sudo).
- Installing complex dependency trees (like python and R)
- Using existing software inside a pre-built virtual machine.
- Working with lots of tiny files on the HPC filesystems, which are designed for smaller numbers of big files.
- Building workflows that can easily move between different resources.
NOTE This overview is specific to the High Performance Computing Center (HPCC) at Michigan State University (MSU). For a complete tutorial see the Singularity documentation. This overview assumes that you have an HPCC account and know how to navigate to and use a development node.
The remainder of this overview will walk you through the first steps in using Singularity. However, if you want to skip the details just try running the following three powertools commands. The first two will create a read/writable overlay and install miniconda. The third one will start this overlay inside a CentOS singularity image. You will only need to run the first two commands once to build the overlay file (with conda), then you can just use the third command anytime you want to start the overlay:
To exit singularity just type exit.
Step 1: Get a singularity image
As a starting point we need a singularity image, also known as a container or virtual machine. You can think of a singularity image as a "software hard drive" that contains an entire operating system in a file. There are three main ways to get these images:
- Use one of the Singularity images already on the HPCC.
- Download an image from one of the many online libraries.
- Build your own image.
If you don't know which one of the above to use, I recommend that you pick number 1 and just use the singularity image we already have on the system.
1. Use one of the Singularity images already on the HPCC
For this introduction, we can keep things simple and just use one of the Singularity images already on the HPCC. This image runs CentOS 7 Linux and is a good starting point. Use the following command to start singularity in a "shell" using the provided image:
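As a sketch: the CentOS 7 image path /opt/software/CentOS.container/7.4/bin/centos is mentioned later in this overview, so the shell command presumably looks like:

```shell
# start an interactive shell inside the HPCC's CentOS 7 image
# (image path taken from later in this document; verify it on your system)
singularity shell /opt/software/CentOS.container/7.4/bin/centos
```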
Once you run this command you should see the "Singularity" prompt which will look something like the following:
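The exact prompt varies between Singularity versions, but it typically looks something like:

```
Singularity>
```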
You did it! You are running a different operating system (OS) than the base operating system. All of the main HPCC folders are still accessible from this "container" (e.g. /mnt/home, /mnt/research, /mnt/scratch, etc.), so things shouldn't look much different than before (except for the different prompt, and you no longer have access to some of the base HPCC software).
At this point, if you know what you need, you should be able to use files in your home directory, and they will compile/run using the singularity OS instead of the base OS.
NOTE: You can just install software in your /mnt/research folders. The software you install will probably only work from "inside" this singularity image; however, you will also be able to see and manipulate the files from your standard HPCC account. This is fine for many researchers, but I recommend you jump down to "Step 3: Overlays" to make Singularity even more flexible.
2. Download an image from one of the many online libraries
Many people publish singularity images and post them on public "libraries" for easy install. Here is a list of online libraries you can browse (this section of the tutorial may need more work; not all of these may work on the HPCC):
- Sylabs Container Library (Link to Browse Sylabs)
- Docker Hub (Link to Browse Docker Hub)
- Singularity Hub, aka shub (Link to Browse Singularity Hub)
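Images are usually pulled from these libraries with singularity pull and a source-specific URI prefix. A sketch with illustrative image names (the names are common documentation examples, not taken from this overview):

```shell
# pull an image from each type of library
singularity pull library://sylabsed/examples/lolcow   # Sylabs library
singularity pull docker://ubuntu:20.04                # Docker Hub
singularity pull shub://vsoch/hello-world             # Singularity Hub
```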
3. Build your own image
This one is more complex and outside the scope of this overview. However, if you are interested, I recommend trying the build command with a Docker image, since Docker is fairly easy to install on your personal computer; the Singularity documentation describes how to use Docker to make a singularity image.
Step 2: Running commands in Singularity
In Step 1 we showed you how to start a singularity "shell". You can also just "execute" a command inside the singularity image and return the results, replacing <<COMMAND>> with whatever command you need to run. This option becomes very helpful when you want to run singularity inside a submission script (see Step 4 below).
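Assuming the same CentOS image path used elsewhere in this overview, the exec form looks something like:

```shell
# run a single command inside the image and return the result
singularity exec /opt/software/CentOS.container/7.4/bin/centos <<COMMAND>>
```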
For example, the df -hT command reports file system disk space usage, so running df -hT inside a singularity image gives a different result than running it outside. You can test this using the following commands:
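A sketch of the comparison (image path assumed from elsewhere in this overview):

```shell
df -hT    # disk usage as seen from the base operating system
singularity exec /opt/software/CentOS.container/7.4/bin/centos df -hT    # as seen inside the image
```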
Step 3: Overlays
One problem we often encounter on the HPCC is "lots-of-small-files" (hundreds of files where each one is < 50MB). The filesystem is optimized for large files, so lots of small files end up "clogging" things up, which can slow things down for everyone. One useful trick with singularity is that you can make a single large file called an "Overlay" and attach it to a singularity session. You can use an Overlay as a "filesystem inside a single file", storing lots of small files inside the one overlay file. From the user's point of view, you can have as many small files as you want accessible from the singularity image (within reasonable limits). However, from the HPCC's point of view these small files act as a single large file and don't clog things up.
This technique is really helpful if you are using complex software installs such as lots of Python, R or Conda installs. It can also be helpful if your research data is lots of small files.
Make your overlay file
Making an overlay is not hard but takes multiple steps. For details on how to make an overlay we recommend viewing the singularity overlay documentation.
Fortunately the HPCC has a "powertool" that can make a basic overlay for you with a single command.
This overlay can be applied to a singularity image using the --overlay option as follows:
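For example, with an overlay file named overlay.img (standard --overlay usage; the image path is from elsewhere in this overview):

```shell
singularity shell --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos
```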
If you have an overlay called overlay.img in your current directory, you can use the overlay_start powertool shortcut to run it inside the CentOS image.
You can also view the amount of file space available on an overlay (using the df -hT command we used above) with another powertool.
Writing to your overlay
Once you are in the singularity shell, you can write to the overlay as if you were adding files to the root directory (/). For example, running the following commands from inside your singularity image should allow you to install miniconda3:
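A sketch of a typical Miniconda3 installation into the overlay (the installer URL is the standard Anaconda download location, not taken from this overview):

```shell
# download the Miniconda3 installer and install it into /miniconda3 in the overlay
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda3
```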
Since we install miniconda3 a lot, there is yet another powertool that will do this installation for you: overlay_install_conda.
Once miniconda is installed in the /miniconda3/ directory, you need to add that folder to your path with the following command:
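A minimal sketch, assuming the standard Miniconda layout where executables live in /miniconda3/bin:

```shell
# put the Miniconda executables at the front of the search path
export PATH=/miniconda3/bin:$PATH
```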
Or, just use the powertool from before, and it will automatically add /miniconda3 to your path.
At this point you can run conda installs as much as you like. These generate hundreds of small files, but it doesn't matter, because everything will be stored inside the single big overlay.img file.
To exit singularity just type exit. To start your overlay image again, just use the overlay_start powertool shortcut from above.
Step 4: Submitting Jobs
Once we have our image and our conda overlay working on a development node, we can execute a script inside the singularity image "in batch mode" using the exec option from above. For example, this command uses our miniconda-installed overlay and runs a python3 script called "mypython.py" stored in my home directory on the HPCC:
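A sketch of that command (overlay and image names as used elsewhere in this overview):

```shell
# run the python script from the home directory inside the overlay-backed image
singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos \
    python3 /mnt/home/$USER/mypython.py
```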
Once the above is running on a development node, we can submit this as a job to the HPCC using the following submission script:
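A hedged sketch of such a submission script (the SBATCH resource values are placeholders, not taken from this overview):

```shell
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2GB
#SBATCH --cpus-per-task=1

# run the analysis inside the image with the conda overlay attached
singularity exec --overlay overlay.img /opt/software/CentOS.container/7.4/bin/centos \
    python3 /mnt/home/$USER/mypython.py
```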
Again, we have a powertool to help clean this up for common workflows. Using the overlay_exec command, you can simplify the above submission script.
If you need to have multiple jobs running the same software (such as for a job array), you can't have them all writing to the same overlay file. To resolve this issue, there are two steps:
- embedding the overlay into the Singularity image
- and loading the image as temporarily writable.
As a result, the changes you make to the system files (like /miniconda3/) will not persist after you are finished. You should make sure that any changes you want to keep (like important output files) are stored in a /mnt/ directory that's shared with the HPCC, like /mnt/home/$USER or a research space.
Embedding the overlay
First, we will embed the overlay into the image file. This links the image with a copy of the overlay, so that any time the image is used, the overlay copy is brought along without needing to be specified.
To do this, you first need a copy of the image file that you want to use with your overlay. In the case above, where you are using the CentOS container that is already on the HPCC, you can create a copy called centos7.sif in your current working directory:
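A sketch using singularity build (image path from elsewhere in this overview):

```shell
# copy the system CentOS image into a local .sif file
singularity build centos7.sif /opt/software/CentOS.container/7.4/bin/centos
```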
Now, create an overlay or use the one created in the steps above.
After this step, many of the powertools like overlay_start will not work correctly, since they automatically use the /opt/software/CentOS.container/7.4/bin/centos image. In most cases you can specify arguments to the powertools to use a desired overlay file or image, but this will not work properly with the embedded overlays described below.
To embed the overlay into the container, you can use the (not very user-friendly) command:
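The Singularity overlay documentation describes embedding with the sif add subcommand; a sketch (the numeric descriptor flags follow the documented recipe, but double-check them against your Singularity version):

```shell
# embed overlay.img into centos7.sif as an ext3 overlay partition
singularity sif add --datatype 4 --partfs 2 --parttype 4 --partarch 2 \
    --groupid 1 centos7.sif overlay.img
```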
If you are using a different container or a different overlay, make sure to change the filenames at the end.
Nothing looks any different, but your centos7.sif image now has a copy of the overlay embedded in it. Anything that was installed in the overlay (e.g., a Miniconda installation created using overlay_install_conda) is now available in the image.
The original overlay.img file is now entirely disconnected from the copy embedded in the centos7.sif image. Any changes made to the overlay file will not be reflected in the image, and likewise, any changes to the overlay embedded in the image will not affect the original overlay file.
However, you can always run your image with the original overlay file to ignore the embedded one:
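For example (standard --overlay usage, with the centos7.sif copy created above):

```shell
singularity shell --overlay overlay.img centos7.sif
```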
Running the embedded overlay
You can now run your image like normal:
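For example, using the centos7.sif copy created above:

```shell
singularity shell centos7.sif
```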
You will see any changes that were in the original overlay.img, but if you try to make a change to a system file, you will get an error.
To be able to make changes, you need to start the image as writable.
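A sketch using Singularity's --writable flag, assuming this is the option intended here:

```shell
singularity shell --writable centos7.sif
```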
If you exit the container and restart it, you will still see the changes you made.
However, this means that you still cannot use this container in multiple jobs at once since they will all try to get full access to the embedded overlay.
Running the embedded overlay in multiple jobs
To fix the problem in the previous section, you need to load the overlay as temporarily writable. This loads the filesystem in a way that multiple jobs can use it at once:
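Singularity's --writable-tmpfs flag behaves this way: writes go to an in-memory temporary filesystem and are discarded when the container exits. A sketch, assuming this is the mechanism intended here:

```shell
singularity shell --writable-tmpfs centos7.sif
```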
Any changes you make are discarded, so make sure your important files are somewhere accessible on the HPCC like your home or research space.
In a script with a job array, this might look something like
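A hedged sketch of a job-array submission script (the array range and resource values are placeholders):

```shell
#!/bin/bash
#SBATCH --array=1-10
#SBATCH --time=01:00:00
#SBATCH --mem=2GB

# each array task runs the same script; the script should use
# $SLURM_ARRAY_TASK_ID to pick its piece of the analysis
singularity exec --writable-tmpfs centos7.sif \
    python3 /mnt/home/$USER/mypython.py
```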
In this case, /mnt/home/$USER/mypython.py should use the $SLURM_ARRAY_TASK_ID environment variable to do some analysis and write the output somewhere like /mnt/home/$USER/results so it will remain after the temporary filesystem is erased.
This overview of singularity was initially written by Dirk Colbry. Please contact the ICER User Support Team if you need any help getting your workflow up and running.