Warning
This is a Lab Notebook entry that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Change to modules in SLURM jobs
Summary: The way that modules are loaded in SLURM jobs is changing slightly. SLURM jobs will now require you to load modules within your SLURM script before you use them. Previously, this was only a best practice recommended by ICER.
What do I need to do?:
```mermaid
graph TD
A[Start] --> B[Do you only use the default modules?];
B -->|Yes| C([<b>You don't need to do anything!</b>]);
B -->|No| D[Where do you load modules for your job?];
D -->|In the batch script| E([<b>You don't need to do anything!</b>]);
D -->|On the development node| F([<b>Add <code>module load</code> lines to your SLURM script.</b>]);
```
If you use a workflow manager like Nextflow or Snakemake and are using non-default modules, please see the recommendations below.
Why is this happening?
With the new module system, ICER is able to build software adapted to the specific types of nodes in the HPCC. For example, our intel18 nodes have capabilities like AVX-512 that are not available in the amd20 nodes.
Previously, ICER would build one version of the software that worked on every type of node. Now, we have the capability to build multiple versions of software, each adapted to the unique capabilities of our hardware generations. However, this means that when you load one of these "node-adapted" modules on a development node, that same module gets used in the SLURM job no matter where in the HPCC the job runs. This leads to "illegal instruction" errors when software built for newer capabilities runs on a node without those capabilities.
By making this change, SLURM will load modules from the collection adapted to the node the job is running on, no matter what development node was used to submit that job. This means that your code will run as quickly and efficiently as possible on the nodes that SLURM assigns it.
While there are other solutions (like constraining your job to the same type of node that you are submitting from), this solution is the most flexible, is in line with our previous recommendations, and gives you access to the largest collection of nodes at once, reducing queue times.
What exactly is being changed?
ICER is changing the way that the module system and SLURM interact. Currently, SLURM inherits the entire environment of the development node you submit your job from, including all loaded modules and all changes to the module path (the location where modules are found).
In the new configuration, SLURM jobs will start by resetting all loaded modules inherited from the development node back to the appropriate defaults for your assigned compute node and changing the module path accordingly (see above).
What do I need to do?
If you already load all modules in your SLURM scripts before you use them (as is recommended by ICER), you don't need to make any changes!
However, if you load non-default modules on the development nodes and then use those modules in your SLURM scripts, please add those `module load` commands to your script before you use the programs those modules provide. Additionally, if you make any changes to your module path using the `module use` command (e.g., because you are loading modules that are not provided by ICER), make sure you do this before you load modules in your SLURM scripts as well.
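For instance, the beginning of a job script that uses a personally installed module might be ordered like this (the resource requests, module path, and module name below are hypothetical):

```bash
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=2GB

# Extend the module path first so custom modules can be found
# (this path is hypothetical)
module use /mnt/home/$USER/modules

# Then load modules, including any custom ones (name is hypothetical)
module load mytool/1.0

# Only now use the programs those modules provide
mytool --input data.txt
```

The key point is the order: `module use` before `module load`, and both before any commands that depend on the loaded software.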
Example
Suppose that in a typical session, I have a SLURM script that looks like the following (the resource requests and Stata invocation are illustrative):

script.sb

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem=4GB

# Run a Stata do-file in batch mode
stata -b do analysis.do
```
The `stata` command comes from the Stata module. Before I submit this script, I log in and load modules like:

```bash
module purge
module load Stata/18-MP
sbatch script.sb
```
This will no longer work, because when the SLURM job starts, the Stata module will be unloaded and replaced by the default modules (which do not include Stata). The fix is to add the `module purge` and `module load Stata/18-MP` lines to the beginning of the SLURM script like:

script_fixed.sb

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem=4GB

# Load required modules before using them
module purge
module load Stata/18-MP

# Run a Stata do-file in batch mode
stata -b do analysis.do
```
You no longer have to load the modules before submitting the job.
Special considerations for Nextflow and Snakemake
Nextflow and Snakemake are two workflow managers that can submit jobs to SLURM for you. Since they build the SLURM scripts, you will need to take extra measures to ensure that they load the required modules in the steps where they are used.
Nextflow
In Nextflow, add the modules you need to the process definition using the `module` directive. For examples and more information, please see Nextflow's documentation.
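For instance, a process that needs the Stata module from the example above could declare it like this (the process name, input, and script are illustrative):

```groovy
process runStata {
    // Load the required module on whichever node SLURM assigns the task to
    module 'Stata/18-MP'

    input:
    path dofile

    script:
    """
    stata -b do ${dofile}
    """
}
```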
Snakemake
In Snakemake, add the modules you need to the rule using the `envmodules` key and run Snakemake with the `--use-envmodules` flag. For examples and more information, please see Snakemake's documentation.
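For instance, a rule that needs the Stata module from the example above could declare it like this (the rule name, file names, and shell command are illustrative); remember to run with `snakemake --use-envmodules`:

```python
rule run_stata:
    input:
        "analysis.do"
    output:
        "analysis.log"
    envmodules:
        # Loaded on the compute node when Snakemake is run with --use-envmodules
        "Stata/18-MP"
    shell:
        "stata -b do {input}"
```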