Writing and submitting job scripts
The HPCC uses the SLURM system to manage computing resources. Users access these resources by submitting batch jobs.
This tutorial will walk you through writing and submitting a job script for a parallel job that uses multiple cores across several nodes.
Setup
First, make sure you're on a development node and in your home directory.
Clone and compile the example we're going to use:
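A minimal sketch of the setup, assuming the examples live in a Git repository built with make (the repository URL and directory names below are placeholders, not the actual locations):

    cd ~                                  # start in your home directory on a development node
    git clone <example-repository-URL>    # placeholder: use the repository given in the documentation
    cd <example-directory>                # placeholder: the directory created by the clone
    make                                  # build the example executables, including hybrid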
This directory contains example C++ codes using several forms of parallelism. These examples may be useful if you find yourself developing your own software, and interested users should read the accompanying README.
For now we'll just use the hybrid example. This example combines MPI and OpenMP. MPI allows multiple processes to communicate with each other, while OpenMP allows multiple CPUs to "collaborate" on the same process.
We would like to run 4 processes, each on its own node, with 2 CPUs per process. That means we'll need a total of 8 CPUs.
Writing a job script
A job script is a plain text file. It's composed of two main parts:
- The resource request
- The commands for running the job
Using nano or your preferred text editor, create and open hybrid.sb:
    nano hybrid.sb
Resource request
The hybrid example likely uses a more complex set of resource requests than you will need for your own jobs, but it is useful for illustrative purposes.
Recall from the previous section that we'd like to run hybrid over 4 processes with 2 CPUs per process. Each process will also run on its own node. This outlines the resources we want to request.
Let's type up the first part of the job script, the resource request.
The very first line specifies the interpreter we want to use for our commands; in this case, it's the bash shell.
Then, each resource request line begins with #SBATCH. All resources must be requested at the top of the file, before any commands, or they will be ignored.
The request lines are as follows:
- Wall clock limit - how long will the job run? This job will run for 10 minutes.
- The number of nodes; here, 4
- The number of tasks, also known as processes, running on each node. Here we want 1.
- The number of CPUs per task. The default is one, but we've requested 2.
- The amount of memory to use per CPU. We are requesting 1 GB each.
- The name of the job, so we can easily identify it later.
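Putting these requests together, the top of hybrid.sb will look something like the following sketch (the job name is your choice; the other values mirror the list above):

    #!/bin/bash
    #SBATCH --time=00:10:00          # wall clock limit: 10 minutes
    #SBATCH --nodes=4                # 4 nodes
    #SBATCH --ntasks-per-node=1      # 1 task (process) per node
    #SBATCH --cpus-per-task=2        # 2 CPUs per task
    #SBATCH --mem-per-cpu=1G         # 1 GB of memory per CPU
    #SBATCH --job-name=hybrid_test   # a job name of your choice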
Job commands
The resource request is just one part of writing a job script. The second part is running the job itself.
To run our job we need to:
- Load required modules
- Change to the appropriate directory
- Run hybrid
We'll add the following lines to hybrid.sb:
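The exact module names and paths depend on your environment, but the command section will look something like this sketch (the module name and directory are placeholders):

    module purge                                  # start from a clean module environment
    module load <compiler and MPI modules>        # placeholder: load the toolchain used to build the examples
    cd ~/<example-directory>                      # placeholder: the directory containing the hybrid executable
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # give OpenMP the 2 CPUs requested per task
    srun ./hybrid                                 # launch the 4 MPI processes across the 4 nodes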
Notice that we use the srun command to run our hybrid executable. This command prepares the parallel runtime environment, setting up the requested 4 processes across 4 nodes and their associated CPUs.
You may already be familiar with mpirun or mpiexec. While srun is similar to these commands, it is preferred for use on the HPCC because of its connection to the SLURM scheduler.
We can also add a couple of optional commands that will save data about our job:
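For example, scontrol can write the scheduler's full record of the job into the output file; treat these as reasonable choices rather than the only options:

    scontrol show job $SLURM_JOB_ID      # record the scheduler's full description of this job (nodes, resources, timing)
    env | grep ^SLURM                    # record the SLURM environment variables the job ran with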
You have now completed your job script.
If you used nano to write it, hit Ctrl+X followed by Y to save, then press Enter to accept the filename.
Final notes
As noted above, your own jobs will most likely use much simpler resource specifications than those shown here. You can see our example job scripts for more ideas.
By default, SLURM will use the following settings unless they are overridden in your job script:
--nodes=1
--tasks-per-node=1
--cpus-per-task=1
--time=00:01:00
--mem-per-cpu=750M
See the curated List of Job Specifications or the sbatch documentation for more options.
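For example, a simple serial job could rely on those defaults and override only what it needs (the time limit, memory, and program below are purely illustrative):

    #!/bin/bash
    #SBATCH --time=00:30:00      # override the 1 minute default wall time
    #SBATCH --mem-per-cpu=1G     # override the 750M default memory; nodes, tasks, and CPUs stay at 1

    ./my_program                 # placeholder for your own executable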
Batch job submission
Now that we have our job script, we need to submit it to the SLURM scheduler. For this, we use the sbatch command:
    sbatch hybrid.sb
If the job is submitted successfully, the job controller will print a job ID to the screen. This ID can be used with, for example, scancel to cancel the job or sacct to look up statistics about the job after it ends.
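For example, if sbatch reported a job ID of 12345678 (an illustrative number), you could run:

    scancel 12345678       # cancel the job
    sacct -j 12345678      # show accounting statistics for the job once it has ended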
Note that the sbatch command only runs on development and compute nodes; it will not work on any gateway node.
Checking our job status
Once the job has been submitted, we can see it in the queue with the sq powertool:
    sq
This will show us the following information:
- The job's ID number
- The job's name, which we specified in the script
- The job's submitting user (should be your username)
- The job's state (pending, running, or completed)
- The job's current walltime
- The job's allowed walltime
- The number of nodes requested and/or allocated to the job
- The reason why the job has the status it has
Viewing job outputs
Every SLURM job creates a file that contains the standard output and standard error from the job.
The default name is slurm-<jobid>.out, where <jobid> is the job ID assigned when the job was submitted.
Find the output log from your job and view it with less <filename>. You should see several lines printing the thread and process information for each CPU involved.
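For example, if the job ID were 12345678 (again, an illustrative number):

    less slurm-12345678.out      # page through the job's output; press q to quit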
The SLURM log files are essential for checking whether your job ran successfully and, if it did not, finding out why it failed.