Overview of job scheduling and management
HPCC uses SLURM (Simple Linux Utility for Resource Management) to manage users' jobs and computing resources. SLURM is an open-source, fault-tolerant, and highly scalable scheduling system used by many national and international computing centers. Users submit jobs with SLURM commands and request computing resources either in a job script or on the command line.
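To make the two ways of requesting resources concrete, the following is a minimal sketch rather than an HPCC-specific recipe. It renders a small job script with `#SBATCH` directives and builds the equivalent `sbatch` command line. The helper names (`render_job_script`, `sbatch_command_line`), the job name, the resource values, and the file name `job.sb` are all hypothetical; only the `sbatch` options `--job-name`, `--ntasks`, `--time`, and `--mem` come from SLURM itself. The code only builds text and does not submit anything.

```python
def render_job_script(job_name, ntasks, time_limit, mem, command):
    """Return the text of a batch script that requests resources via #SBATCH lines."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",   # name shown in the queue
        f"#SBATCH --ntasks={ntasks}",       # number of tasks to launch
        f"#SBATCH --time={time_limit}",     # wall-clock limit
        f"#SBATCH --mem={mem}",             # memory per node
        "",
        command,                            # the work the job performs
    ]
    return "\n".join(lines) + "\n"


def sbatch_command_line(script_path, job_name, ntasks, time_limit, mem):
    """Return an argument list that passes the same requests on the command line."""
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--ntasks={ntasks}",
        f"--time={time_limit}",
        f"--mem={mem}",
        script_path,
    ]


# Illustrative values only; adjust to the limits of your own cluster.
script = render_job_script("hello", 1, "00:10:00", "1G", "srun hostname")
argv = sbatch_command_line("job.sb", "hello", 1, "00:10:00", "1G")

print(script)
print(" ".join(argv))
```

When the same option appears both in the script and on the command line, SLURM gives precedence to the command-line value, so the script can hold defaults that are overridden per submission.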
SLURM provides command-line tools to control jobs and clusters and to show detailed information about jobs. The table below lists the commands most frequently used on HPCC. A complete list can be found on the SLURM documentation page.
|Command|Description|
|sacct|Displays accounting data for all jobs and job steps in the SLURM job accounting log or SLURM database.|
|sbatch|Submits batch jobs to the SLURM job queue.|
|sacctmgr|Views and modifies SLURM account information.|
|scancel|Signals or cancels jobs or job steps that are under the control of SLURM.|
|scontrol|Views and modifies SLURM configuration and state.|
|sinfo|Views information about SLURM nodes and partitions.|
|smap|Graphically views information about SLURM jobs, partitions, and set configuration parameters.|
|sprio|Views the factors that make up a job's scheduling priority.|
|squeue|Views information about jobs in the SLURM scheduling queue.|
|srun|Runs parallel jobs.|
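As an illustration of how several of these commands are typically combined to follow one job, here is a hedged sketch that only assembles the command lines. The job ID `12345`, the user name `alice`, and the helper names `build_commands` and `run_if_available` are hypothetical. The options used (`squeue -u`, `scontrol show job`, `sacct -j` with `--format`, `sprio -j`) are standard SLURM options, but the chosen output fields are just one possibility.

```python
import shutil
import subprocess


def build_commands(job_id, user):
    """Return argument lists for common monitoring and control tasks."""
    return {
        "queue": ["squeue", "-u", user],                  # jobs of one user in the queue
        "details": ["scontrol", "show", "job", job_id],   # full state of one job
        "accounting": ["sacct", "-j", job_id,
                       "--format=JobID,JobName,State,Elapsed"],  # accounting record
        "priority": ["sprio", "-j", job_id],              # priority factors of a pending job
        "nodes": ["sinfo"],                               # node and partition overview
        "cancel": ["scancel", job_id],                    # cancel the job
    }


def run_if_available(argv):
    """Run a command and return its output, or None if the tool is not installed."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


# Print the command lines instead of executing them, so nothing is cancelled by accident.
commands = build_commands("12345", "alice")
for label, argv in commands.items():
    print(f"{label}: {' '.join(argv)}")
```

On a machine where SLURM is installed, a read-only entry such as `commands["nodes"]` can be passed to `run_if_available` to capture its output; the `cancel` entry should be run only deliberately.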