How Jobs are Scheduled

SLURM uses two schedulers: the main scheduler and the backfill scheduler. The main scheduler constantly tries to start high priority jobs. The backfill scheduler considers all jobs and starts any job that won't delay the start time of a higher priority job.

| Scheduler | Function | When it Runs | Run Time |
|---|---|---|---|
| Main | Launches high priority jobs that can start immediately. Stops evaluating jobs once it encounters a job that cannot be started. | About every 2 seconds | 0.08-2 seconds |
| Backfill | Evaluates the entire queue. Launches jobs that won't interfere with the start time of a higher priority job. Sets jobs' StartTime and SchedNodeList. | 20 seconds after the last backfill cycle completes | 2-15+ minutes |

StartTime and SchedNodeList

The backfill scheduler sets the StartTime and SchedNodeList parameters on jobs that can start within the next 7 days. These parameters can be viewed in the output of scontrol show job <jobid>. StartTime estimates when a job will start and SchedNodeList shows the nodes this job might start on. StartTime is only an estimate. These values are updated every time the backfill scheduler runs and may change as running jobs complete and new jobs are submitted.
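As a minimal sketch, the two fields can be pulled out of the Key=Value text that scontrol show job prints. The sample text below is illustrative rather than captured from a real cluster, and the helper name parse_scontrol_fields is hypothetical.

```python
import re

# Illustrative sample only; real output contains many more fields.
SAMPLE_OUTPUT = """\
JobId=123456 JobName=example
   JobState=PENDING Reason=Priority Dependency=(null)
   SubmitTime=2024-01-01T10:00:00 EligibleTime=2024-01-01T10:00:00
   StartTime=2024-01-01T12:30:00 EndTime=2024-01-01T16:30:00 Deadline=N/A
   SchedNodeList=node-[001-002]
"""


def parse_scontrol_fields(text, wanted=("StartTime", "SchedNodeList")):
    """Return the first occurrence of each wanted Key=Value field."""
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", text):
        if key in wanted and key not in fields:
            fields[key] = value
    return fields


info = parse_scontrol_fields(SAMPLE_OUTPUT)
# A job the backfill scheduler has not yet planned may lack these fields.
print("Estimated start:", info.get("StartTime", "not set"))
print("Candidate nodes:", info.get("SchedNodeList", "not set"))
```

On a cluster, the input text could instead come from running scontrol show job <jobid> through subprocess.run with capture_output=True and text=True. Because both values are refreshed on every backfill cycle, any parsed result should be treated as a snapshot.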

Minimum Job Requirements to Avoid Deferment

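A job must meet the criteria below before the backfill scheduler will protect it from deferment, that is, before the scheduler will decline to start a lower priority job that could push back its start time. Jobs that do not meet these criteria may be deferred. These thresholds allow the backfill scheduler to cycle faster and maintain high system utilization.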

| Criteria | Minimum | Description |
|---|---|---|
| Priority | 3000 | A minimum priority of 3000 is required to avoid potential deferment in scheduling. Buy-in account jobs are never below this threshold. |
| Age | 30 minutes | Jobs must be queued for at least 30 minutes to avoid potential deferment in scheduling. This applies to all jobs. |
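The two thresholds in the table above can be expressed as a simple check. This is a sketch under the assumption that both conditions must hold at once; the function name protected_from_deferment is hypothetical and is not part of SLURM.

```python
from datetime import datetime, timedelta

MIN_PRIORITY = 3000               # minimum priority from the table
MIN_AGE = timedelta(minutes=30)   # minimum time in the queue from the table


def protected_from_deferment(priority, submit_time, now):
    """True if a job meets both thresholds (assumed to apply together)."""
    old_enough = (now - submit_time) >= MIN_AGE
    return priority >= MIN_PRIORITY and old_enough


now = datetime(2024, 1, 1, 12, 0)
# Queued 45 minutes with priority 4500: meets both thresholds.
print(protected_from_deferment(4500, datetime(2024, 1, 1, 11, 15), now))
# Queued only 15 minutes: too young, regardless of priority.
print(protected_from_deferment(4500, datetime(2024, 1, 1, 11, 45), now))
# Queued 2 hours but priority 2500: below the priority threshold.
print(protected_from_deferment(2500, datetime(2024, 1, 1, 10, 0), now))
```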

Job Priority Factors

A job's priority is determined by a combination of several priority factors. Age, size, fairshare, and whether it was submitted to a buy-in account all contribute to the job's priority.

| Priority Factor | Description | Maximum Contribution to Priority |
|---|---|---|
| Age | Starts at zero at job submission, then increases linearly to a maximum of 60000 after 30 days. | 60000 after 30 days |
| Fairshare | Starts at 60000 and decreases as the user's recent usage goes up. Usage for this calculation is decayed 50% each day. | 60000 for no recent cluster usage |
| Size | Scales linearly with the amount of CPU and memory requested by a job: 100 per CPU, 20 per GB. | 52000+ depending on memory requested |
| QOS | Adds 3000 to buy-in jobs to ensure they are always above the backfill scheduler's minimum priority for reserving resources. | 3000 |
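The factors in the table above can be combined into a rough estimate, sketched below. Two things here are assumptions rather than statements from this page: that the contributions simply add together, and that the size factor has no cap. The mapping from usage to fairshare points is not given here, so the fairshare contribution is passed in as a number between 0 and 60000. All function names are hypothetical.

```python
AGE_MAX = 60000        # maximum age contribution
AGE_RAMP_DAYS = 30     # days needed to reach the maximum
PER_CPU = 100          # size points per requested CPU
PER_GB = 20            # size points per requested GB of memory
BUYIN_QOS = 3000       # QOS points added to buy-in jobs


def age_points(days_queued):
    """Linear ramp from 0 at submission to AGE_MAX after 30 days."""
    return min(AGE_MAX * max(days_queued, 0) / AGE_RAMP_DAYS, AGE_MAX)


def size_points(cpus, mem_gb):
    """Linear in requested CPUs and memory (no cap assumed)."""
    return PER_CPU * cpus + PER_GB * mem_gb


def decayed_usage(usage, days_ago):
    """Usage counted toward fairshare after halving once per day."""
    return usage * 0.5 ** days_ago


def estimate_priority(days_queued, cpus, mem_gb, fairshare_points, buy_in=False):
    """Sum of the factor contributions (additivity is an assumption)."""
    total = age_points(days_queued) + size_points(cpus, mem_gb) + fairshare_points
    if buy_in:
        total += BUYIN_QOS
    return total


# A buy-in job queued 3 days, requesting 4 CPUs and 16 GB,
# with an assumed fairshare contribution of 30000.
print(estimate_priority(3, 4, 16, 30000, buy_in=True))
# 1000 units of usage from two days ago, as counted today.
print(decayed_usage(1000, 2))
```

The actual priority of a job is whatever the scheduler reports, so this estimate is useful mainly for seeing how the factors trade off against each other.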