How Jobs are Scheduled
Schedulers
SLURM uses two schedulers to start jobs: the main scheduler and the backfill scheduler. The main scheduler constantly tries to start high-priority jobs. The backfill scheduler considers all jobs and starts any job that won't delay the start time of a higher-priority job.
Scheduler | Function | When it Runs | Run Time |
---|---|---|---|
Main | Launches high priority jobs that can start immediately. Stops evaluating jobs once it encounters a job that cannot be started. | About every 2 seconds | 0.08-2 seconds |
Backfill | Evaluates the entire queue. Launches jobs that won't interfere with the start time of a higher priority job. Sets jobs' StartTime and SchedNodeList. | 20 seconds after the last backfill cycle completes | 2-15+ minutes |
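The difference between the two passes can be illustrated with a toy model. This is a minimal sketch and not Slurm's actual algorithm: it assumes a single resource type (nodes), measures time in minutes from now, and in the backfill pass protects only the first blocked job rather than every higher-priority job. The `Job` class and the functions `main_pass` and `backfill_pass` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    nodes: int
    runtime: int  # requested wall time in minutes


def main_pass(queue, free_nodes):
    """Start jobs in priority order; stop at the first job that cannot start."""
    started = []
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if job.nodes > free_nodes:
            break  # the main scheduler stops evaluating here
        free_nodes -= job.nodes
        started.append(job.name)
    return started


def backfill_pass(queue, free_nodes, running):
    """Evaluate the whole queue. `running` is a list of (end_minute, nodes)."""
    started = []
    shadow = None  # estimated start minute of the first blocked job
    spare = 0      # nodes left over at `shadow` once that job is placed
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if shadow is None:
            if job.nodes <= free_nodes:
                free_nodes -= job.nodes
                running = running + [(job.runtime, job.nodes)]
                started.append(job.name)
                continue
            # First blocked job: find when enough nodes will have been freed.
            available = free_nodes
            for end, nodes in sorted(running):
                available += nodes
                if available >= job.nodes:
                    shadow, spare = end, available - job.nodes
                    break
            continue
        fits_now = job.nodes <= free_nodes
        ends_in_time = job.runtime <= shadow
        # Start only if the blocked job's estimated start is not pushed back.
        if fits_now and (ends_in_time or job.nodes <= spare):
            free_nodes -= job.nodes
            if not ends_in_time:
                spare -= job.nodes
            started.append(job.name)
    return started


# Hypothetical 10-node cluster: 6 nodes busy for 60 more minutes, 4 free.
queue = [
    Job("big", 9000, 8, 120),
    Job("long", 5000, 3, 90),
    Job("short1", 4000, 2, 30),
    Job("short2", 3500, 2, 45),
]
print(main_pass(queue, 4))                 # []
print(backfill_pass(queue, 4, [(60, 6)]))  # ['short1', 'short2']
```

In this example the main pass starts nothing because the highest-priority job does not fit. The backfill pass skips "long", which would still be running when "big" is expected to start, and starts the two short jobs that finish before then.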
StartTime and SchedNodeList
The backfill scheduler sets the StartTime and SchedNodeList parameters on jobs that can start within the next 7 days. These parameters can be viewed in the output of scontrol show job <jobid>. StartTime estimates when a job will start and SchedNodeList shows the nodes this job might start on. StartTime is only an estimate. These values are updated every time the backfill scheduler runs and may change as running jobs complete and new jobs are submitted.
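As a rough sketch, the two fields can be pulled out of the text printed by scontrol show job <jobid> with a small parser. The function name `parse_schedule_fields` is hypothetical, and the sample text below is illustrative rather than captured from a real cluster; it assumes the usual `Key=Value` layout of scontrol output.

```python
import re


def parse_schedule_fields(scontrol_output: str) -> dict:
    """Extract StartTime and SchedNodeList from `scontrol show job` text.

    A field that is absent from the text is returned as None.
    """
    fields = {}
    for key in ("StartTime", "SchedNodeList"):
        match = re.search(rf"\b{key}=(\S+)", scontrol_output)
        fields[key] = match.group(1) if match else None
    return fields


# Illustrative sample text (an assumption, not real cluster output).
sample = """JobId=123456 JobName=example
   SubmitTime=2024-01-01T08:00:00 EligibleTime=2024-01-01T08:00:00
   StartTime=2024-01-02T09:30:00 EndTime=2024-01-02T13:30:00 Deadline=N/A
   SchedNodeList=node[001-004]
"""
print(parse_schedule_fields(sample))
```

Because both values are estimates that change on each backfill cycle, a script using them would need to query again rather than cache the result.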
Minimum Job Requirements to Avoid Deferment
A job must meet certain criteria before the backfill scheduler will protect it from being deferred by lower-priority jobs that start ahead of it. These thresholds allow the backfill scheduler to cycle faster and maintain high system utilization.
Criteria | Minimum | Description |
---|---|---|
Priority | 3000 | A job requires a minimum priority of 3000 to avoid potential deferment in scheduling. Buy-in account jobs are never below this threshold. |
Age | 30 minutes | Jobs must be queued for at least 30 minutes to avoid potential deferment in scheduling. This applies to all jobs. |
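The two thresholds can be expressed as a simple check. This is a minimal sketch: the function name `protected_from_deferment` is hypothetical, and it assumes both thresholds are inclusive (a priority of exactly 3000 and an age of exactly 30 minutes both qualify).

```python
from datetime import datetime, timedelta

MIN_PRIORITY = 3000               # minimum priority from the table above
MIN_AGE = timedelta(minutes=30)   # minimum time in the queue


def protected_from_deferment(priority: int, submit_time: datetime,
                             now: datetime) -> bool:
    """Return True if a job meets both thresholds (assumed inclusive)."""
    return priority >= MIN_PRIORITY and (now - submit_time) >= MIN_AGE


submitted = datetime(2024, 1, 1, 12, 0)
print(protected_from_deferment(3500, submitted, datetime(2024, 1, 1, 12, 45)))  # True
print(protected_from_deferment(3500, submitted, datetime(2024, 1, 1, 12, 10)))  # False
print(protected_from_deferment(2500, submitted, datetime(2024, 1, 1, 12, 45)))  # False
```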
Job Priority Factors
A job's priority is determined by a combination of several priority factors. Age, size, fairshare, and whether it was submitted to a buy-in account all contribute to the job’s priority.
Priority Factor | Description | Maximum Contribution to Priority |
---|---|---|
Age | Starts at zero at job submission, then increases linearly to a maximum of 60000 after 30 days | 60000 after 30 days |
Fairshare | Starts at 60000 and decreases as a user's recent usage goes up. Usage for this calculation decays by 50% each day | 60000 for no recent cluster usage |
Size | Scales linearly with the amount of CPU and memory requested by a job. 100 per CPU, 20 per GB. | 52000+ depending on memory requested |
QOS | Adds 3000 to buy-in jobs to ensure they are always above the backfill scheduler's minimum priority for reserving resources | 3000 |
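The factors in the table can be combined into a rough priority estimate. This is a sketch under stated assumptions, not the scheduler's actual formula: it assumes the factors are simply added together and that the size factor is uncapped. It also takes the fairshare contribution as an input, because the mapping from usage to fairshare points is not given here. All function names are hypothetical.

```python
MAX_AGE_POINTS = 60000    # age contribution after MAX_AGE_DAYS in the queue
MAX_AGE_DAYS = 30
POINTS_PER_CPU = 100
POINTS_PER_GB = 20
BUYIN_QOS_POINTS = 3000


def age_points(days_queued: float) -> float:
    """Linear growth from 0 at submission to the maximum after 30 days."""
    clamped = min(max(days_queued, 0.0), MAX_AGE_DAYS)
    return MAX_AGE_POINTS * clamped / MAX_AGE_DAYS


def size_points(cpus: int, mem_gb: float) -> float:
    """100 points per CPU plus 20 points per GB of memory requested."""
    return POINTS_PER_CPU * cpus + POINTS_PER_GB * mem_gb


def decayed_usage(usage: float, days_elapsed: float) -> float:
    """Usage counted toward fairshare halves for each day that passes."""
    return usage * 0.5 ** days_elapsed


def estimate_priority(days_queued, cpus, mem_gb, fairshare_points, buy_in):
    """Sum the factors (assumed additive); fairshare_points is 0 to 60000."""
    total = age_points(days_queued) + size_points(cpus, mem_gb) + fairshare_points
    if buy_in:
        total += BUYIN_QOS_POINTS
    return total


# A buy-in job queued for 3 days, requesting 16 CPUs and 64 GB,
# with an assumed fairshare contribution of 45000.
print(estimate_priority(3, 16, 64, 45000, buy_in=True))
```

In this example the age factor contributes 6000, the size factor 2880 (1600 for CPUs and 1280 for memory), and the QOS factor 3000, on top of the assumed fairshare value.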