
How Jobs are Scheduled

Schedulers

SLURM schedules jobs in two ways: the main scheduler and the backfill scheduler. The main scheduler constantly tries to start high priority jobs. The backfill scheduler considers all jobs, and starts any jobs that won't defer the start time of a higher priority job.

Scheduler | Function | When it Runs | Run Time
----------|----------|--------------|---------
Main | Launches high priority jobs that can start immediately. Stops evaluating jobs once it encounters a job that cannot be started. | About every 2 seconds | 0.08-2 seconds
Backfill | Evaluates the entire queue. Launches jobs that won't interfere with the start time of a higher priority job. Sets jobs' StartTime and SchedNodeList. | 20 seconds after the last backfill cycle completes | 2-15+ minutes
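
The difference between the two passes can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: the cluster is a single pool of CPUs, only one blocked job is protected, and its expected start (`blocked_start_minutes`) is supplied by hand. The `Job` class, function names, and job values are hypothetical; the real SLURM schedulers track reservations per node for many jobs.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int
    cpus: int
    minutes: int  # requested wall time


def main_pass(pending, free_cpus):
    """Start jobs in priority order and stop at the first one that does not fit."""
    started = []
    for job in sorted(pending, key=lambda j: j.priority, reverse=True):
        if job.cpus > free_cpus:
            break  # the main scheduler does not look past a blocked job
        free_cpus -= job.cpus
        started.append(job)
    return started, free_cpus


def backfill_pass(pending, free_cpus, blocked_start_minutes):
    """Look at every job; start those that fit now and end before the
    blocked higher priority job is expected to start."""
    started = []
    for job in sorted(pending, key=lambda j: j.priority, reverse=True):
        fits_now = job.cpus <= free_cpus
        ends_in_time = job.minutes <= blocked_start_minutes
        if fits_now and ends_in_time:
            free_cpus -= job.cpus
            started.append(job)
    return started, free_cpus


# Hypothetical queue: 8 CPUs are free, and job B is assumed to start in 90 minutes.
queue = [
    Job("A", 66000, 4, 60),
    Job("B", 65000, 16, 120),
    Job("C", 64000, 2, 30),
    Job("D", 63000, 4, 240),
]
first, free = main_pass(queue, free_cpus=8)
waiting = [j for j in queue if j not in first]
second, free = backfill_pass(waiting, free, blocked_start_minutes=90)

print([j.name for j in first])   # ['A']
print([j.name for j in second])  # ['C']
```

In this model the main pass starts A and stops at B, never evaluating C or D. The backfill pass then starts C, which fits in the remaining CPUs and finishes before B's assumed start, and leaves D waiting.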

StartTime and SchedNodeList

The backfill scheduler sets the StartTime and SchedNodeList parameters on jobs that can start within the next 7 days. These parameters can be viewed in the output of scontrol show job <jobid>. StartTime is an estimate of when a job will start, and SchedNodeList lists the nodes the job might start on. Both values are recomputed every time the backfill scheduler runs and may change as running jobs complete and new jobs are submitted.
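
As a minimal sketch, the two values can be pulled out of the scontrol output with a short script. The function names here are hypothetical, and the parser assumes the usual whitespace-separated Key=Value layout of scontrol show job, so any value that contains spaces is cut at the first space.

```python
import re
import subprocess


def parse_job_fields(scontrol_text):
    """Return a dict of the Key=Value pairs found in `scontrol show job` output.

    Assumption: values of interest contain no spaces, which holds for
    timestamps and node lists but not for every field.
    """
    return dict(re.findall(r"(\w+)=(\S*)", scontrol_text))


def backfill_estimate(job_id):
    """Run scontrol for one job and return (StartTime, SchedNodeList).

    Either element is None when the field is absent from the output,
    for example when the backfill scheduler has not yet planned the job.
    """
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    fields = parse_job_fields(result.stdout)
    return fields.get("StartTime"), fields.get("SchedNodeList")
```

Calling `backfill_estimate(<jobid>)` on a login node returns the current estimate. Because the values change with each backfill cycle, repeated calls may give different results.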

Minimum Job Requirements to Avoid Deferment

The backfill scheduler only protects a job from being deferred by lower priority jobs if the job meets certain minimum criteria. A job below these thresholds may have its start delayed when lower priority jobs are started ahead of it. These thresholds allow the backfill scheduler to cycle faster and maintain high system utilization.

Criteria | Minimum | Description
---------|---------|------------
Priority | 3000 | A minimum priority of 3000 is required to avoid potential deferment in scheduling. Buy-in account jobs are never below this threshold.

Job Priority Factors

A job's priority is determined by a combination of several priority factors. Age, size, fairshare, and whether it was submitted to a buy-in account all contribute to the job’s priority.

Priority Factor | Description | Maximum Contribution to Priority
----------------|-------------|---------------------------------
Age | Starts at zero at job submission, then increases linearly to a maximum of 60000 after 30 days. | 60000 after 30 days
Fairshare | Starts at 60000 and decreases as a user's recent usage goes up. Usage for this calculation decays by 50% each day. | 60000 for no recent cluster usage
Size | Scales linearly with the amount of CPU and memory requested by a job: 100 per CPU, 20 per GB. | 52000+ depending on memory requested
QOS | Adds 3000 to buy-in jobs to ensure they are always above the backfill scheduler's minimum priority for reserving resources. | 3000
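
The factors above can be combined in a short worked example. This is a sketch that assumes the four contributions simply add up and that the FairShare contribution is supplied as an already weighted number; the function names and sample jobs are hypothetical.

```python
MIN_BACKFILL_PRIORITY = 3000  # threshold below which a job may be deferred


def age_points(days_queued):
    """Linear from 0 at submission to 60000 at 30 days, then capped."""
    return min(days_queued, 30) * 60000 / 30


def size_points(cpus, mem_gb):
    """100 per CPU plus 20 per GB of requested memory."""
    return 100 * cpus + 20 * mem_gb


def job_priority(days_queued, fairshare_points, cpus, mem_gb, buy_in=False):
    """Sum of the age, FairShare, size, and QOS contributions (assumed additive)."""
    qos_points = 3000 if buy_in else 0
    return (
        age_points(days_queued)
        + fairshare_points
        + size_points(cpus, mem_gb)
        + qos_points
    )


# A 4-CPU, 16 GB job that has waited 3 days with a FairShare contribution of 59727.
print(job_priority(3, 59727, 4, 16))  # 66447.0

# A fresh 1-CPU, 1 GB job from a user whose FairShare contribution is zero.
print(job_priority(0, 0, 1, 1) >= MIN_BACKFILL_PRIORITY)               # False
print(job_priority(0, 0, 1, 1, buy_in=True) >= MIN_BACKFILL_PRIORITY)  # True
```

The last two lines show why the QOS contribution matters: even with no age or FairShare contribution, a buy-in job stays above the 3000 threshold.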

FairShare

The FairShare priority factor of a job is calculated based on recent usage compared to overall cluster usage. Each user/account pair is assigned a "share" of the cluster based on the overall number of users in the accounting database. Usage is tracked based on the cluster's configured TRES (Trackable Resource) billing weights. A weight is set for CPUs, memory, and GPUs. Each job's allocated resources are multiplied by these weights and by the job's run time to get a combined measure of TRES seconds. TRES seconds are then tracked for each user/account pair. When the consumed TRES seconds equal the pair's share of TRES seconds for the entire cluster, the FairShare factor is 0.5 (30000 weighted). When the consumed TRES seconds exceed double that share, the FairShare factor is zero.

Usage accrued for this calculation decays with a half-life of one day, and the decay is applied every five minutes.
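
The accounting described above can be sketched in a few lines. The function names are hypothetical, the example weights are the CPU and GPU values shown in the fairshare_info output below, and each resource amount is assumed to be in whatever unit its weight is defined for.

```python
HALF_LIFE_S = 24 * 3600      # usage halves every day
UPDATE_INTERVAL_S = 5 * 60   # decay is applied every five minutes


def tres_seconds(allocated, weights, runtime_s):
    """Weighted sum of allocated resources multiplied by run time in seconds."""
    billing = sum(weights[name] * amount for name, amount in allocated.items())
    return billing * runtime_s


def decayed(usage, elapsed_s):
    """Usage remaining after elapsed_s seconds of half-life decay."""
    return usage * 0.5 ** (elapsed_s / HALF_LIFE_S)


# Factor applied at each five-minute update (about 0.9976).
step_factor = 0.5 ** (UPDATE_INTERVAL_S / HALF_LIFE_S)

weights = {"cpu": 1.0, "gpu": 250}
usage = tres_seconds({"cpu": 4, "gpu": 1}, weights, runtime_s=3600)
print(usage)                         # 914400.0
print(decayed(usage, HALF_LIFE_S))   # 457200.0
```

A one-hour job with 4 CPUs and 1 GPU is charged 914400 TRES seconds, and one day later only half of that still counts against the user's FairShare.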

The exact weights and share values change with the size of the cluster and accounting database, but can be viewed using the fairshare_info powertool.

$ module load powertools
$ fairshare_info

TresBillingWeights:
  CPU    = 1.0
  Memory = 0.152
  GPU    = 250

Current Total Cluster Usage:  3102285712231 TRES seconds
FairShare Ratio Per User:     0.000051
User Portion of Cluster:      158216571 TRES seconds

FairShare is calculated every 00:05:00 and your usage decays 50% every 1-00:00:00

The following usage will reduce FairShare priority by half, twice will zero it:
     43949 CPU Hours
       176 GPU Hours
       282 GB Hours

       281 1 Hour / 1 CPU / 1 GB Jobs
       108 1 Hour / 1 CPU / 1 GB / 1 GPU Jobs

Current FairShare Priority Status:

  Account             Priority  Usage (TRES Seconds)
  --------------------------------------------------
  general                59727              1046312
  scavenger              60000                    0
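
The thresholds in this sample output can be reproduced from the printed weights and share. The sketch below makes two assumptions that the output does not state: the memory weight applies per MB (with 1024 MB per GB), and the tool rounds to the nearest whole number. Both are inferred only because they make the arithmetic match the figures shown.

```python
# Values copied from the sample fairshare_info output above.
total_usage = 3102285712231   # TRES seconds
ratio_per_user = 0.000051
w_cpu = 1.0
w_mem_per_mb = 0.152          # assumption: the memory weight is per MB
w_gpu = 250
MB_PER_GB = 1024

# One user's share of the cluster, in TRES seconds and in billing hours.
share_seconds = total_usage * ratio_per_user
share_hours = share_seconds / 3600

# Hours of a single resource that would consume the whole share.
w_mem_per_gb = w_mem_per_mb * MB_PER_GB
cpu_hours = share_hours / w_cpu
gpu_hours = share_hours / w_gpu
gb_hours = share_hours / w_mem_per_gb

# Number of one-hour jobs of each shape that would consume the share.
small_jobs = share_hours / (w_cpu + w_mem_per_gb)
gpu_jobs = share_hours / (w_cpu + w_mem_per_gb + w_gpu)

print(round(share_seconds))                                    # 158216571
print(round(cpu_hours), round(gpu_hours), round(gb_hours))     # 43949 176 282
print(round(small_jobs), round(gpu_jobs))                      # 281 108
```

Because memory is weighted per MB under this assumption, one GB of memory costs far more billing units than one CPU, which is why 282 GB hours consume the same share as 43949 CPU hours.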