Display Compute Nodes and Job Partitions with the sinfo Command
Information on Compute Nodes
Before running a job that needs a lot of resources, it is a good idea to
check which nodes are available and how many cores and how much memory are
free on them, so that the job does not wait too long in the queue. The SLURM
command sinfo lists the nodes controlled by the job scheduler. For example, run
sinfo -N -r -l
where -N shows nodes, -r shows only nodes that are responsive to SLURM, and
-l gives a long description.
However, sinfo repeats each node once for every partition it belongs to,
which makes the output repetitive. The powertools command node_status gives
a more compact view:
node_status # powertools command
Thu Aug 7 08:58:47 AM EDT 2025
NodeName Account State CPU(Load:Aloc Idl:Tot) Mem(Aval:Tot)Mb GPU(I:T) Reason
----------------------------------------------------------------------------------------------------------
acm-000 deyoungbuyin IDLE 0.00: 0 128:128 375280: 505202 N/A
acm-001 deyoungbuyin IDLE 0.05: 0 128:128 387945: 505202 N/A
.......
acm-018 general ALLOCATED 401.98:128 0:128 278763: 505202 N/A
acm-019 general MIXED 430.29:124 4:128 303526: 505202 N/A
agg-000 ais-markle MIXED 1.08: 12 180:192 643273: 763203 N/A
agg-001 rhee-lab MIXED 0.16: 2 190:192 662472: 763203 N/A
.......
agg-015 general MIXED 125.67:164 28:192 473948: 763203 N/A
agg-016 general MIXED 94.12:123 69:192 590532: 763203 N/A
.......
nal-000 general ALLOCATED 4.15:128 0:128 387031: 505170 a100(3:4)
nal-001 general ALLOCATED 128.12:128 0:128 387678: 505170 a100(3:4)
.......
nvl-007 general MIXED 2.83: 5 35: 40 178735: 376162 v100(3:8)
intel18 => 50.8%(buyin) 69.5%( 187) 16.5%: 20.3%( 9224) 46.6%(35.9Tb) 54%(144) Usage%(Total)
amd20 => 71.8%(buyin) 83.2%( 358) 33.1%: 41.4%(49456) 56.2%( 239Tb) 74%(124) Usage%(Total)
amd22 => 48.6%(buyin) 77.8%( 72) 169.1%: 58.5%( 9728) 42.4%(54.4Tb) N/A( 0) Usage%(Total)
Summary => 67.7%(buyin) 49.2%( 983) 43.6%: 35.5%(78844) 43.8%( 397Tb) 30%(564) Usage%(Total)
The output of node_status is a good reference for finding out how many nodes
are available for your jobs. It displays node names, buy-in accounts, node
states, CPU load and core counts, memory, GPUs, and the reason a node is
unavailable.
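Where powertools is not installed, a rough per-node view can be put together from plain sinfo. The following is a minimal sketch, not how node_status itself works. It assumes the sinfo format string "%N %C %e" (node name, CPUs as allocated/idle/other/total, free memory in MB), and the function name nodes_with_free is hypothetical.

```shell
# nodes_with_free MIN_IDLE_CORES MIN_FREE_MB   (hypothetical helper)
# Reads lines of the form "<node> <alloc>/<idle>/<other>/<total> <free_mb>"
# on stdin and prints "<node> <idle_cores> <free_mb>" for every node that
# meets both minimums, listing each node only once.
nodes_with_free() {
    awk -v min_cores="$1" -v min_mem="$2" '
        seen[$1]++ { next }          # sinfo -N repeats a node once per partition
        {
            split($2, cpu, "/")      # cpu[2] is the idle CPU count
            if (cpu[2] + 0 >= min_cores + 0 && $3 + 0 >= min_mem + 0)
                print $1, cpu[2], $3
        }'
}

# Typical use on a cluster (requires SLURM):
#   sinfo -N -r -h -o "%N %C %e" | nodes_with_free 64 200000
```

Filtering on stdin keeps the helper independent of the scheduler, so it can be tried on saved output before it is run against a live cluster.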
If you need more complete details about a particular node, use the
scontrol show node -a <node_name> command:
scontrol show node -a acm-019
NodeName=acm-019 Arch=x86_64 CoresPerSocket=16
CPUAlloc=124 CPUEfctv=128 CPUTot=128 CPULoad=430.29
AvailableFeatures=acm,amd22
ActiveFeatures=acm,amd22
Gres=(null)
NodeAddr=acm-019 NodeHostName=acm-019 Version=24.05.8
OS=Linux 5.15.0-126-generic #136-Ubuntu SMP Wed Nov 6 10:38:22 UTC 2024
RealMemory=505202 AllocMem=457514 FreeMem=303526 Sockets=8 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=150000 Weight=501 Owner=N/A MCS_label=N/A
Partitions=iceradmin,scavenger,general-short,general-long,...
BootTime=2025-05-10T15:06:37 SlurmdStartTime=2025-08-05T10:48:04
LastBusyTime=2025-06-14T05:03:08 ResumeAfterTime=None
CfgTRES=cpu=128,mem=505202M,billing=76790
AllocTRES=cpu=124,mem=457514M
CurrentWatts=0 AveWatts=0
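Because scontrol prints its details as Key=Value pairs, single fields are easy to pull out in a script. The sketch below is one possible way to do that. The helper names node_field and idle_cpus are hypothetical, and the parsing only handles values that contain no spaces (so not the OS= line).

```shell
# node_field KEY   (hypothetical helper)
# Reads `scontrol show node` output on stdin and prints the value of KEY.
# Assumes the value contains no spaces.
node_field() {
    tr -s ' ' '\n' | awk -v key="$1" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# idle_cpus   (hypothetical helper)
# Prints CPUTot minus CPUAlloc for the node described on stdin.
idle_cpus() {
    info=$(cat)
    total=$(printf '%s\n' "$info" | node_field CPUTot)
    alloc=$(printf '%s\n' "$info" | node_field CPUAlloc)
    echo $((total - alloc))
}

# Typical use on a cluster (requires SLURM):
#   scontrol show node -a acm-019 | node_field FreeMem
#   scontrol show node -a acm-019 | idle_cpus
```

For the acm-019 output above, CPUTot=128 and CPUAlloc=124, so idle_cpus would report 4, which agrees with the idle count that node_status shows for that node.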
SLURM Partitions for Jobs
One important detail about a node is what kind of jobs can run on it. For
example, on a buy-in node, jobs from non-buy-in users can run only if their
walltime is 4 hours or less. A summary of all partitions is available from
sinfo with the -s option:
sinfo -s
PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
scavenger up 7-00:00:00 480/429/61/970 acm-[000-047,050-069],agg-[000-062,065-072],agx-[000-001],amr-[000-237,240-244,246-253],nal-[000-003,008-010],ncc-000,nch-[000-003],neh-[000-001],nel-[000-001],nfh-[000-004],nif-[001-005],nvf-[000-020],nvl-[000-007],skl-[000-023,025-105,107-115,120-144,148-167]
ondemand up 7-00:00:00 4/0/0/4 amr-[184-187]
general-short up 4:00:00 476/429/61/966 acm-[000-047,050-069],agg-[000-062,065-072],agx-[000-001],amr-[000-183,188-237,240-244,246-253],nal-[000-003,008-010],ncc-000,nch-[000-003],neh-[000-001],nel-[000-001],nfh-[000-004],nif-[001-005],nvf-[000-020],nvl-[000-007],skl-[000-023,025-105,107-115,120-144,148-167]
general-long up 7-00:00:00 197/68/34/299 acm-[018-047,061-067],agg-[015-047],agx-[000-001],amr-[188-237,246-253],ncc-000,skl-[027-052,054-100,102-105,107-112,143-144,162-163]
general-long-bigmem up 7-00:00:00 15/1/1/17 acm-[047,061-067],agg-049,amr-[103,246-251]
general-long-gpu up 7-00:00:00 10/6/1/17 nal-[000-001,010],nel-001,nfh-003,nvf-[018-020],nvl-[005-007]
general-long-grace up 7-00:00:00 0/0/1/1 ncc-000
The output lists the job partitions with their walltime limits and nodes.
The NODES(A/I/O/T) column gives the number of allocated, idle, other, and
total nodes in each partition. More detailed information about a single
partition is available with the -p option:
sinfo -p general-long -r -l
Thu Aug 07 09:06:08 2025
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE RESERVATION NODELIST
general-long up 7-00:00:00 1-infinite no NO all 2 drained amd24_perf agg-[021,047]
general-long up 7-00:00:00 1-infinite no NO all 3 draining acm-024,agg-022,amr-204
general-long up 7-00:00:00 1-infinite no NO all 4 drained agg-036,amr-247,ncc-000,skl-102
general-long up 7-00:00:00 1-infinite no NO all 2 down amd24_perf agx-[000-001]
general-long up 7-00:00:00 1-infinite no NO all 176 mixed acm-[019-020,023,025-026,031-034,036-037,039,041-044,047,061-067],agg-[015-020,023-025,027-029,031-035,037-046],amr-[188-203,205-237,246,248-253],skl-[027,032,034-035,040-043,048-052,054-056,058-100,103-105,107-112,162]
general-long up 7-00:00:00 1-infinite no NO all 18 allocated acm-[018,021-022,027-030,035,038,040,045-046],agg-[026,030],skl-[057,143-144,163]
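The NODES(A/I/O/T) column of the sinfo -s summary can also be read by a script, for instance to see which partition currently has the most idle nodes. This is a minimal sketch under the assumption that the input comes from sinfo -s -h (the -h option suppresses the header line). The function name idle_by_partition is hypothetical.

```shell
# idle_by_partition   (hypothetical helper)
# Reads `sinfo -s -h` lines on stdin and prints "<partition> <idle> <total>",
# with the partition that has the most idle nodes first.
idle_by_partition() {
    awk '{
        split($4, n, "/")       # n[1]=allocated n[2]=idle n[3]=other n[4]=total
        sub(/\*$/, "", $1)      # sinfo marks the default partition with "*"
        print $1, n[2], n[4]
    }' | sort -k2,2nr
}

# Typical use on a cluster (requires SLURM):
#   sinfo -s -h | idle_by_partition
```

Idle node counts are only a first indication. Whether a job can start also depends on the walltime limit of the partition and on the cores and memory the job requests.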
To list only the nodes that belong to specific job partitions, combine
-N and -p:
sinfo -N -l -r -p general-short,general-long
Thu Aug 07 09:06:41 2025
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
acm-000 1 general-short idle 128 8:16:1 505202 150000 501 acm,amd2 none
acm-001 1 general-short idle 128 8:16:1 505202 150000 501 acm,amd2 none
acm-002 1 general-short idle 128 8:16:1 505202 150000 501 acm,amd2 none
acm-003 1 general-short idle 128 8:16:1 505202 150000 501 acm,amd2 none
acm-004 1 general-short mixed 128 8:16:1 505202 150000 501 acm,amd2 none
.......
skl-163 1 general-short allocated 40 2:20:1 376162 150000 303 skl,inte none
skl-164 1 general-short idle 40 2:20:1 376162 150000 303 skl,inte none
skl-165 1 general-short idle 40 2:20:1 376162 150000 303 skl,inte none
skl-166 1 general-short mixed 40 2:20:1 376162 150000 303 skl,inte none
skl-167 1 general-short idle 40 2:20:1 376162 150000 303 skl,inte none
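A node listing like the one above can be long, so a per-state count is often easier to read. The following sketch counts each node once even when it is listed under several partitions. It assumes input produced with the sinfo format string "%N %T" (node name and state), and the function name count_states is hypothetical.

```shell
# count_states   (hypothetical helper)
# Reads "<node> <state>" lines on stdin, where a node may repeat once per
# partition, and prints "<count> <state>" with the most common state first.
count_states() {
    awk '!seen[$1]++ { count[$2]++ }
         END { for (s in count) print count[s], s }' | sort -k1,1nr -k2,2
}

# Typical use on a cluster (requires SLURM):
#   sinfo -N -r -h -p general-short,general-long -o "%N %T" | count_states
```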
For complete documentation of sinfo, please refer to the SLURM web page.