
Display Compute Nodes and Job Partitions with the sinfo Command

Information of Compute Nodes

If you plan to run a job that needs a lot of resources, it is a good idea to check first which nodes are available and how many cores and how much memory are free on them, so that the job does not wait in the queue for too long. The SLURM command sinfo lists the nodes controlled by the job scheduler. For example, run sinfo -N -r -l, where -N lists the output node by node, -r shows only nodes that are responding to SLURM, and -l selects the long output format.
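When you only care about idle cores, sinfo's own output options can keep the listing short. The following is a minimal sketch, assuming the standard sinfo options -h (no header) and -o with the format fields %N (node name) and %C (CPUs as allocated/idle/other/total). The shell function nodes_with_idle_cores is hypothetical, not a SLURM or powertools command. It keeps only nodes with at least a given number of idle cores, and because sinfo -N repeats a node once per partition, it prints each node only once.

```shell
# Hypothetical helper (not part of SLURM): list nodes with at least MIN idle cores.
# Expected input: lines of "NODE A/I/O/T", for example produced by
#   sinfo -N -r -h -o "%N %C"
# Usage: sinfo -N -r -h -o "%N %C" | nodes_with_idle_cores 8
nodes_with_idle_cores() {
  awk -v min="${1:-1}" '
    seen[$1]++ { next }                 # node already handled (listed again for another partition)
    {
      split($2, c, "/")                 # c[1]=allocated c[2]=idle c[3]=other c[4]=total
      if (c[2] + 0 >= min) printf "%s %d/%d idle\n", $1, c[2], c[4]
    }'
}
```

The filtering is done in awk rather than with extra sinfo options so that the same function works on any saved sinfo listing in this two-column format.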

However, sinfo prints one line for every partition a node belongs to, so the same node appears many times. The powertools command node_status avoids this repetition and gives a more readable per-node summary:

$ node_status                       # powertools command

Wed Apr 22 11:14:40 EDT 2020

NodeName       Account         State     CPU(Load:Aloc Idl:Tot)    Mem(Aval:Tot)Mb   GPU(I:T)   Reason
----------------------------------------------------------------------------------------------------------
csm-001        general       ALLOCATED      13.61: 20    0: 20       45186: 246640      N/A
csm-002       albrecht         MIXED        10.14: 15    5: 20        1072: 246640      N/A
csm-003         colej        ALLOCATED       7.45: 20    0: 20       50032: 246640      N/A
......
csn-005        general         MIXED         9.92: 12    8: 20       16160: 118012    k20(0:2)
......
cs*      =>   33.3%(buyin)   91.4%(162)     43.6%: 59.5%( 3240)      69.9%(17.0Tb)    97%( 78)   Usage%(Total)
......
......
lac-078        general         MIXED        11.38:  8   20: 28       69884: 118012      N/A
lac-079          ptg         ALLOCATED      22.37: 28    0: 28       15612: 118012      N/A
lac-080       merzjrke         MIXED         2.48: 16   12: 28       50032: 246640    k80(0:8)
......
......
vim-002          ccg           MIXED        66.14: 63   81:144     5427008:6145856      N/A

intel14  =>   34.5%(buyin)   91.7%(168)     47.8%: 62.7%( 3576)      60.1%(31.1Tb)    97%( 78)   Usage%(Total)
intel16  =>   69.0%(buyin)   98.8%(429)     55.2%: 65.1%(12200)      76.6%(79.9Tb)    70%(384)   Usage%(Total)
intel18  =>   63.6%(buyin)   99.4%(176)     45.8%: 55.8%( 7040)      77.1%(31.3Tb)    55%( 64)   Usage%(Total)

Summary  =>   60.3%(buyin)   97.4%(773)     51.2%: 61.9%(22816)      73.1%( 142Tb)    72%(526)   Usage%(Total)

The output of node_status is a good reference for finding out how many nodes are available for your jobs, since it shows each node's name, buy-in account, state, CPU cores, memory and GPUs, as well as the reason a node is unavailable.

If you need complete details of a particular node, use the scontrol show node -a command:

$ scontrol show node -a skl-166
NodeName=skl-166 Arch=x86_64 CoresPerSocket=20
   CPUAlloc=0 CPUTot=40 CPULoad=0.01
   AvailableFeatures=skl,gbe,intel18,ib,edr18
   ActiveFeatures=skl,gbe,intel18,ib,edr18
   Gres=(null)
   NodeAddr=skl-166 NodeHostName=skl-166 Version=18.08
   OS=Linux 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018
   RealMemory=376162 AllocMem=0 FreeMem=382562 Sockets=2 Boards=1
   State=DOWN ThreadsPerCore=1 TmpDisk=174080 Weight=103 Owner=N/A MCS_label=N/A
   Partitions=general-short,general-short-18,general-long,general-long-18,qian-18,nvl-benchmark-18,piermaro-18,vmante-18,liulab-18,devolab-18,tsangm-18,plzbuyin-18,chenlab-18,shadeash-colej-18,allenmc-18,cmse-18,seiswei-18,niederhu-18,daylab-18,junlin-18,mitchmcg-18,pollyhsu-18,davidroy-18,yueqibuyin-18,eisenlohr-18
   BootTime=2019-02-11T15:07:38 SlurmdStartTime=2019-02-11T15:08:44
   CfgTRES=cpu=40,mem=376162M,billing=57176
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Currently being imaged [fordste5@2019-02-11T09:49:30]
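The fields that matter most for scheduling are CPUAlloc/CPUTot and RealMemory/AllocMem: the difference between each pair is what is still unallocated on the node (FreeMem, by contrast, is the memory the operating system reports as free). The sketch below is a hypothetical helper, node_free_resources, that assumes the Key=Value layout shown above for a single node; tokens without an "=" (such as the words inside the OS= and Reason= values) are simply ignored.

```shell
# Hypothetical helper (not part of SLURM): summarise unallocated CPUs and memory
# from the output of `scontrol show node NODENAME` for ONE node.
# Assumes the Key=Value layout shown above; tokens without "=" are skipped.
# Usage: scontrol show node skl-166 | node_free_resources
node_free_resources() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        if (eq > 0) kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)   # store Key -> Value
      }
    }
    END {
      printf "%s %s cpus_free=%d/%d mem_free_mb=%d/%d\n",
        kv["NodeName"], kv["State"],
        kv["CPUTot"] - kv["CPUAlloc"], kv["CPUTot"],
        kv["RealMemory"] - kv["AllocMem"], kv["RealMemory"]
    }'
}
```

For the skl-166 output above this reports all 40 cores and all 376162 MB as unallocated, but the node is still unusable because its State is DOWN.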

SLURM Partitions for Jobs

One important property of a node is which kinds of jobs can run on it. For example, on a buy-in node, users who do not belong to the buy-in account can only run jobs with a walltime of 4 hours or less. A summary of all partitions is shown by sinfo with the -s option:

$ sinfo -s
PARTITION           AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
general-short          up    4:00:00    729/26/16/771  csm-[001-005,007-010,017-022],csn-[001-039],csp-[006-007,016-020,025-026],css-[001-003,007-012,014,016-020,023,032-036,038-045,047-050,052-067,071-072,074-076,079-085,087-095,097-103,106-109,111-127],lac-[000-225,228-247,250-261,276-369,372,374-445],nvl-[000-007],qml-[000-005],skl-[000-167],vim-[000-002]
general-long           up 7-00:00:00      269/0/8/277  csm-001,csn-020,csp-[006-007,016-018,020,025],css-[008-012,014,016-019,023,032,034-036,038-045,047-050,052-066,071,075-076,079-080,083,087-089,092-095,097-099,107,118,121,124,126],lac-[038-044,078,123,209,217,225,228,230-235,246-247,276-284,300-301,336-339,353-360,363-364,372,374-399,401-420,422-445],skl-[023,026-112]
general-long-bigmem    up 7-00:00:00        17/0/0/17  lac-[252-253,306],qml-[000,005],skl-[143-147,162-167],vim-001
general-long-gpu       up 7-00:00:00       46/12/0/58  csn-[001-019,021-036],lac-[030,087,137,143,192-199,287-290,292-293,342,348],nvl-[005-007]
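Reading the TIMELIMIT and NODES(A/I/O/T) columns together answers a common question: which partitions accept my walltime, and how many of their nodes are idle right now? The following is a minimal sketch, assuming the default sinfo -s column order shown above and time limits written as [D-]H:MM:SS, MM:SS or "infinite". The function idle_by_partition is hypothetical.

```shell
# Hypothetical helper (not part of SLURM): from `sinfo -s` output, list the
# partitions whose time limit is at least WANT hours, with their idle node count.
# Assumes columns PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST as shown above.
# Usage: sinfo -s | idle_by_partition 24
idle_by_partition() {
  awk -v want_h="${1:-0}" '
    $1 == "PARTITION" { next }                  # skip the header line
    {
      limit = $3; days = 0
      if (limit == "infinite") hours = 1e9
      else {
        if (index(limit, "-") > 0) {            # D-HH:MM:SS
          split(limit, d, "-"); days = d[1]; limit = d[2]
        }
        n = split(limit, t, ":")
        if (n == 3) hours = t[1] + t[2] / 60 + t[3] / 3600     # H:MM:SS
        else        hours = t[1] / 60 + t[2] / 3600            # MM:SS (or bare minutes)
        hours += days * 24
      }
      split($4, s, "/")                         # s[1]=allocated s[2]=idle s[3]=other s[4]=total
      if (hours >= want_h) printf "%s %d idle of %d (limit %s)\n", $1, s[2], s[4], $3
    }'
}
```

With the summary above, asking for 5 hours drops general-short (4:00:00 limit) and keeps the 7-day general-long partitions, which is the same walltime boundary that separates short jobs on buy-in nodes from longer ones.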

This lists each job partition with its walltime limit (TIMELIMIT), its node counts by state (allocated/idle/other/total) and its node list. More detailed information about a single partition can be obtained with the -p option:

$ sinfo -p general-long -r -l
Mon Jul 13 12:22:16 2020
PARTITION    AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
general-long    up 7-00:00:00 1-infinite   no       NO        all      2    draining lac-[231,247]
general-long    up 7-00:00:00 1-infinite   no       NO        all      1     drained css-053
general-long    up 7-00:00:00 1-infinite   no       NO        all    217       mixed csm-001,csp-[006,017-018,020,025],css-[010,018-019,023,032,034-035,038,044,047-049,052,055-056,061-066,075,088-089,098-099,107,118,126],lac-[038-044,078,123,209,217,225,228,230,232,234-235,276-280,282-284,300-301,336-337,339,353-360,363,372,374-382,384-399,401-420,423,427-445],skl-[023,026,028-029,031,033-034,036-042,044-046,048,050-067,069-079,081-094,096-106,108-112]
general-long    up 7-00:00:00 1-infinite   no       NO        all     50   allocated csn-020,csp-016,css-[008-009,011,016-017,036,039-043,045,050,054,057-060,083,087,092-095,097,121,124],lac-[233,246,281,338,364,383,422,424-426],skl-[027,030,032,035,043,047,049,068,080,095,107]
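Because the long listing splits a partition into one line per node state, a quick total per state can help when judging how busy a partition is. This is a minimal sketch, assuming the column layout shown above (NODES in field 8, STATE in field 9); the function nodes_by_state is hypothetical. sinfo can also filter by state directly with its -t option (for example -t idle).

```shell
# Hypothetical helper (not part of SLURM): total node counts per state from
# `sinfo -p PARTITION -r -l` output. Assumes NODES is field 8 and STATE is
# field 9; the timestamp and header lines fail the numeric check and are skipped.
# Usage: sinfo -p general-long -r -l | nodes_by_state
nodes_by_state() {
  awk '
    $8 !~ /^[0-9]+$/ { next }            # not a data line
    { count[$9] += $8 }                  # add this line'"'"'s node count to its state
    END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}
```

For the general-long example above this gives 50 allocated, 1 drained, 2 draining and 217 mixed nodes; the output is sorted because awk does not guarantee an order when looping over an array.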

You can also list, node by node, only the nodes that belong to specific partitions by combining -N and -p:

$ sinfo -N -l -r -p general-short,general-long
Mon Jul 13 12:25:58 2020
NODELIST   NODES     PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
csm-001        1 general-short       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
csm-001        1  general-long       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
csm-002        1 general-short       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
csm-003        1 general-short       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
csm-004        1 general-short       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
csm-005        1 general-short       mixed   20   2:10:1 246640   174080    101 gbe,ib,i none
...
...
skl-166        1 general-short       mixed   40   2:20:1 376162   174080    103 skl,gbe, none
skl-167        1 general-short       mixed   40   2:20:1 376162   174080    103 skl,gbe, none
vim-000        1 general-short       mixed   64   4:16:1 306780   174080    102 gbe,inte none
vim-001        1 general-short       mixed   64   4:16:1 306780   174080    102 gbe,inte none
vim-002        1 general-short   allocated  144   8:18:1 614585   174080    102 gbe,inte none
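In this listing a node such as csm-001 still appears once per requested partition. The sketch below folds those lines into one line per node that names all of its partitions, assuming the column layout shown above (NODELIST in field 1, NODES in field 2, PARTITION in field 3, STATE in field 4, CPUS in field 5, MEMORY in field 7). The function partitions_per_node is hypothetical.

```shell
# Hypothetical helper (not part of SLURM): one line per node with all of its
# partitions, from `sinfo -N -l -r -p PART1,PART2` output.
# Assumes the NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY ... layout shown above.
# Usage: sinfo -N -l -r -p general-short,general-long | partitions_per_node
partitions_per_node() {
  awk '
    $2 !~ /^[0-9]+$/ { next }              # skip timestamp, header and "..." lines
    !($1 in parts) {                       # first line seen for this node
      order[++n] = $1                      # remember the original node order
      info[$1] = $4 " cpus=" $5 " mem_mb=" $7
      parts[$1] = $3
      next
    }
    { parts[$1] = parts[$1] "," $3 }       # same node listed for another partition
    END { for (i = 1; i <= n; i++) print order[i], info[order[i]], parts[order[i]] }'
}
```

A node whose line ends in both general-short and general-long can take jobs longer than 4 hours, while a node listed only under general-short cannot.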

For a complete description of sinfo and its options, please refer to the SLURM documentation web page.