Display Compute Nodes and Job Partitions with the sinfo Command
Information about Compute Nodes
If you would like to run a job that requires a lot of resources, it is a
good idea to first check what is available, such as which nodes are free
as well as how many cores and how much memory are available on those nodes,
so the job will not wait in the queue for too long. Users can use the SLURM
command sinfo to get a list of nodes controlled by the job scheduler. For
example, run the command sinfo -N -r -l, where the option -N shows one line
per node, -r shows only nodes responsive to SLURM, and -l prints the long
description.
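For instance, the output might look like the following sketch (the node names, partition names, and hardware numbers here are made-up placeholders; the output on your cluster will differ):

$ sinfo -N -r -l
Fri Oct 10 10:00:00 2025
NODELIST   NODES      PARTITION  STATE CPUS    S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node-001       1 general-short*  mixed   40   2:20:1  376000        0      1   (null) none
node-001       1   general-long  mixed   40   2:20:1  376000        0      1   (null) none
node-002       1 general-short*   idle   40   2:20:1  376000        0      1   (null) none

Note that node-001 appears once for each partition it belongs to, which is the repetition discussed next.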
However, sinfo prints one line per node for every partition the node
belongs to, which makes the output repetitive. Here, the powertools command
node_status can be used to display a much more compact summary:
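The exact layout of node_status depends on the local powertools installation, so the following is only a hypothetical sketch of the kind of per-node summary it produces (all names and numbers are invented):

$ node_status
NODE       BUYIN_ACCOUNT  STATE   CPUS(A/T)  MEMORY(A/T)  GPUS  REASON
node-001   general        mixed   12/40      48G/376G     -     none
node-002   lab-smith      idle    0/40       0G/376G      2     none
node-003   general        down    0/40       0G/376G      -     hardware failure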
The output of node_status is a good reference for finding out how many
nodes are available for your jobs, as it displays important information
including node names, buy-in accounts, node states, CPU cores, memory,
GPUs, and the reason a node is unavailable.
If you need more complete details about a particular node, you can use the
scontrol show node -a <node_name> command:
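A typical scontrol show node report looks roughly like the sketch below; the node name and all values are placeholders, and the exact set of fields varies with the SLURM version:

$ scontrol show node -a node-001
NodeName=node-001 Arch=x86_64 CoresPerSocket=20
   CPUAlloc=12 CPUTot=40 CPULoad=10.05
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=node-001 NodeHostName=node-001 Version=21.08
   OS=Linux RealMemory=376000 AllocMem=48000 FreeMem=210345
   Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
   Partitions=general-short,general-long
   CfgTRES=cpu=40,mem=376000M,billing=40
   AllocTRES=cpu=12,mem=48000M

Fields such as CPUAlloc versus CPUTot and AllocMem versus RealMemory show how much of the node is currently in use.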
SLURM Partitions for Jobs
One of the important details about a node is what kind of jobs can run on
it. For example, if a node is a buy-in node, non-buyin users can only run
jobs on it with a walltime of 4 hours or less. You can check a summary of
all partitions using sinfo with the -s option:
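A sketch of the summary output, with invented partition names and limits, is shown below; in the NODES(A/I/O/T) column, A/I/O/T stands for allocated/idle/other/total:

$ sinfo -s
PARTITION       AVAIL   TIMELIMIT  NODES(A/I/O/T)  NODELIST
general-short*     up     4:00:00    120/30/2/152  node-[001-152]
general-long       up  7-00:00:00       40/8/0/48  node-[001-048]
gpu                up  1-00:00:00         6/2/0/8  gpu-[001-008]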
The output lists the job partitions together with their walltime limits and
node counts. More detailed information about a particular partition can be
obtained with the -p option:
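For example, assuming a partition named general-long exists (a placeholder name), the per-partition view might look like:

$ sinfo -p general-long
PARTITION     AVAIL   TIMELIMIT  NODES  STATE  NODELIST
general-long     up  7-00:00:00     40  mixed  node-[001-040]
general-long     up  7-00:00:00      8   idle  node-[041-048]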
Users can also list only the nodes that belong to a specific partition by
combining -N and -p:
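Continuing with the placeholder partition name, the node-oriented listing might look like:

$ sinfo -N -p general-long
NODELIST   NODES     PARTITION  STATE
node-001       1  general-long  mixed
node-002       1  general-long  mixed
node-041       1  general-long  idle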
For a complete description of sinfo, please refer to the SLURM web page.