Display Compute Nodes and Job Partitions with the sinfo Command
Information about Compute Nodes
If you plan to run a job that needs a lot of resources, it is a good idea to first check what is available, such as which nodes are free and how many cores and how much memory are available on those nodes, so that your job does not wait in the queue longer than necessary. The SLURM command sinfo lists the nodes controlled by the job scheduler. For example, run sinfo -N -r -l, where -N shows one line per node, -r shows only nodes responsive to SLURM, and -l requests the long output format.
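As a sketch of how this listing can be used, the pipeline below counts the nodes that a sinfo -N -r -l run reports as idle. The sample output embedded here is hypothetical (made-up node names, partition, and sizes); on a real cluster you would pipe the live sinfo output instead.

```shell
#!/bin/sh
# Hypothetical excerpt of `sinfo -N -r -l` output; real node names,
# partitions, core counts, and memory sizes will differ on your cluster.
sample='NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY
node-001 1 general idle 128 2:64:1 512000
node-002 1 general alloc 128 2:64:1 512000
node-003 1 general idle 128 2:64:1 512000'

# STATE is the 4th column; count nodes that are fully idle,
# i.e. candidates for running a new job immediately.
printf '%s\n' "$sample" | awk '$4 == "idle" { n++ } END { print n }'
```

On a live system the same filter would be `sinfo -N -r -l | awk '$4 == "idle"'`.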
However, for each node, sinfo lists every partition the node belongs to, which makes the output repetitive. The powertools command node_status displays the same information in a much more readable form:
The output of node_status is a good reference for finding out how many nodes are available for your jobs, as it displays important information including node names, buy-in accounts, node states, CPU cores, memory, GPUs, and the reason a node is unavailable.
If you need the complete details of a particular node, you can use
scontrol show node <node-name>
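The scontrol record for a node is a block of Key=Value pairs. As an illustration of pulling one field out of such a record, the snippet below extracts RealMemory (total memory in MB) from a hypothetical fragment of scontrol show node output; the field values here are made up.

```shell
#!/bin/sh
# Hypothetical fragment of `scontrol show node <node-name>` output;
# the values are illustrative only, not from a real cluster.
record='NodeName=node-001 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=0 CPUTot=128 CPULoad=0.01
   RealMemory=512000 AllocMem=0 FreeMem=498000
   State=IDLE Partitions=general'

# Split the record into one Key=Value pair per line, then print the
# value of the RealMemory key (total node memory in MB).
printf '%s\n' "$record" | tr ' ' '\n' | awk -F= '$1 == "RealMemory" { print $2 }'
```

The same pattern works for any field in the record, such as CPUTot or State.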
SLURM Partitions for Jobs
One of the important details about a node is what kinds of jobs can run on it. For example, if a node is a buy-in node, non-buy-in users can only run jobs with a walltime of 4 hours or less on it. You can check the summary of all partitions using sinfo with the -s option:
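As a sketch of reading this summary, the snippet below prints each partition name alongside its walltime limit from a hypothetical sinfo -s listing; the partition names and limits shown are illustrative, not taken from a real cluster.

```shell
#!/bin/sh
# Hypothetical sample of `sinfo -s` output; partition names, limits,
# and node counts are made up for illustration.
summary='PARTITION AVAIL TIMELIMIT NODES(A/I/O/T) NODELIST
general-short* up 4:00:00 120/40/2/162 node-[001-162]
general-long up 7-00:00:00 80/20/1/101 node-[163-263]'

# Skip the header line and print partition name (col 1) and
# its walltime limit (col 3).
printf '%s\n' "$summary" | awk 'NR > 1 { print $1, $3 }'
```

In SLURM's time notation, 4:00:00 means 4 hours and 7-00:00:00 means 7 days.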
which shows the list of job partitions along with their walltime limits and nodes. More detailed information about a single job partition can be obtained with the -p option:
Users can also list only the nodes that belong to a specific job partition by combining -N and -p:
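To sketch how a per-partition node listing can be filtered further, the snippet below picks out the idle nodes from a hypothetical sinfo -N -p listing; the partition name general-long and the node names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample of `sinfo -N -p general-long` output; on a real
# cluster you would run the command itself:  sinfo -N -p <partition-name>
listing='NODELIST NODES PARTITION STATE
node-163 1 general-long idle
node-164 1 general-long mix
node-165 1 general-long idle'

# STATE is the 4th column; print only the idle nodes in the partition.
printf '%s\n' "$listing" | awk '$4 == "idle" { print $1 }'
```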
For complete documentation of sinfo, please refer to the SLURM web page.