The HPCC's Layout
The HPCC is composed of three different kinds of nodes: the "gateway" entry nodes, development nodes, and compute nodes. In a typical workflow, users connect to the HPCC through an entry node, then connect to a development node to compile and test code before submitting jobs to the SLURM queue to be run on a compute node. This workflow is demonstrated in the diagram below.
Each node type is explained in more detail in the following sections. Information on the HPCC's filesystems is available in a separate section.
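As a concrete sketch of that workflow (the development node name and job script file shown are illustrative assumptions, not specific recommendations):

```bash
# Connect to a gateway node from your personal computer
ssh <username>@hpcc.msu.edu

# From the gateway, hop to a development node (name shown is an example)
ssh dev-amd20

# Compile and test your code on the development node, then
# submit the job to the SLURM queue to run on a compute node
sbatch my_job.sb
```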
Entry Nodes
The gateway nodes are the only nodes directly accessible over the internet. Users connect to these nodes from their personal computers using `ssh` before accessing other parts of the HPCC.
Gateway Nodes
These nodes are the default destination when connecting via `ssh <username>@hpcc.msu.edu`, as in the top fork of the diagram above. The gateway nodes are not meant for compiling or running software, accessing the scratch space, or connecting to compute nodes. Users should only use the gateway nodes to `ssh` to development nodes.
Alternatively, users may set up SSH tunneling to automate the process of passing through the gateway to a development node.
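For example, OpenSSH's jump-host option can pass through the gateway in a single command (the development node hostname shown here is an assumption; substitute the node you actually use):

```bash
# Connect to a development node through the gateway in one step
ssh -J <username>@hpcc.msu.edu <username>@dev-amd20
```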
Development Nodes
From the gateway node, users can connect to any development node to compile their jobs and run short tests. They may also access files on the scratch file system.
Jobs on the development nodes are limited to two hours of CPU time. More information is available on the development node page.
Each development node is configured to match the compute nodes of the same cluster. If you would like your job to be able to run on any cluster (as is the default for the queue; see the section on Automatic Job Constraints), you should not compile with architecture-specific tuning (e.g., `-march` or `-x`).
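For example, when building with GCC, a portable build avoids host-specific tuning flags (the compiler and file names here are illustrative; the same idea applies to Intel's `-x` options):

```bash
# Portable: no architecture-specific tuning, so the binary can run
# on compute nodes from any cluster
gcc -O2 -o my_program my_program.c

# Tuned to the development node you compiled on; the binary may not
# run correctly on nodes with a different architecture
gcc -O2 -march=native -o my_program my_program.c
```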
Warning
Code compiled on different architectures (e.g., `intel16` vs. `intel18`) may have errors when running on other nodes. To avoid this, compile your code on a newer development node or specify `--constraint=[node type]` in your SLURM batch script, where `[node type]` is the development node type you compiled on.
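As a sketch, a batch script pinned to the node type it was compiled on might look like the following (the job name, resource requests, and the `intel18` constraint are placeholder values to replace with your own):

```bash
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --mem=2G
# Only schedule this job on nodes matching the architecture it was compiled on
#SBATCH --constraint=intel18

srun ./my_program
```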
Compute Nodes
ICER maintains several clusters of compute nodes. Users submit jobs to the SLURM scheduler, which assigns compute nodes based on the resources requested.
A user may see which nodes their job is running on using `squeue -u <username>`. Running `squeue` without a username will show all jobs currently running on the system. Users may `ssh` directly to a compute node only if they have a job running on that node. See our page on connecting to compute nodes for more.
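For example (the compute node name shown is hypothetical):

```bash
# List your own jobs and the nodes they are running on
squeue -u <username>

# If a job is running on node csn-037, connect to that node directly
ssh csn-037
```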
Comparison to a personal computer
| | Laptop/Desktop | HPCC Clusters |
| --- | --- | --- |
| Number of Nodes | 1 | 983 |
| Sockets per node | 1 | 2, 4, or 8 |
| Cores per node | 4, 8, or 16 | 28, 40, 64, 72, 96, 128, or 144 |
| Cores total | 4, 8, or 16 | 78,844 |
| Core Speed | 2.7-5 GHz | 2.5-3.7 GHz |
| RAM (memory) | 8, 16, or 32 GB | 96 GB, 128 GB, 512 GB, 768 GB, 1.5 TB, 2.3 TB, or 6 TB |
| File Storage | 250 GB, 500 GB, or 1 TB | 100 GB (Home), 3 TB (Research), 50 TB (Scratch) |
| Connection to other computers | Campus ethernet, 1 Gbit/sec | InfiniBand, 100-400 Gbit/sec |
| Users | 1 | ~4,000 |
| Schedule | On Demand | 24/7 via queue |