Development nodes
Warning
Processes that use more than 2 hours of CPU time (defined as work across all CPUs that equal 2 hours of work on one CPU, e.g., a process using 12 CPUs at 100% will be killed after 10 minutes) or 2 hours of GPU time (defined as a process using a GPU for at least 2 hours of total elapsed time) will be killed automatically without advance notice. Circumventing this policy and/or abusing the intended usage of development nodes may result in suspension from the system.
The HPCC has several development nodes that are available for users to compile their code and do short runs (less than 2 hours) to estimate run-time and memory usage.
Note
There is an 80% memory cap (percentage of total RAM on the node) for processes running on development nodes. Processes that exceed this cap are killed automatically.
These development nodes run the latest operating system and have configurations and environment setups similar to the compute nodes in the same clusters. Please use the development nodes to compile your programs and test the workflow of your job scripts. For long-running or resource-intensive computations, please submit jobs to the compute nodes. If a long-running (more than 2 hours) computation requires interactive development, testing, or debugging, use an interactive job on a compute node.
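As a sketch of the interactive-job route mentioned above, a session on a compute node can typically be requested with SLURM's `salloc` command. The resource values below are illustrative placeholders, not site recommendations; check your cluster's limits and partitions.

```shell
# Request an interactive allocation on a compute node.
# Adjust time, CPU, and memory values to your actual needs.
salloc --time=04:00:00 --cpus-per-task=4 --mem=16G
# When the allocation is granted, you receive a shell on the compute
# node; run your interactive tests there, then exit to release it.
```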
Users may `ssh` to the development nodes after connecting to the gateway via SSH. To access a particular development node, for example dev-amd20, run `ssh dev-amd20` from the gateway. Users may also connect to development nodes directly by setting up SSH tunneling. Alternatively, the nodes may be accessed through the "Development Nodes" tab on OnDemand.
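As a concrete example of the two-hop route, a typical session might look like the following. The username and gateway hostname are placeholders; substitute your own credentials and your cluster's actual gateway address.

```shell
# Connect to the gateway first (replace <user> and <gateway-hostname>
# with your username and your cluster's gateway address).
ssh <user>@<gateway-hostname>
# From the gateway, hop to the desired development node.
ssh dev-amd20
```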
Nodes with -v100 or -h200 suffixes have the GPU cards required by GPU-enabled software but may be used for any software. Note that there is no development node containing the AMD20 A100 GPUs.
| Node Hostname | Cores | Memory | Notes |
|---|---|---|---|
| dev-amd24 | 192 | 768GB | AMD EPYC 9654 96-Core Processor @ 2.4GHz |
| dev-amd24-h200 | 192 | 768GB | AMD EPYC 9654 96-Core Processor @ 2.4GHz and 4 H200 GPUs |
| dev-amd20 | 128 | 960GB | AMD EPYC 7H12 64-Core Processor @ 2.6GHz |
| dev-amd20-v100 | 48 | 187GB | Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz and 4 Tesla V100S |
| dev-intel18 | 40 | 377GB | Two Intel(R) Xeon(R) Gold 6148 20-Core Processors @ 2.40GHz |
Once your program compiles and your job script is tested, you can submit the job to the SLURM queue, specifying the required resources such as job duration, memory, number of CPUs, software license reservations, and so on.
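As an illustrative sketch, a minimal SLURM batch script might look like the following. The resource values, module name, and program name are placeholders for your own job, not recommendations.

```shell
#!/bin/bash
#SBATCH --job-name=example      # job name shown in the queue
#SBATCH --time=08:00:00         # wall-clock limit (job duration)
#SBATCH --cpus-per-task=8       # number of CPUs
#SBATCH --mem=32G               # total memory for the job
# Load required software and run it (names below are placeholders).
module load MyApplication
srun my_program input.dat
```

Save the script to a file (for example, `myjob.sb`) and submit it with `sbatch myjob.sb`.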