Frequently Asked Questions (FAQ)
What is my HPCC user name/password?
If you are affiliated with MSU, your MSU NetID is your user name and your NetID password is your HPCC password. These are the same credentials you use for other MSU online services. An HPCC account must be requested by an MSU faculty member at https://contact.icer.msu.edu/account
My login was denied after multiple failed attempts. Can I reset my password on the HPCC?
No. The authentication on the HPCC is directly tied to MSU. You will need to request a password reset at https://netid.msu.edu/netid/password/index.html
I used to be able to connect to the HPCC server, but now I can't. Why?
There can be multiple reasons for this, such as system downtime (so please check the ICER blog first). Another common reason is account expiry: the HPCC periodically disables the accounts of users who are no longer affiliated with the university, or who are no longer registered in a class for which the instructor created temporary student accounts. To reactivate your HPCC account, please have your PI submit a sponsoring form at https://contact.icer.msu.edu/sponsoredrenewal
Can you keep me posted on the current status of the HPCC?
Yes. Users are encouraged to follow the HPCC Announcements blog to stay updated on the status of the HPCC (such as scheduled downtimes and urgent notices).
I am looking for help to troubleshoot my problem. How do I share my code/files with you?
We do not access your directories to view your files or test your code. Please attach your files to your reply to the ticket email.
Is there any limit per user on using the HPCC resources?
Limit on running a program on a dev-node: 2 CPU hours. If you are running a multi-threaded program, the wall time limit would be (roughly) 2 hours divided by the number of threads.
Limit on file counts: 1 million files for each of the home, research and scratch spaces.
Limit on storage size: each user has up to 1 TB of storage free of charge in each of the home and research directories; beyond 1 TB, the cost is $125 per TB per year. For scratch space (i.e., /mnt/scratch/<your_user_name>), the maximum is 50 TB and cannot be increased (users will need to archive or delete files once this limit is exceeded).
Limit on cluster usage: 1) the longest wall time you can request is 7 days; 2) the maximum number of CPU cores you can use at any one time is 1040; 3) the maximum number of jobs that can be queued or running is 1000 (except in the scavenger queue); 4) non-buyin users have a maximum of 1 million CPU hours per year.
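The 7-day wall-time cap is easiest to respect when the request is written in SLURM's days-hours:minutes:seconds form. The sketch below is a hypothetical helper (to_slurm_time is not an HPCC command) that converts a number of whole hours into that form and refuses anything above 168 hours; it assumes bash and handles whole hours only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not an HPCC tool: convert a wall time given in whole
# hours into SLURM's D-HH:MM:SS form, rejecting requests past the 7-day cap.
to_slurm_time() {
  local hours=$1
  local max_hours=$((7 * 24))   # 7-day maximum wall time = 168 hours

  if (( hours < 1 || hours > max_hours )); then
    echo "error: ${hours}h is outside 1..${max_hours}h" >&2
    return 1
  fi
  # days-hours:minutes:seconds, e.g. 36 hours -> 1-12:00:00
  printf '%d-%02d:00:00\n' $((hours / 24)) $((hours % 24))
}
```

For example, sbatch --time="$(to_slurm_time 36)" myjob.sb would request 1 day and 12 hours, where myjob.sb stands in for your own job script.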
I would like to know more about the dev-node limit.
When you connect to any of the HPCC's dev-nodes, you will see the following message:
processes on development nodes are limited to two hours of CPU time.
The two-hour CPU time limit applies to each process you run on a dev-node. If a process uses more than 2 hours of CPU time, only that process is killed; you can still connect to that dev-node and run other processes. Because the limit counts CPU time rather than wall-clock time, a process using 100% CPU (1 core) is terminated after two hours, a process using 200% CPU (2 cores) after one hour, and so on.
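Because the limit is on CPU time, the wall-clock budget shrinks with the number of busy cores. A minimal sketch of that arithmetic, assuming every core stays at 100% CPU (devnode_minutes is a hypothetical helper, not an HPCC command):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough wall-clock minutes before a dev-node process
# reaches the 2 CPU-hour limit, assuming it keeps N cores 100% busy.
devnode_minutes() {
  local cores=$1
  local limit_minutes=120   # 2 hours of CPU time per process
  echo $(( limit_minutes / cores ))
}

# To see how much CPU time your own processes have accumulated so far:
#   ps -u "$USER" -o pid,cputime,comm
```

devnode_minutes 4 prints 30, matching the pattern above. Real programs rarely keep every thread fully busy, so treat the result as a lower bound on how long a process will last.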
How do I check my CPU time usage?
Run the command cputime. You can also use sreport to get your usage starting from a given date, for example:
sreport user top user=<your_user_name> start=2021-01-06 -t hour
How do I get my storage usage data?
Run the command quota. Once your quota is used up, you cannot write new files.
I have a buyin account, do I need to specify it when I submit jobs?
No. When you submit a job without specifying an account, your default account is used. You can check your default account with the command buyin_status -l; a buyin user's default account is their buyin account. We recommend you read this if you have purchased buyin nodes.
What is HPCC's data backup policy?
We back up data in users' home and research directories, not in their scratch spaces.
Hourly backups are kept for the previous 24 hours; for earlier days, only daily backups are available. Daily backups are performed at 12 AM Eastern Time and retained for 60 days.
My files in the scratch space are gone.
Files in scratch are automatically purged if they have not been modified in the last 45 days. Note that the scratch spaces are not intended for long-term storage, and files saved in scratch are not backed up.
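To see which of your scratch files are close to the purge threshold, you can list files by modification time with find. A minimal sketch, assuming GNU find and the /mnt/scratch/<your_user_name> layout mentioned above (purge_candidates is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under DIR (default: your scratch space)
# whose last modification is older than DAYS days (default: 45).
purge_candidates() {
  local dir=${1:-/mnt/scratch/$USER}
  local days=${2:-45}
  # -mtime +N matches files whose age in whole days (fractions dropped)
  # is greater than N.
  find "$dir" -type f -mtime +"$days"
}
```

For instance, purge_candidates /mnt/scratch/$USER 40 lists files that will become eligible for purging within the next few days, so you can copy anything you still need to your home or research space first.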
I can't transfer files from/to my scratch space.
If you use
Do you support running GPU jobs?
Yes. There are three GPU dev-nodes and a series of compute nodes in the cluster; see Cluster resources.
Why did I get an "Illegal Instruction" error?
This is usually because a program was compiled on a newer CPU architecture (e.g., intel18) but then run on an older one (e.g., intel14). Our system has a range of CPUs, and the newer ones support instructions that are not available on the older ones. One short-term fix is to run programs on the same CPU type they were compiled on. In our experience, this error has occurred only on intel14 nodes, so you should avoid them. That is, for dev-node testing, pick one of dev-intel16, dev-intel16-k80 and dev-intel18. For jobs, add
#SBATCH --constraint="[intel16|intel18]"
to your SLURM job script.
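One way to confirm that a missing instruction set is the culprit is to compare the CPU feature flags of the node where you compiled with those of the node where the program failed. A minimal sketch, assuming a Linux node with /proc/cpuinfo (simd_flags is a hypothetical helper, and the extensions checked are only a sample):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which of a few SIMD extensions the CPU
# advertises in the "flags" line of /proc/cpuinfo (or a given file).
simd_flags() {
  local cpuinfo=${1:-/proc/cpuinfo}
  local flags
  flags=$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null)
  # Drop the "flags :" prefix and pad with spaces for whole-word matching.
  flags=" ${flags#*:} "
  for ext in sse4_2 avx avx2 avx512f; do
    if [[ $flags == *" $ext "* ]]; then
      echo "$ext"
    fi
  done
}

# Usage: run simd_flags on the build node and on the failing node, then compare.
```

If an extension such as avx512f appears on the build node but not on the failing node, the binary was likely built to use it (for example through a compiler flag such as -march=native) and cannot run there.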
How do I use Python on the HPCC?
There are two methods: users can install their own version of Python with Anaconda or use the versions of Python installed on the HPCC system. See here.
I tried to plot with Python's matplotlib, but got the error "No module named '_tkinter'"
If you use the default Python module (/opt/software/Python/3.6.4-foss-2018a/bin/python) on a dev-node, you need to load the Tkinter module before running Python to avoid this error. Run:
module load Tkinter/3.6.4-Python-3.6.4
I have a Python conflict. What should I do to resolve it?
Upon login to a dev-node, a default list of modules is loaded automatically. Because Python/3.6.4 is included in that list, it can interfere with your conda environment: your program may not find packages installed in the conda environment even after it has been activated, because it still picks up Python/3.6.4 from the module system. The solution is to run
module unload Python
before activating the conda environment.
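After running module unload Python and activating your environment, you can check which interpreter your shell will actually run. A minimal sketch, assuming your program is started as python and relying on conda activate setting CONDA_PREFIX to the active environment; check_conda_python and myenv are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that "python" on PATH belongs to the active
# conda environment rather than a Python left over from the module system.
check_conda_python() {
  local py
  py=$(command -v python) || { echo "no python found on PATH"; return 1; }
  if [[ -n ${CONDA_PREFIX:-} && $py == "$CONDA_PREFIX/bin/python" ]]; then
    echo "OK: $py"
  else
    echo "WARNING: python resolves to $py, not to the active conda environment"
    return 1
  fi
}

# Usage ("myenv" is a placeholder environment name):
#   module unload Python
#   conda activate myenv
#   check_conda_python
```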
How do I deactivate Conda base environment?
Many users have reported that after a local installation of Anaconda on the HPCC, their login prompt changes to something starting with -bash-4.2$. This is because conda activates its default environment, base, upon startup. To disable this behavior, which often causes conflicts with system defaults, run the following command:
conda config --set auto_activate_base false
Why did my "module load" command output errors?
There are many reasons that errors occur when you try loading a module.
However, the most common cause is that you have forgotten to run
Sometimes module spider can also fail to find a module. This is most likely because your personal module cache is out of date. To clear it, run:
rm -r ~/.lmod.d/.cache
I want to install software packages, what should I do?
See here for instructions. Please note that we encourage users to install software on their own, if possible. The HPCC has provided numerous versions of compilers and libraries which should accommodate the vast majority of software across different fields.
If you are thinking of requesting a system-wide installation of a piece of software, we strongly recommend you consider the following factors before submitting the request:
(1) How popular is the software? If it is not widely used, are there other HPCC users who would also use it? If you are the only user, we recommend installing it in your home directory.
(2) What type of license agreement does the software have? Some licenses restrict use even when the software is free. Examples include software subject to export control or to a specific end-user license agreement. When a license restricts use, we typically recommend that the user make an agreement directly with the software provider and install the software in their home directory. If it will be used by a group of people, HPCC system administrators can help set up group access in compliance with the license agreement.
(3) Is the software well maintained and up to date? If the software you wish to install is legacy software or is not well maintained, its installation will likely require older versions of its dependencies as well. The effort to install it may then be greater than the effort to find up-to-date software with the same, similar, or even better functionality. It may be time to consider transitioning to newer software.
What does the message "Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions" mean after my job is submitted?
Once a job is submitted, the scheduler adds it to its scheduling calculations and keeps updating the job's status. The status reflects the scheduler's current state, so this message will change once the scheduler has found a place to run the job. Some nodes in the cluster are always down or drained for normal maintenance; the important part is "reserved for jobs in higher priority partitions", which simply means the scheduler has not yet found a time to run your job. The message will update as the scheduler continues to work.
Can I use the HPCC through a web browser?
Yes, we provide Open OnDemand, a web portal for easy web access to the HPCC. Check out this tutorial.
What should I do when I cannot load modules?
I have issues with copying files to my HPCC research space.
Many users have reported problems copying or transferring files to their research space: even though the research space still has plenty of free space, they get the following error message:
failed to ... Disk quota exceeded
This problem may occur because the folders you copy or transfer files to have the wrong group ownership or do not have the set-group-ID bit set. Please read this for more instructions.
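To find folders affected by either problem, you can search for directories that are not owned by your research group or lack the set-group-ID bit. A minimal sketch, assuming GNU find; find_bad_dirs, <research_group> and <your_dir> are placeholders, and the instructions linked above remain the authoritative fix:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list directories under DIR that either do not belong
# to GROUP or do not have the set-group-ID bit set.
find_bad_dirs() {
  local dir=$1 group=$2
  find "$dir" -type d \( ! -group "$group" -o ! -perm -g+s \) -print
}

# One possible fix, after reviewing the list (placeholders, adjust first):
#   chgrp -R <research_group> <your_dir>
#   find <your_dir> -type d -exec chmod g+s {} +
```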
What is powertools?
The powertools module is a collection of software tools and examples that allow researchers to better utilize HPC systems. Powertools was created to help advanced users use the HPCC more effectively. To learn more about powertools, run the command
I want to copy files from/to my Microsoft OneDrive or Google Drive.
Rclone is currently installed on the HPCC. This software supports research in the cloud and helps HPCC users to sync files and directories between MSU’s HPCC and their cloud storage, including OneDrive and Google Drive. Please refer to Rclone
How can I check HPCC node usage?
Run the node_status command on any dev-node. We also offer a web-based dashboard at https://icer.msu.edu/dashboard.
Does HPCC offer a cheaper long-term archiving plan?
We do not. However, MSU offers the Data Storage Finder (https://data-storage-finder.tech.msu.edu, on-campus only). There are several possible options for data archiving.