This document highlights several simple methods to transfer files to the HPCC home and research directories. There are two main gateway systems for copying files.
hpcc.msu.edu: This is our login gateway. While it can be used for file transfer, it's not intended for high volumes of files. More importantly, the scratch space is not mounted there and so you can't access your files on scratch.
rsync.hpcc.msu.edu: It has access to scratch and is dedicated to file transfer. Although this gateway is named after the popular Linux "rsync" command, it can be used for "sftp" or "scp" as well. Starting in October 2022, the rsync gateway will accept SSH keys as the ONLY authentication method; username/password login won't work. Please refer to the SSH key tutorial for setting up your key pair.
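Because the rsync gateway only accepts keys, it can help to record the key in your OpenSSH client configuration once. The function below is a minimal sketch, assuming your private key lives at ~/.ssh/id_ed25519 (a common default, not something this guide specifies) and using a hypothetical host alias, hpcc-rsync. It appends a standard ssh_config entry (HostName, User, IdentityFile, IdentitiesOnly) for rsync.hpcc.msu.edu; run it once, since calling it again adds a duplicate entry.

```shell
#!/bin/sh
# Append an OpenSSH client entry for the HPCC rsync gateway.
# Assumptions (not from the HPCC docs): the alias "hpcc-rsync" and the
# default key path ~/.ssh/id_ed25519 are placeholders; adjust as needed.
add_hpcc_rsync_host() {
  user="$1"                          # your HPCC username
  config="${2:-$HOME/.ssh/config}"   # ssh client config file to append to
  key="${3:-$HOME/.ssh/id_ed25519}"  # private key ssh should offer
  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<EOF

Host hpcc-rsync
    HostName rsync.hpcc.msu.edu
    User $user
    IdentityFile $key
    IdentitiesOnly yes
EOF
  # ssh refuses config files that other users can write to
  chmod 600 "$config"
}
```

After running `add_hpcc_rsync_host <username>`, commands such as `scp example.txt hpcc-rsync:` or `sftp hpcc-rsync` should pick up the key automatically.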
Using FileZilla for Mac and Windows
General password-based setup
FileZilla is a GUI application for copying files between a remote host and your computer.
Go to https://filezilla-project.org/download.php?show_all=1, select your operating system, and download and install the appropriate (free) FileZilla client. Mac users will have to unzip the file and move the application into the Applications folder.
Launch the program.
In the top dialog boxes, enter:
- (Username) <your HPCC username>
- (Port) 22
Then click Quickconnect. The first time you connect, you will have to accept the server's host key.
Once connected, the left column displays files on your local computer, the right column displays files on HPCC.
You can select the appropriate directories by double clicking through each tree. Files can be dragged and dropped from one column to the next. By dragging files from the left column to the right, you are uploading files to HPCC from your local computer. By dragging files from the right column to the left, you can download files from HPCC to your local computer.
As mentioned above, password authentication will be disabled in October 2022. See the screenshot below for how to set up FileZilla to use SSH keys. If you haven't generated a key pair yet, please refer to the SSH key tutorial.
For more detailed steps, check out this FileZilla doc: How to Use SSH Private Keys for SFTP
Using Linux commands
A number of command-line utilities are available to macOS and Linux users, each with its own advantages.
Basic file copy (scp)
A simple command for transferring files between the cluster and another host is scp. To copy a file from a local directory to file space on the cluster, use a line like
scp example.txt <username>@rsync.hpcc.msu.edu:example_copy.txt
This will copy the file named example.txt in the current local directory to the user's home directory on the cluster, naming the copy example_copy.txt. Leaving the destination name after the colon blank gives the new file the same name as the original. Note: to transfer a file whose name contains spaces, quote the local name and put a backslash before each space in the remote file name, e.g.
scp "My File Name" <username>@rsync.hpcc.msu.edu:"My\ File\ Name"
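If you transfer many such files, the escaping can be scripted. This is a minimal POSIX sh sketch; the function name escape_spaces is hypothetical, and it only handles spaces (the case the note covers), not other characters a remote shell might treat specially.

```shell
#!/bin/sh
# Put a backslash before each space, as required for the remote
# (cluster-side) file name in an scp destination.
escape_spaces() {
  printf '%s' "$1" | sed 's/ /\\ /g'
}

# Hypothetical usage; replace <username> with your HPCC username:
#   scp "My File Name" "<username>@rsync.hpcc.msu.edu:$(escape_spaces "My File Name")"
```

The local argument stays quoted, so the local shell handles its spaces; only the remote part receives backslashes.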
To copy a file from the cluster to your local directory,
scp <username>@rsync.hpcc.msu.edu:example.txt ./example_copy.txt
will copy the file named example.txt from the user's home directory on the cluster to the current local directory, naming the new file example_copy.txt. Giving only a directory as the destination (for example, ./) gives the new file the same name as the original. The -r option can be used to copy entire directories recursively.
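As an illustration of the -r option, here is a minimal sketch of a download helper. The name hpcc_get_dir and the RUN variable are hypothetical conveniences: setting RUN=echo prints the scp command instead of running it, which is handy for checking the arguments first.

```shell
#!/bin/sh
# Recursively copy a directory from your HPCC home directory with scp -r.
# hpcc_get_dir is a hypothetical helper; set RUN=echo to print the
# command instead of executing it.
hpcc_get_dir() {
  user="$1"             # HPCC username
  remote_dir="$2"       # path relative to your HPCC home directory
  local_dest="${3:-.}"  # local destination (default: current directory)
  ${RUN:-} scp -r "$user@rsync.hpcc.msu.edu:$remote_dir" "$local_dest"
}
```

For example, `RUN=echo hpcc_get_dir <username> results ./results_copy` shows the command, and dropping `RUN=echo` performs the copy.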
Synchronize directories (rsync)
If you are an advanced Linux/Mac user, rsync is a handy utility that makes mirroring directories simple. Its syntax looks very similar to scp.
To synchronize the directory <local_dir> on your local computer to <hpcc_dir> on HPCC, issue the following command:
rsync -ave ssh <local_dir> <username>@rsync.hpcc.msu.edu:<hpcc_dir>
In the above command, rsync scans both directories and uploads every file in <local_dir> that is missing from <hpcc_dir> or differs from the copy there in size or modification time, even if the HPCC copy is newer. (Add the -u option to skip files that are newer on the HPCC side.)
To mirror the HPCC directory to your local system, call
rsync -ave ssh <username>@rsync.hpcc.msu.edu:<hpcc_dir> <local_dir>
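The two rsync commands above can be wrapped in small functions so the host name is typed only once. This is a sketch; hpcc_push, hpcc_pull, HPCC_HOST and RUN are hypothetical names, and RUN=echo prints the rsync command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrappers for uploading to and downloading from HPCC.
# Set RUN=echo to print the rsync command instead of running it.
HPCC_HOST="rsync.hpcc.msu.edu"

# Upload: hpcc_push <username> <local_dir> <hpcc_dir>
hpcc_push() {
  ${RUN:-} rsync -ave ssh "$2" "$1@$HPCC_HOST:$3"
}

# Download: hpcc_pull <username> <hpcc_dir> <local_dir>
hpcc_pull() {
  ${RUN:-} rsync -ave ssh "$1@$HPCC_HOST:$2" "$3"
}
```

Keep in mind that rsync treats a trailing slash on the source specially: data/ copies the contents of data into the destination, while data copies the directory itself.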
When transferring files from a local computer to your research space, use rsync with the --chmod=Dg+s option. It sets the setgid bit on the transferred directories, so new files created in them inherit the directory's group.
For example:
rsync -ave ssh TestDir --chmod=Dg+s <username>@rsync.hpcc.msu.edu:/mnt/research/<GroupName>/
Note: the first time you use rsync, you may want to add the -n (--dry-run) flag, which lists what would be transferred without copying any files.
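Combining the research-space example with the dry-run advice, the sketch below runs rsync with -n by default and only copies for real when given a final --go argument. The helper name hpcc_research_upload, the --go convention and the RUN variable are hypothetical; <GroupName> stays a placeholder for your group.

```shell
#!/bin/sh
# Upload a directory to /mnt/research/<GroupName>/ on HPCC.
# Usage: hpcc_research_upload <username> <GroupName> <local_dir> [--go]
# Without --go only a dry run (-n) is performed. Set RUN=echo to print
# the rsync command instead of running it.
hpcc_research_upload() {
  user="$1"; group="$2"; src="$3"
  dest="$user@rsync.hpcc.msu.edu:/mnt/research/$group/"
  if [ "${4:-}" = "--go" ]; then
    ${RUN:-} rsync -ave ssh --chmod=Dg+s "$src" "$dest"
  else
    # Dry run: list what would be transferred without copying anything.
    ${RUN:-} rsync -n -ave ssh --chmod=Dg+s "$src" "$dest"
  fi
}
```

Review the dry-run listing first, then repeat the same call with --go appended.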
Interactive file copy (sftp)
When performing several data transfers between hosts, the sftp command may be preferable, as it allows the user to work interactively. Running
sftp <username>@rsync.hpcc.msu.edu
from a local host establishes a connection between that host and the cluster. Both file systems can then be navigated. For the local file system, lcd changes to the specified directory, lpwd prints the working directory, and lls lists the files in the current directory. For the remote file system, the same three commands are available without the leading "l" (cd, pwd, ls). Commands to change permissions, rename files, and manipulate directories on the remote host are also available. The two key commands are
get <file>, which copies the file in the remote working directory to the local working directory, and
put <file>, which copies the file in the local working directory to the remote working directory. The quit command closes the connection between hosts.
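The same get/put commands can also be run non-interactively: sftp's -b option reads commands from a batch file. Batch mode cannot prompt for a password, so it relies on SSH-key login, which the rsync gateway requires anyway. This is a sketch; make_sftp_batch, run_sftp_batch, RUN and the file names are hypothetical, and names containing spaces would need quoting inside the batch file.

```shell
#!/bin/sh
# Write an sftp batch file: cd to a remote directory, upload one file,
# download another, then quit. All names here are hypothetical examples.
make_sftp_batch() {
  batch="$1"; remote_dir="$2"; upload="$3"; download="$4"
  cat > "$batch" <<EOF
cd $remote_dir
put $upload
get $download
quit
EOF
}

# Run the batch file against the rsync gateway (set RUN=echo to preview).
run_sftp_batch() {
  ${RUN:-} sftp -b "$2" "$1@rsync.hpcc.msu.edu"
}
```

For example, `make_sftp_batch jobs.sftp results input.csv output.csv` followed by `run_sftp_batch <username> jobs.sftp`. By default sftp stops at the first command in a batch that fails.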
Copy file from Internet (wget)
Wget is a simple command for copying files from the Internet to a user's file space on the cluster. Submitting the line
wget <URL of examplefile.txt>
downloads examplefile.txt to the user's working directory.
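For larger downloads, two standard wget options are useful: -P saves the file into a chosen directory instead of the working directory, and -c resumes a partially downloaded file. The sketch below combines them; fetch_into, RUN and the example URL are hypothetical, and RUN=echo prints the wget command instead of running it (the target directory is still created).

```shell
#!/bin/sh
# Download a URL into a directory, resuming a partial download if present.
# Usage: fetch_into <url> [directory]   (fetch_into is a hypothetical helper)
fetch_into() {
  url="$1"; dir="${2:-.}"
  mkdir -p "$dir"                   # make sure the target directory exists
  ${RUN:-} wget -c -P "$dir" "$url" # -c: continue, -P: directory prefix
}
```

For example, `fetch_into https://example.com/examplefile.txt downloads` would place examplefile.txt in ./downloads.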