Rclone - rsync for cloud storage
Users can use this software to copy files between their Microsoft OneDrive or Google Drive cloud storage and HPCC disk space. The tool can also mount cloud storage on HPCC so that the cloud storage can be used as extended disk space.
Rclone is installed system-wide on HPCC. To use it, users should first load the software module into their environment with the command "module load Rclone". For more details on using rclone, visit the Rclone website at https://rclone.org/.
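A minimal sketch of this step as a reusable shell function, assuming the HPCC "module" command (Lmod or Environment Modules) is available in your shell. The name "load_rclone" is hypothetical. It loads the module only when rclone is not already on the PATH, then prints the version as a sanity check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make rclone available in the current shell.
# Assumes the HPCC "module" command (Lmod or Environment Modules) is defined.
load_rclone() {
    # Load the module only if rclone is not already on the PATH.
    if ! command -v rclone >/dev/null 2>&1; then
        module load Rclone || return 1
    fi
    # Print the installed version as a quick sanity check.
    rclone version
}
```

Calling "load_rclone" at the top of a job script lets the script work whether or not the module was already loaded at login.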
To start using it, users should run the command "rclone config" to configure it. Instructions for this command can be found at https://rclone.org/commands/rclone_config/. Specifically, for Google Drive see https://rclone.org/drive/, and for Microsoft OneDrive see https://rclone.org/onedrive/. The details of how to start using this software on HPCC can be found in the document Rclone.pdf.
Once configured, users can use the "rclone" command to copy files to and from the cloud storage or to mount it on HPCC. Many rclone commands are available for transferring and managing files on HPCC and on cloud storage. To get help, run "rclone --help". Help for a single command is shown with "rclone <command> --help", for example "rclone copy --help".
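This is a small sketch for getting help from a script or the command line. The function name "rclone_help" is hypothetical. With no argument it runs "rclone --help". With a command name, it runs "rclone <command> --help".

```shell
#!/usr/bin/env bash
# Hypothetical helper: show rclone's general help, or the help for one command.
rclone_help() {
    if [ "$#" -eq 0 ]; then
        rclone --help          # list all commands and global flags
    else
        rclone "$1" --help     # e.g. rclone_help copy -> rclone copy --help
    fi
}
```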
A tool, "cloudSync", has been developed to help users synchronize files between their cloud storage services. It is available through "powertools", so users need to load the "powertools" module to use it. Users are welcome to try it and report any problems to us via the contact form.
The following are a few examples of rclone commands, run after the cloud storage has been configured. They assume that the cloud storage is configured under the name "MyOneDrive".
(1) See the currently configured remote storage: in this example, two remote cloud storages, "MyOneDrive" and "googledoc", are configured.
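To check from a script whether a remote such as "MyOneDrive" has been configured, you can use the "rclone listremotes" command. It prints one configured remote per line with a trailing colon, for example "MyOneDrive:". The function "has_remote" is a hypothetical sketch built on that output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a remote with the given name is configured.
# "rclone listremotes" prints one remote per line with a trailing colon,
# for example "MyOneDrive:".
has_remote() {
    # -x: match the whole line, -F: treat the name literally, -q: no output.
    rclone listremotes | grep -qxF -- "$1:"
}
```

For example, "has_remote MyOneDrive || rclone config" starts the interactive setup only when the remote is missing.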
(2) Check the remote storage information: the "rclone about" command shows the usage and quota of the remote storage.
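This is a minimal sketch wrapping "rclone about". The function name "cloud_quota" is hypothetical, and the remote name defaults to "MyOneDrive" as assumed in these examples. The colon after the remote name tells rclone to query the remote rather than a local path of the same name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the usage and quota of a configured remote.
# The remote name defaults to "MyOneDrive", the name assumed on this page.
cloud_quota() {
    local remote="${1:-MyOneDrive}"
    # The trailing colon makes rclone treat the name as a remote.
    rclone about "${remote}:"
}
```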
(3) List the contents of the cloud storage
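This is a sketch of listing one level of the cloud storage. "rclone lsd" lists only the directories at the given path. "rclone ls" lists files with their sizes and recurses into subdirectories by default, so this sketch adds "--max-depth 1" to keep the output short. The function name "list_cloud" is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list one level of a remote path.
# The path defaults to the root of "MyOneDrive".
list_cloud() {
    local target="${1:-MyOneDrive:}"
    # Directories at this level.
    rclone lsd "$target"
    # Files at this level with their sizes; --max-depth 1 stops
    # "rclone ls" from recursing into subdirectories.
    rclone ls --max-depth 1 "$target"
}
```

For example, "list_cloud MyOneDrive:Documents" lists the top level of a hypothetical "Documents" folder.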
(4) Copy files from HPCC to the cloud storage:
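This sketch wraps the upload step in a shell function. It assumes a local directory on HPCC and a destination such as "MyOneDrive:backup" (both hypothetical). "--progress" shows transfer progress. Any extra arguments, such as "--dry-run" to preview the transfer without copying anything, are passed through to "rclone copy". The function name "upload_dir" is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a local HPCC directory to a remote path.
# Usage: upload_dir LOCAL_DIR REMOTE:PATH [extra rclone flags...]
upload_dir() {
    if [ "$#" -lt 2 ]; then
        echo "usage: upload_dir LOCAL_DIR REMOTE:PATH [flags]" >&2
        return 2
    fi
    local src="$1" dest="$2"
    shift 2
    if [ ! -d "$src" ]; then
        echo "upload_dir: '$src' is not a directory" >&2
        return 1
    fi
    # The contents of src are copied into dest; unchanged files
    # (same size and modification time or checksum) are skipped.
    rclone copy --progress "$@" "$src" "$dest"
}
```

For example, run "upload_dir "$HOME/project/results" MyOneDrive:results --dry-run" first to preview, then repeat it without "--dry-run". The paths are placeholders.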
(5) Copy files from the cloud storage to HPCC:
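The same "rclone copy" command works in the other direction. This hypothetical sketch, "download_from_cloud", also checks that the parent of the local destination already exists. That way, a mistyped HPCC path fails immediately instead of rclone creating a new directory tree in the wrong place. rclone itself does not require this check; it is an assumed safety habit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a remote path into a local HPCC directory.
# Usage: download_from_cloud REMOTE:PATH LOCAL_DIR [extra rclone flags...]
download_from_cloud() {
    if [ "$#" -lt 2 ]; then
        echo "usage: download_from_cloud REMOTE:PATH LOCAL_DIR [flags]" >&2
        return 2
    fi
    local src="$1" dest="$2"
    shift 2
    # rclone would create dest (and any missing parents) itself; requiring
    # the parent to exist catches typos in the local path early.
    local parent
    parent="$(dirname "$dest")"
    if [ ! -d "$parent" ]; then
        echo "download_from_cloud: '$parent' does not exist on HPCC" >&2
        return 1
    fi
    rclone copy --progress "$@" "$src" "$dest"
}
```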
Although "rclone copy" is similar to the Unix commands rsync and cp, users should be aware of the differences and understand its behavior in detail.
(1) "rclone copy" does not transfer unchanged files. It compares files by size and modification time, or by MD5 checksum. In this respect it is similar to the Linux command rsync.
(2) When running "rclone copy source:sourcepath dest:destpath", if source:sourcepath is a directory, dest:destpath should also be a directory. rclone does not copy the directory source:sourcepath itself. Instead, it copies the contents of source:sourcepath into dest:destpath. If dest:destpath does not exist, it will be created and the contents of source:sourcepath will be stored in it.
(3) "rclone copyto" is very similar to "rclone copy". The only difference is that it can be used to upload single files under a name other than their current one. When running "rclone copyto source:sourcepath dest:destpath", if source:sourcepath is a file, dest:destpath can be a new file name. If source:sourcepath is a directory, "rclone copyto" behaves the same as "rclone copy".
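The two behaviors described in the notes on "rclone copy" and "rclone copyto" can be sketched as hypothetical helper functions. "rclone copy" copies only the contents of a directory, so "copy_dir_into" spells out the target directory name explicitly. "upload_dated" uses "rclone copyto" to upload a single file under a new, date-stamped name. Both names, and the date-stamp naming scheme, are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating rclone copy / copyto behavior.

# copy_dir_into SOURCE_DIR DEST_PARENT
# "rclone copy" copies the contents of SOURCE_DIR, so to get the directory
# itself inside DEST_PARENT the target name is spelled out explicitly.
copy_dir_into() {
    local src="${1%/}" parent="${2%/}"
    local name
    name="$(basename "$src")"
    case "$name" in
        ""|.|/|*:)
            echo "copy_dir_into: cannot take a directory name from '$1'" >&2
            return 1 ;;
    esac
    case "$parent" in
        *:) rclone copy "$src" "${parent}${name}" ;;   # remote root, e.g. MyOneDrive:
        *)  rclone copy "$src" "${parent}/${name}" ;;
    esac
}

# upload_dated FILE REMOTE:DIR [STAMP]
# "rclone copyto" treats the destination as the full new file name.
# STAMP defaults to today's date (YYYYMMDD).
upload_dated() {
    local file="$1" dir="${2%/}" stamp="${3:-$(date +%Y%m%d)}"
    if [ ! -f "$file" ]; then
        echo "upload_dated: '$file' is not a regular file" >&2
        return 1
    fi
    local target
    case "$dir" in
        *:) target="${dir}${stamp}-$(basename "$file")" ;;
        *)  target="${dir}/${stamp}-$(basename "$file")" ;;
    esac
    rclone copyto "$file" "$target"
}
```

For example, "copy_dir_into ./results MyOneDrive:backup" runs "rclone copy ./results MyOneDrive:backup/results". The files therefore land in backup/results rather than directly in backup.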
(6) Check that the files in the source and destination match, using "rclone check":
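This is a sketch of using "rclone check" from a script. "rclone check" exits with a non-zero status when it finds differences, so the script can test its exit status directly. The function name "verify_copy" is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether source and destination match.
# Usage: verify_copy SOURCE DEST
verify_copy() {
    if rclone check "$1" "$2"; then
        echo "verify_copy: $1 and $2 match"
    else
        echo "verify_copy: differences found between $1 and $2" >&2
        return 1
    fi
}
```

If the destination is allowed to hold extra files, "rclone check --one-way" only checks that the files in the source exist and match in the destination.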
When archiving your files to cloud storage over an unstable connection between HPCC and your cloud storage, we do NOT recommend using "rclone move", because data may be lost during the transfer. Instead, we recommend using "rclone copy" to copy the files over and then running "rclone check" to verify that the files are identical. After that, it is safe to delete the local copy of the files.
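The recommended archiving workflow can be sketched as a single shell function. It copies the files, verifies them with "rclone check", and deletes the local directory only if both steps succeed. The name "archive_dir" is hypothetical. Because the last step really deletes data, try it on a test directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a local directory to cloud storage.
# Usage: archive_dir LOCAL_DIR REMOTE:PATH
# The local copy is deleted ONLY after both "rclone copy" and
# "rclone check" have succeeded.
archive_dir() {
    if [ "$#" -ne 2 ]; then
        echo "usage: archive_dir LOCAL_DIR REMOTE:PATH" >&2
        return 2
    fi
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "archive_dir: '$src' is not a directory" >&2
        return 1
    fi
    if ! rclone copy "$src" "$dest"; then
        echo "archive_dir: copy failed, local data kept" >&2
        return 1
    fi
    if ! rclone check "$src" "$dest"; then
        echo "archive_dir: check found differences, local data kept" >&2
        return 1
    fi
    # Both steps succeeded: the local copy can now be removed.
    rm -rf -- "$src"
}
```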
When using the "rclone mount" command to mount your cloud storage on HPCC, users should be careful about two things:
(1) When rclone mount runs, the process does NOT run as the user. Instead, it runs as the "root" of the cloud storage. Therefore, users may see an error message like "mount helper error: fusermount: failed to open mountpoint for reading: Permission denied". Users can use /tmp as the mount point because that space is accessible to all users. Users should be very careful about opening permissions to others just for the purpose of using rclone mount.
(2) Users of "rclone mount" should unmount the storage after use with "fusermount -u <endpoint_dir>". Sometimes the endpoint is not unmounted on some nodes, for example because of a timeout. You may then see a message like "Transport endpoint is not connected" when accessing the endpoint directory on that node. Manually unmounting it again should resolve the issue.
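This is a sketch of mounting and unmounting that follows the advice to place the mount point under /tmp. The function names and the default mount directory /tmp/$USER/<remote> are hypothetical. "--daemon" runs "rclone mount" in the background so that the shell prompt returns. Run "unmount_cloud" (that is, "fusermount -u") on each node where the storage was mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for mounting a remote under /tmp and unmounting it.
# Usage: mount_cloud REMOTE: [MOUNT_DIR]   (default: /tmp/$USER/<remote name>)
#        unmount_cloud MOUNT_DIR
mount_cloud() {
    local remote="$1"
    if [ -z "$remote" ]; then
        echo "usage: mount_cloud REMOTE: [MOUNT_DIR]" >&2
        return 2
    fi
    # Default mount point: /tmp/$USER/ plus the remote name without the colon.
    local mnt="${2:-/tmp/${USER}/${remote%%:*}}"
    mkdir -p "$mnt" || return 1
    # --daemon runs the mount in the background so the prompt returns.
    rclone mount "$remote" "$mnt" --daemon || return 1
    echo "$mnt"
}

unmount_cloud() {
    # Unmount when finished; repeat this if the node reports
    # "Transport endpoint is not connected".
    fusermount -u "$1"
}
```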
When you use the "rclone config" command to configure your cloud storage on HPCC, it guides you through an interactive setup process. At the auto config step, after you choose "y", it starts authentication. It then prints a link to the authentication page and waits for you to log in.
At this point, a Firefox browser window should open. If it does not, check whether you used the -X option to enable X11 forwarding when you connected with ssh. You may follow the instructions in "Connect to HPCC System" to set up the display correctly.
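Before running "rclone config", you can quickly confirm that X11 forwarding is active by checking the DISPLAY environment variable, which ssh sets when run with -X. The function "check_display" is a hypothetical sketch of that check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that an X display is available before
# running "rclone config", which needs to open Firefox.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: reconnect with 'ssh -X' to enable X11 forwarding" >&2
        return 1
    fi
    echo "DISPLAY=$DISPLAY"
}
```

Run "check_display" on the development node, and start "rclone config" only if it succeeds.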
It may take a few minutes for the browser to open and connect, so please be patient. If the browser window opens but does not show the authentication page, you can manually enter the link provided by the "rclone config" command in the Firefox address bar. DO NOT open the link in your personal computer's browser. The authentication has to use the browser on the HPCC development node, because the link points to a local web server that rclone runs on that node.