Rclone - rsync for cloud storage
Rclone can copy files between a user's Microsoft OneDrive or Google Drive cloud storage and HPCC disk space. It can also mount a user's cloud storage on HPCC so that the cloud storage can be used as extended disk space.
Rclone is installed system-wide on HPCC. To use it, users should first load its software module into their environment with the "module load" command.
For more details on using Rclone, see the Rclone website at https://rclone.org/.
To start using Rclone, users need to configure it by running "rclone config".
Instructions for this command are at https://rclone.org/commands/rclone_config/.
For backend-specific setup, see https://rclone.org/drive/ for Google Drive and https://rclone.org/onedrive/ for Microsoft OneDrive. Details on how to start using Rclone on HPCC are given in the document Rclone.pdf.
After configuring the software, users can use the "rclone" command to copy files to and from the cloud storage or to mount it on HPCC. Rclone provides many subcommands for transferring and managing files on HPCC and cloud storage. To list them, run "rclone --help".
The tool "cloudSync" was developed to help users synchronize files between their cloud storage accounts. It is available through "powertools", which should be loaded automatically when you log in to HPCC. If needed, you can load it manually with 'ml load powertools'. Users are welcome to try it and to report any problems to us through the contact form.
The following examples show rclone commands run after the cloud storage has been configured. They assume the cloud storage is configured with the name "MyOneDrive".
(1) See the currently configured remote storage
We can check the current configuration of rclone with 'rclone config'. In this example, two remotes are configured: "MyOneDrive" and "googledoc".
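A script that depends on a particular remote can confirm the remote exists before using it. The bash sketch below relies on "rclone listremotes", which prints each configured remote on its own line with a trailing colon (for example "MyOneDrive:"). The function name has_remote is hypothetical and is not part of rclone.

```shell
# Hypothetical helper: succeed (exit 0) only if a remote with this name is configured.
# "rclone listremotes" prints one remote per line, e.g. "MyOneDrive:".
has_remote() {
  local name=${1%:}   # accept both "MyOneDrive" and "MyOneDrive:"
  # -x: match the whole line, -F: fixed string (no regex), -q: no output
  rclone listremotes | grep -qxF -- "${name}:"
}
```

For example, "has_remote MyOneDrive || rclone config" starts the interactive setup only when "MyOneDrive" is missing.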
(2) Check the remote storage information
We can see the usage and quota of the remote storage with the "rclone about" command.
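"rclone about" also accepts a --json flag, which prints the same figures as JSON with fields such as "total", "used" and "free" (in bytes). The bash sketch below assumes that output format and extracts the free space, so that a script can compare it with the size of the data it plans to upload. The function name remote_free_bytes is hypothetical. Some backends do not report every field, and in that case the function prints nothing.

```shell
# Hypothetical helper: print the free space in bytes reported by "rclone about".
# Assumes --json output with one field per line, e.g.  "free": 1099510579200
# Prints nothing if the backend does not report a "free" value.
remote_free_bytes() {
  rclone about "$1" --json |
    sed -n 's/^[[:space:]]*"free":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Usage: echo "Free on MyOneDrive: $(remote_free_bytes MyOneDrive:) bytes"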
(3) List the contents of the cloud storage
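Most listing needs are covered by three commands: "rclone lsd MyOneDrive:" lists the top-level directories, "rclone ls MyOneDrive:" lists files with their sizes recursively, and "rclone lsf MyOneDrive:" prints names only, one per line. Because lsf output is easy to process, it suits small helpers such as the hypothetical count_remote_files below (bash). It counts the files under a remote path, which can be useful for comparing against a local count after a copy.

```shell
# Hypothetical helper: count the files under a remote path.
# "rclone lsf" prints one name per line; -R recurses, --files-only skips directories.
count_remote_files() {
  local listing
  listing=$(rclone lsf -R --files-only "$1") || return   # propagate rclone errors
  if [[ -z $listing ]]; then
    echo 0                                  # empty listing: no files
  else
    printf '%s\n' "$listing" | wc -l
  fi
}
```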
(4) Copy files from HPCC to the cloud storage
(5) Copy files from the cloud storage to HPCC
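Before a large transfer in either direction, the --dry-run flag shows what "rclone copy" would do without copying anything, and --progress reports progress during the real transfer. The bash sketch below combines the two. preview_then_copy is a hypothetical helper name, and the paths in the usage line are placeholders.

```shell
# Hypothetical helper: preview a copy, ask for confirmation, then copy.
# SRC and DST may each be a local path or a remote path such as "MyOneDrive:backup",
# so the same function works for uploads and downloads.
preview_then_copy() {
  local src=$1 dst=$2 answer=""
  # Show what would be transferred without copying anything.
  rclone copy "$src" "$dst" --dry-run || return
  # Ask before the real transfer; anything other than y/Y aborts.
  read -r -p "Copy $src to $dst? [y/N] " answer || true
  if [[ $answer != [yY] ]]; then
    echo "Aborted." >&2
    return 1
  fi
  rclone copy "$src" "$dst" --progress
}
```

For example, "preview_then_copy ~/results MyOneDrive:backup/results" uploads, and "preview_then_copy MyOneDrive:backup/results ~/results" downloads.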
Note
Although "rclone copy" is similar to the Unix commands rsync and cp, users should be aware of how it differs from them and understand the details of its behavior.
(1) "rclone copy" does not transfer unchanged files. It compares files by size and modification time, or by MD5 checksum. In this respect it is similar to the Linux command rsync.
(2) When running "rclone copy source:sourcepath dest:destpath", if source:sourcepath is a directory, dest:destpath should also be a directory. Rclone does not copy the directory source:sourcepath itself. Instead, it copies the contents of source:sourcepath into dest:destpath. If dest:destpath does not exist, it is created and the contents of source:sourcepath are stored in it.
(3) "rclone copyto" is very similar to "rclone copy". The only difference is that it can upload a single file under a name other than its current name. When running "rclone copyto source:sourcepath dest:destpath", if source:sourcepath is a file, dest:destpath can be a new file name. If source:sourcepath is a directory, it behaves the same as "rclone copy".
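Item (2) means that "rclone copy ~/results MyOneDrive:backup" places the files from results directly in backup rather than in backup/results. To keep the directory name, you have to include it in the destination path. The bash sketch below does that automatically. copy_dir_keep_name and the example paths are hypothetical.

```shell
# Hypothetical helper: copy a local directory *as a directory* to a destination.
# "rclone copy SRC DST" copies the contents of SRC into DST, so we append
# SRC's own name to DST to reproduce cp -r style behaviour.
copy_dir_keep_name() {
  local src=${1%/} dst=$2 name target
  name=$(basename -- "$src")
  case $dst in
    *: | */) target="$dst$name" ;;   # remote root ("MyOneDrive:") or trailing slash
    *)       target="$dst/$name" ;;
  esac
  rclone copy "$src" "$target"
}
```

For example, "copy_dir_keep_name ~/results MyOneDrive:backup" runs "rclone copy ~/results MyOneDrive:backup/results".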
(6) Check that the files in the source and destination match, using "rclone check".
Note
When archiving your files to your cloud storage, if the connection between HPCC and your cloud storage is not stable, we do NOT recommend using "rclone move", because data may be lost during the transfer. Instead, we recommend using "rclone copy" to copy the files over and then running "rclone check" to confirm that the files are identical. After that, it is safe to delete the local copy of the files.
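The copy, check, then delete procedure can be wrapped in a small bash function so that the local copy is deleted only after both steps succeed. This sketch assumes that a non-zero exit status from "rclone copy" or "rclone check" means the archive is not yet safe. archive_to_cloud is a hypothetical name.

```shell
# Hypothetical helper: archive SRC to DST, deleting SRC only after a verified copy.
archive_to_cloud() {
  local src=$1 dst=$2
  [[ -e $src ]] || { echo "no such path: $src" >&2; return 1; }
  # Step 1: copy (unchanged files are skipped, so re-running after a failure is cheap).
  rclone copy "$src" "$dst" || { echo "copy failed; keeping $src" >&2; return 1; }
  # Step 2: verify; rclone check exits non-zero if any file differs or is missing.
  rclone check "$src" "$dst" || { echo "check failed; keeping $src" >&2; return 1; }
  # Step 3: only now remove the local data.
  rm -rf -- "$src"
}
```

"rclone check" also reports files that exist only at the destination. If the destination already holds other files, its --one-way flag restricts the check to files present in the source.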
Note
When using the "rclone mount" command to mount your cloud storage on HPCC, users should be careful about two things:
(1) The rclone mount process does NOT run as the user. Instead, it runs as the "root" of the cloud storage. Therefore, users may see an error message like "mount helper error: fusermount: failed to open mountpoint for reading: Permission denied". Users can use /tmp for the mount point, because that space is accessible to all users. Users should be very careful about opening permissions to others just to make rclone mount work.
(2) Users of "rclone mount" should unmount the storage after use with "fusermount -u <endpoint_dir>". Sometimes the endpoint is not unmounted on some nodes, because of a timeout or another reason. You may then see a message like "Transport endpoint is not connected" when accessing the endpoint directory on that node. Manually unmounting it again should resolve the issue.
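Here is a minimal bash sketch of the mount workflow in the notes above, assuming a mount point under /tmp as suggested in (1). mount_cloud and fix_stale_mount are hypothetical helper names. The --daemon flag runs "rclone mount" in the background. fix_stale_mount runs "fusermount -u" again when the mount directory can no longer be listed, as happens with "Transport endpoint is not connected".

```shell
# Hypothetical helper: create the mount point and mount REMOTE there in the background.
mount_cloud() {
  local remote=$1 mnt=$2
  mkdir -p -- "$mnt" || return
  rclone mount "$remote" "$mnt" --daemon
}

# Hypothetical helper: if the mount directory cannot be listed (e.g. a stale
# endpoint), unmount it again, as recommended in (2).
fix_stale_mount() {
  local mnt=$1
  if ! ls -- "$mnt" >/dev/null 2>&1; then
    fusermount -u "$mnt"
  fi
}
```

For example, run "mount_cloud MyOneDrive: /tmp/$USER/onedrive", and when you are finished run "fusermount -u /tmp/$USER/onedrive". Because /tmp is normally local to each node, unmount on the same node where you mounted.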
Note
When you use the "rclone config" command to configure your cloud storage on HPCC, it guides you through an interactive setup process. At the auto config step, after you choose "y", rclone starts the authentication and prints a link to the authentication page.
At this point, a Firefox browser window should open. If it does not, check whether you used the -X option to enable X11 forwarding when you connected with ssh. You can follow the instructions at Connect to HPCC System to set up the display correctly.
It can take a few minutes for the browser to open and connect, so please be patient. If the browser window opens but does not show the authentication page, manually enter the link provided by the "rclone config" command into the Firefox address bar. DO NOT open the link in your personal computer's browser. The link points to a local address on the HPCC development node, so the authentication has to use the browser on that node.