Rclone - rsync for cloud storage

Rclone can be used to copy files between a user's Microsoft OneDrive or Google Drive cloud storage and HPCC disk space. It can also mount a user's cloud storage on the HPCC file system so that the cloud storage can be used as extended disk space.

Rclone is installed system-wide on HPCC. To use it, users should first load the software module into their environment with the command:

module load Rclone 

For more details on using rclone, users can visit the Rclone website at https://rclone.org/.

To start using Rclone, users need to run the following command to configure it:

rclone config

The instructions for this command can be found at https://rclone.org/commands/rclone_config/.

Specifically, to configure Google Drive, see https://rclone.org/drive/; to configure Microsoft OneDrive, see https://rclone.org/onedrive/. Details specific to getting started with this software on HPCC can be found in the document Rclone.pdf.

After successfully configuring the software, users can use the "rclone" command to copy files to and from cloud storage or to mount the cloud storage on HPCC. Many rclone sub-commands are available for transferring and managing files on HPCC and cloud storage. To get help, use "rclone --help" as shown below:

[hpc@dev-intel16-k80 ~]$ module load Rclone
[hpc@dev-intel16-k80 ~]$ rclone --help

Rclone syncs files to and from cloud storage providers as well as
mounting them, listing them in lots of different ways.

See the home page (https://rclone.org/) for installation, usage,
documentation, changelog and configuration walkthroughs.

Usage:
  rclone [flags]
  rclone [command]

Available Commands:
  about           Get quota information from the remote.
  authorize       Remote authorization.
  cachestats      Print cache stats for a remote
  cat             Concatenates any files and sends them to stdout.
  check           Checks the files in the source and destination match.
  cleanup         Clean up the remote if possible
  config          Enter an interactive configuration session.
  copy            Copy files from source to dest, skipping already copied
  copyto          Copy files from source to dest, skipping already copied
  copyurl         Copy url content to dest.
  cryptcheck      Cryptcheck checks the integrity of a crypted remote.
  cryptdecode     Cryptdecode returns unencrypted file names.
  dbhashsum       Produces a Dropbox hash file for all the objects in the path.
  dedupe          Interactively find duplicate files and delete/rename them.
  delete          Remove the contents of path.
  deletefile      Remove a single file from remote.
  genautocomplete Output completion script for a given shell.
  gendocs         Output markdown docs for rclone to the directory supplied.
  hashsum         Produces an hashsum file for all the objects in the path.
  help            Show help for rclone commands, flags and backends.
  link            Generate public link to file/folder.
  listremotes     List all the remotes in the config file.
  ls              List the objects in the path with size and path.
  lsd             List all directories/containers/buckets in the path.
  lsf             List directories and objects in remote:path formatted for parsing
  lsjson          List directories and objects in the path in JSON format.
  lsl             List the objects in path with modification time, size and path.
  md5sum          Produces an md5sum file for all the objects in the path.
  mkdir           Make the path if it does not already exist.
  mount           Mount the remote as file system on a mountpoint.
  move            Move files from source to dest.
  moveto          Move file or directory from source to dest.
  ncdu            Explore a remote with a text based user interface.
  obscure         Obscure password for use in the rclone.conf
  purge           Remove the path and all of its contents.
  rc              Run a command against a running rclone.
  rcat            Copies standard input to file on remote.
  rcd             Run rclone listening to remote control commands only.
  rmdir           Remove the path if empty.
  rmdirs          Remove empty directories under the path.
  serve           Serve a remote over a protocol.
  settier         Changes storage class/tier of objects in remote.
  sha1sum         Produces an sha1sum file for all the objects in the path.
  size            Prints the total size and number of objects in remote:path.
  sync            Make source and dest identical, modifying destination only.
  touch           Create new file or change file modification time.
  tree            List the contents of the remote in a tree like fashion.
  version         Show the version number.

Use "rclone [command] --help" for more information about a command.
Use "rclone help flags" for to see the global flags.
Use "rclone help backends" for a list of supported services.
[hpc@dev-intel16-k80 ~]$ 

The tool "cloudSync" was developed to help users synchronize files between their cloud storage accounts. It is accessible through "powertools", which should be loaded automatically upon logging into HPCC but can be loaded manually with 'ml load powertools' if need be. Users are welcome to try it and report any problems to us via the contact form.

The following are a few examples of running rclone commands after the cloud storage has been configured. Assume that the cloud storage is configured with the name "MyOneDrive".

(1) See current remote storage

We can check the current configuration of rclone using 'rclone config'. As shown below, two remote cloud storage systems are currently configured: "MyOneDrive" and "googledoc".

[user@dev-intel18 ~]$ rclone config
Current remotes:

Name                 Type
====                 ====
MyOneDrive           onedrive
googledoc            drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

[user@dev-intel18 ~]$
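
In a script, the same information is available without the interactive menu from "rclone listremotes", which prints one configured remote per line with a trailing colon. The following is a minimal sketch of such a check; the function name remote_configured is hypothetical and is not part of rclone.

```shell
# remote_configured NAME
# Succeeds (exit status 0) if a remote called NAME appears in the
# output of "rclone listremotes"; fails otherwise.
# The function name is illustrative only; it is not an rclone command.
remote_configured() {
    local name="$1"
    # listremotes prints entries such as "MyOneDrive:", one per line,
    # so match the whole line literally (-F fixed string, -x whole line).
    rclone listremotes | grep -Fxq -- "${name}:"
}
```

For example, "remote_configured MyOneDrive && rclone lsd MyOneDrive:" lists the remote only when it has been configured.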

(2) Check the remote storage information

We can see the remote storage usage and quota using the "rclone about" command.

[user@dev-intel16-k80 ~]$ rclone about MyOneDrive:
Total:   5T
Used:    450.999M
Free:    4.998T
Trashed: 404.576k

(3) List the contents of the cloud storage

[user@dev-intel18 ~]$ rclone lsd MyOneDrive:
          -1 2018-02-02 08:57:54         0 Attachments
          -1 2019-08-27 15:43:33         1 IMPACT
          -1 2019-08-22 15:50:10        42 Matlab
          -1 2019-02-26 17:12:01        16 Microsoft Teams Chat Files
          -1 2018-08-24 08:56:32         1 Notebooks

(4) Copy files on HPCC to remote cloud:

[user@dev-intel18 ~]$ rclone copy Project MyOneDrive:Project   # copy the content of directory "Project" to remote cloud storage

[user@dev-intel18 ~]$ rclone lsd MyOneDrive:               # view the contents of cloud storage to confirm the copy
          -1 2018-02-02 08:57:54         0 Attachments
          -1 2019-08-27 15:43:33         1 IMPACT
          -1 2019-08-22 15:50:10        42 Matlab
          -1 2019-02-26 17:12:01        16 Microsoft Teams Chat Files
          -1 2018-08-24 08:56:32         1 Notebooks
          -1 2020-04-27 15:43:25         2 Project
[user@dev-intel18 ~]$ rclone lsd MyOneDrive:Project
          -1 2020-04-27 15:44:39         1 GPAW
          -1 2020-04-27 15:43:26         3 MATLAB

(5) Copy files on cloud storage to HPCC:

[user@dev-intel18 Project]$ ls                                 # current content of Project directory before copy
GPAW  MATLAB
[user@dev-intel18 Project]$ rclone copy MyOneDrive:IMPACT ./   # copy the content of IMPACT in cloud to current directory         
[user@dev-intel18 Project]$ ls                                 # confirm that the copy is done
GPAW  impact_run  MATLAB

Note

Although "rclone copy" is similar to the Unix commands rsync and cp, users should be aware of the differences and understand its behavior in detail.

(1) "rclone copy" does not transfer unchanged files; it tests for changes by size and modification time or by MD5SUM. In this sense, it is similar to the Linux command rsync.

(2) When running "rclone copy source:sourcepath dest:destpath", if source:sourcepath is a directory, dest:destpath should also be a directory. The command does not copy the directory source:sourcepath itself; instead, it copies the contents of source:sourcepath into dest:destpath. If dest:destpath does not exist, it is created and the contents of source:sourcepath are stored in it.

(3) "rclone copyto" is very similar to "rclone copy". The only difference is that it can upload a single file under a name other than its current one. When running "rclone copyto source:sourcepath dest:destpath", if source:sourcepath is a file, dest:destpath can be a new file name. If source:sourcepath is a directory, the command behaves the same as "rclone copy".
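
The difference between the two sub-commands can be sketched as a small shell helper that picks one or the other depending on whether the source is a directory or a single file. This is only an illustration under the behavior described in this note; the helper name "upload" is hypothetical and is not an rclone command.

```shell
# upload SRC DEST
# Pick the rclone sub-command that matches the kind of source:
#   directory -> "rclone copy":   the contents of SRC are placed inside DEST
#   file      -> "rclone copyto": DEST is the full name of the target file
# "upload" is a hypothetical helper name, not an rclone command.
upload() {
    local src="$1" dest="$2"
    if [ -d "$src" ]; then
        rclone copy "$src" "$dest"
    elif [ -f "$src" ]; then
        rclone copyto "$src" "$dest"
    else
        echo "upload: '$src' is neither a file nor a directory" >&2
        return 1
    fi
}
```

With this helper, "upload Project MyOneDrive:Project" stores the contents of Project in the remote directory Project, while "upload results.txt MyOneDrive:Project/results-2020.txt" stores the single file under the new name results-2020.txt. The file names here are made up for illustration.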

(6) Check that the files in the source and destination match

[user@dev-intel18 Project]$ rclone check impact_run MyOneDrive:IMPACT/impact_run   # check that both sides match
2020/04/27 16:19:01 NOTICE: One drive root 'IMPACT/impact_run': 0 differences found
2020/04/27 16:19:01 NOTICE: One drive root 'IMPACT/impact_run': 21 matching files

Note

When archiving files to your cloud storage, if the connection between HPCC and your cloud storage is not stable, we do NOT recommend using "rclone move" because data may be lost during the transfer. Instead, we recommend using "rclone copy" to copy the files over and then running "rclone check" to verify that the files are identical. After that, it is safe to delete the local copy of the files.
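
The copy, check, then delete sequence can be wrapped in a shell function so that the local files are removed only when both rclone steps succeed. The sketch below is an assumption-laden illustration, not an HPCC-provided tool: the name archive_to_cloud is hypothetical, and the "--one-way" flag of "rclone check" is used here so that extra files already present on the remote do not count as differences.

```shell
# archive_to_cloud LOCAL_DIR REMOTE_PATH
# Copy a local directory to cloud storage, verify it, and only then
# delete the local copy. The function name is hypothetical.
archive_to_cloud() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "archive_to_cloud: '$src' is not a directory" >&2
        return 1
    fi
    # Step 1: upload. Files that are already present and unchanged are
    # skipped, so the command can simply be re-run after a failure.
    rclone copy "$src" "$dest" || return 1
    # Step 2: verify that every local file exists on the remote and
    # matches. "rclone check" exits non-zero if it finds differences.
    rclone check --one-way "$src" "$dest" || return 1
    # Step 3: reached only when the copy and the check both succeeded.
    rm -rf -- "$src"
}
```

For example, "archive_to_cloud impact_run MyOneDrive:IMPACT/impact_run" leaves impact_run untouched on HPCC if either the copy or the check fails.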

Note

When using the "rclone mount" command to mount your cloud storage on HPCC, there are two things users should be careful about:

(1) When running rclone mount, the process does NOT run as the user; instead, it runs as the "root" of the cloud storage. Therefore, users may see an error message like "mount helper error: fusermount: failed to open mountpoint for reading: Permission denied". Users can place the mount point in /tmp because that space is accessible to all users. Users should be very careful about opening permissions to others for the purpose of using rclone mount.

(2) Users of "rclone mount" should unmount the storage after use with "fusermount -u <endpoint_dir>". Note that sometimes the endpoint is not unmounted from some nodes because of a timeout or some other reason; you may then see a message like "Transport endpoint is not connected" when accessing the endpoint directory on that node. Manually unmounting it again should resolve the issue.
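
The two points above can be combined into a pair of shell helpers: one that mounts a remote on a freshly created directory under /tmp, and one that unmounts it again. This is a rough sketch rather than a tested HPCC recipe. The function names and the /tmp/rclone-mount.XXXXXX naming pattern are hypothetical, and "--daemon" is the rclone mount flag that runs the mount in the background.

```shell
# mount_cloud REMOTE
# Mount REMOTE (for example "MyOneDrive:") on a new directory under /tmp
# and print the mount point. Function names here are hypothetical.
mount_cloud() {
    local remote="$1" mountpoint
    # mktemp -d creates a directory that only the current user can access,
    # so no permissions have to be opened to other users.
    mountpoint=$(mktemp -d "/tmp/rclone-mount.XXXXXX") || return 1
    # --daemon runs the mount in the background so the shell stays usable.
    if ! rclone mount "$remote" "$mountpoint" --daemon; then
        rmdir "$mountpoint"
        return 1
    fi
    echo "$mountpoint"
}

# unmount_cloud MOUNTPOINT
# Unmount the storage and remove the now-empty mount point directory.
unmount_cloud() {
    local mountpoint="$1"
    fusermount -u "$mountpoint" && rmdir "$mountpoint"
}
```

A typical session would be "mp=$(mount_cloud MyOneDrive:)", followed by work inside "$mp", followed by "unmount_cloud "$mp"". If the unmount fails with "Transport endpoint is not connected", running unmount_cloud again corresponds to the manual retry described above.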

Note

When using the "rclone config" command to configure your cloud storage on HPCC, the command guides you through an interactive setup process. At the auto config step, after you choose "y", it starts authentication. You will see something like:

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth

Log in and authorize rclone for access
Waiting for code...

At this point, a Firefox browser should open. If you do not get the browser window, check whether you used the -X option to allow X11 forwarding when you ran ssh. You may follow the instructions at Connect to HPCC System to get the display working.

It can take a few minutes for the browser to open and connect. Please be patient. If the browser window opens but does not load the authentication page, you can manually enter the link provided by the "rclone config" command into the Firefox address bar to connect to the site. DO NOT open the link in your personal computer's browser. The authentication must use the browser on the HPCC development node.
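
Because the browser can only open when X11 forwarding is active, it may help to check for a display before starting the configuration. The sketch below assumes that "ssh -X" sets the DISPLAY environment variable in the remote session, which is how OpenSSH X11 forwarding normally behaves; the function name is hypothetical.

```shell
# config_with_display_check
# Refuse to start "rclone config" when no X11 display is available,
# because the authorization step has to open a browser on the HPCC node.
# The function name is hypothetical; it is not an rclone command.
config_with_display_check() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with 'ssh -X' before running rclone config" >&2
        return 1
    fi
    rclone config
}
```

A set DISPLAY variable does not guarantee that the browser will open, but an empty one is a quick sign that the ssh session was started without X11 forwarding.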