Rclone - rsync for cloud storage

Rclone lets users copy files between HPCC disk space and their Microsoft OneDrive or Google Drive cloud storage. It can also mount cloud storage on HPCC so that the cloud storage can be used as extended disk space.

Rclone is installed system-wide on HPCC. To use it, users should first load the software module into their environment with the command "module load Rclone". For more details on using rclone, users can visit the Rclone web site at https://rclone.org/.

Before first use, users should run the command "rclone config" to configure a remote. Instructions for this command are at https://rclone.org/commands/rclone_config/. To configure Google Drive, see https://rclone.org/drive/; to configure Microsoft OneDrive, see https://rclone.org/onedrive/. The specific steps for getting started with this software on HPCC are in the document Rclone.pdf.
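Once "rclone config" has been completed, the result can be checked without re-entering the interactive menu. The following is a minimal sketch, assuming a remote named "MyOneDrive" (the example name used on this page); the helper name has_remote is hypothetical. It relies on "rclone listremotes", which prints one configured remote per line followed by a colon.

```shell
# has_remote NAME: succeeds if "NAME:" is listed by `rclone listremotes`.
# (has_remote is a hypothetical helper, not part of rclone.)
has_remote() {
    # -F fixed string, -x whole-line match, -q quiet
    rclone listremotes | grep -Fxq "$1:"
}

if command -v rclone >/dev/null 2>&1; then
    if has_remote "MyOneDrive"; then
        echo "MyOneDrive is configured"
    else
        echo "MyOneDrive not found; run 'rclone config' to add it"
    fi
else
    echo "rclone is not on PATH; run 'module load Rclone' first"
fi
```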

Once configuration succeeds, users can use the "rclone" command to copy files to and from cloud storage or to mount it on HPCC. Many rclone commands are available for transferring and managing files on HPCC and cloud storage. To get help, use "rclone --help" as shown in the following:

[hpc@dev-intel16-k80 ~]$ module load Rclone
[hpc@dev-intel16-k80 ~]$ rclone --help

Rclone syncs files to and from cloud storage providers as well as
mounting them, listing them in lots of different ways.

See the home page (https://rclone.org/) for installation, usage,
documentation, changelog and configuration walkthroughs.

Usage:
  rclone [flags]
  rclone [command]

Available Commands:
  about           Get quota information from the remote.
  authorize       Remote authorization.
  cachestats      Print cache stats for a remote
  cat             Concatenates any files and sends them to stdout.
  check           Checks the files in the source and destination match.
  cleanup         Clean up the remote if possible
  config          Enter an interactive configuration session.
  copy            Copy files from source to dest, skipping already copied
  copyto          Copy files from source to dest, skipping already copied
  copyurl         Copy url content to dest.
  cryptcheck      Cryptcheck checks the integrity of a crypted remote.
  cryptdecode     Cryptdecode returns unencrypted file names.
  dbhashsum       Produces a Dropbox hash file for all the objects in the path.
  dedupe          Interactively find duplicate files and delete/rename them.
  delete          Remove the contents of path.
  deletefile      Remove a single file from remote.
  genautocomplete Output completion script for a given shell.
  gendocs         Output markdown docs for rclone to the directory supplied.
  hashsum         Produces an hashsum file for all the objects in the path.
  help            Show help for rclone commands, flags and backends.
  link            Generate public link to file/folder.
  listremotes     List all the remotes in the config file.
  ls              List the objects in the path with size and path.
  lsd             List all directories/containers/buckets in the path.
  lsf             List directories and objects in remote:path formatted for parsing
  lsjson          List directories and objects in the path in JSON format.
  lsl             List the objects in path with modification time, size and path.
  md5sum          Produces an md5sum file for all the objects in the path.
  mkdir           Make the path if it does not already exist.
  mount           Mount the remote as file system on a mountpoint.
  move            Move files from source to dest.
  moveto          Move file or directory from source to dest.
  ncdu            Explore a remote with a text based user interface.
  obscure         Obscure password for use in the rclone.conf
  purge           Remove the path and all of its contents.
  rc              Run a command against a running rclone.
  rcat            Copies standard input to file on remote.
  rcd             Run rclone listening to remote control commands only.
  rmdir           Remove the path if empty.
  rmdirs          Remove empty directories under the path.
  serve           Serve a remote over a protocol.
  settier         Changes storage class/tier of objects in remote.
  sha1sum         Produces an sha1sum file for all the objects in the path.
  size            Prints the total size and number of objects in remote:path.
  sync            Make source and dest identical, modifying destination only.
  touch           Create new file or change file modification time.
  tree            List the contents of the remote in a tree like fashion.
  version         Show the version number.

Use "rclone [command] --help" for more information about a command.
Use "rclone help flags" for to see the global flags.
Use "rclone help backends" for a list of supported services.
[hpc@dev-intel16-k80 ~]$ 

A tool named "cloudSync" has been developed to let users synchronize files between their cloud storage accounts. It is available through "powertools", so users need to have the "powertools" module loaded to use it. Users are welcome to try it and report any problems to us via the contact form.

The following are a few examples of running rclone commands after the cloud storage has been configured. Assume that the cloud storage is configured under the name "MyOneDrive".

(1) See the currently configured remotes: As shown, two remote cloud storages, "MyOneDrive" and "googledoc", are currently configured.

[user@dev-intel18 ~]$ rclone config
Current remotes:

Name                 Type
====                 ====
MyOneDrive           onedrive
googledoc            drive

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q

[user@dev-intel18 ~]$

(2) Check the remote storage information: The "rclone about" command reports the remote storage usage and quota.

[user@dev-intel16-k80 ~]$ rclone about MyOneDrive:
Total:   5T
Used:    450.999M
Free:    4.998T
Trashed: 404.576k

(3) List the contents of the cloud storage

[user@dev-intel18 ~]$ rclone lsd MyOneDrive:
          -1 2018-02-02 08:57:54         0 Attachments
          -1 2019-08-27 15:43:33         1 IMPACT
          -1 2019-08-22 15:50:10        42 Matlab
          -1 2019-02-26 17:12:01        16 Microsoft Teams Chat Files
          -1 2018-08-24 08:56:32         1 Notebooks

(4) Copy files from HPCC to the cloud storage:

[user@dev-intel18 ~]$ rclone copy Project MyOneDrive:Project   # copy the content of directory "Project" to remote cloud storage

[user@dev-intel18 ~]$ rclone lsd MyOneDrive:               # view the contents of cloud storage to confirm the copy
          -1 2018-02-02 08:57:54         0 Attachments
          -1 2019-08-27 15:43:33         1 IMPACT
          -1 2019-08-22 15:50:10        42 Matlab
          -1 2019-02-26 17:12:01        16 Microsoft Teams Chat Files
          -1 2018-08-24 08:56:32         1 Notebooks
          -1 2020-04-27 15:43:25         2 Project
[user@dev-intel18 ~]$ rclone lsd MyOneDrive:Project
          -1 2020-04-27 15:44:39         1 GPAW
          -1 2020-04-27 15:43:26         3 MATLAB

(5) Copy files from the cloud storage to HPCC:

[user@dev-intel18 Project]$ ls                                 # current content of Project directory before copy
GPAW  MATLAB
[user@dev-intel18 Project]$ rclone copy MyOneDrive:IMPACT ./   # copy the content of IMPACT in cloud to current directory         
[user@dev-intel18 Project]$ ls                                 # confirm that the copy is done
GPAW  impact_run  MATLAB

Note

Although "rclone copy" is similar to the Unix commands rsync and cp, users should be aware of the differences and understand the details of its behavior.

(1) "rclone copy" does not transfer unchanged files; it tests for changes by size and modification time or by MD5SUM. In this sense, it is similar to the Linux command rsync.

(2) When running "rclone copy source:sourcepath dest:destpath", if source:sourcepath is a directory, dest:destpath should also be a directory. The command does not copy the directory source:sourcepath itself; instead, it copies the contents of source:sourcepath into dest:destpath. If dest:destpath does not exist, it is created and the contents of source:sourcepath are stored in it.

(3) "rclone copyto" is very similar to "rclone copy". The only difference is that it can upload a single file under a name other than its current one. When running "rclone copyto source:sourcepath dest:destpath", if source:sourcepath is a file, dest:destpath can be a new file name. If source:sourcepath is a directory, the command behaves the same as "rclone copy".
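The behavior described in these notes can be previewed safely with the rclone global flag "--dry-run", which reports what would be transferred without changing anything. The following is a minimal sketch; "MyOneDrive" is the example remote name from this page, while the local paths "Project" and "Project/notes.txt" and the helper names preview_copy and preview_copyto are hypothetical.

```shell
# preview_copy SRC DEST: show what `rclone copy` would transfer.
# (hypothetical helper; --dry-run makes no changes on either side)
preview_copy() {
    rclone copy --dry-run "$1" "$2"
}

# preview_copyto SRC DEST: same for `rclone copyto`, which can store a
# single file under a new name at the destination.
preview_copyto() {
    rclone copyto --dry-run "$1" "$2"
}

if command -v rclone >/dev/null 2>&1 && [ -d Project ]; then
    # The contents of ./Project would land directly in MyOneDrive:Project,
    # not in MyOneDrive:Project/Project.
    preview_copy Project MyOneDrive:Project
    if [ -f Project/notes.txt ]; then
        # The file would be stored under the new name notes-backup.txt.
        preview_copyto Project/notes.txt MyOneDrive:Project/notes-backup.txt
    fi
fi
```

Removing "--dry-run" from either helper performs the real transfer.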

(6) Check that the files in the source and destination match:

[user@dev-intel18 Project]$ rclone check impact_run MyOneDrive:IMPACT/impact_run   # check if it is matched both sides
2020/04/27 16:19:01 NOTICE: One drive root 'IMPACT/impact_run': 0 differences found
2020/04/27 16:19:01 NOTICE: One drive root 'IMPACT/impact_run': 21 matching files

Note

When archiving files to cloud storage over an unstable connection between HPCC and the cloud storage, we do NOT recommend "rclone move" because data may be lost during the transfer. Instead, we recommend using "rclone copy" to copy the files over and then running "rclone check" to confirm that the files are identical. After that, it is safe to delete the local copy of the files.
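The copy, check, then delete sequence recommended here could be wrapped in a small shell function so that the local files are removed only when both rclone steps succeed. This is a sketch under stated assumptions, not an HPCC-provided tool: the function name archive_dir is hypothetical, and it relies on "rclone copy" and "rclone check" returning a non-zero exit status on failure or when differences are found.

```shell
# archive_dir LOCAL_DIR REMOTE_PATH   (hypothetical helper)
# Copy LOCAL_DIR to REMOTE_PATH, verify it, and only then delete LOCAL_DIR.
archive_dir() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "archive_dir: $src is not a directory" >&2
        return 1
    fi
    # Step 1: copy; files already identical on the remote are skipped.
    if ! rclone copy "$src" "$dest"; then
        echo "archive_dir: copy failed; local files kept" >&2
        return 1
    fi
    # Step 2: verify that source and destination match.
    if ! rclone check "$src" "$dest"; then
        echo "archive_dir: check failed; local files kept" >&2
        return 1
    fi
    # Step 3: both steps succeeded, so remove the local copy.
    rm -rf -- "$src"
}

# Example call, using the paths from the "rclone check" example on this page:
#   archive_dir impact_run MyOneDrive:IMPACT/impact_run
```

If the connection drops during either step, the function stops before the delete, and it can simply be run again.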

Note

When using the "rclone mount" command to mount cloud storage on HPCC, users should be careful about two things:

(1) The rclone mount process does NOT run as the user; instead, it runs as a "root" of the cloud storage. Therefore, users may see an error message like "mount helper error: fusermount: failed to open mountpoint for reading: Permission denied". Users can place the mount point in /tmp because that space is accessible to all users. Users should be very cautious about opening directory permissions to others for the purpose of using rclone mount.

(2) Users of "rclone mount" should unmount the storage after use with "fusermount -u <endpoint_dir>". Sometimes the endpoint is not unmounted from some nodes because of a timeout or some other reason, and accessing the endpoint directory on that node gives a message like "Transport endpoint is not connected". Manually unmounting it again should resolve the issue.
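The two points above can be put together as a mount and unmount pair. The following is a minimal sketch, assuming the example remote name "MyOneDrive"; the mount point name under /tmp and the helper names mount_remote and unmount_remote are hypothetical. It uses the "--daemon" flag of "rclone mount" to run the mount in the background.

```shell
# Hypothetical per-user mount point under /tmp.
mount_point="/tmp/$(id -un)_MyOneDrive"

# mount_remote: create the mount point and mount MyOneDrive: on it.
mount_remote() {
    mkdir -p "$mount_point" || return 1
    # --daemon runs the mount in the background so the shell stays usable.
    rclone mount MyOneDrive: "$mount_point" --daemon
}

# unmount_remote: release the mount; running it again is also the suggested
# fix when the directory reports "Transport endpoint is not connected".
unmount_remote() {
    fusermount -u "$mount_point"
}

# Typical session:
#   mount_remote
#   ls "$mount_point"
#   unmount_remote
```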

Note

When using the "rclone config" command to configure cloud storage on HPCC, the command guides you through an interactive setup process. At the auto config step, after you choose "y", it starts authentication. You will see something like:

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth

Log in and authorize rclone for access
Waiting for code...

At this point, a Firefox browser should open. If you do not get a browser window, check whether you used the -X option to allow X11 forwarding when you ran ssh. You may follow the instructions at Connect to HPCC System to get the display right.

It may take a few minutes for the browser to open and connect, so please be patient. If the browser window opens but does not load the authentication page, you can manually enter the link provided by the "rclone config" command into the Firefox address bar to connect to the site. DO NOT open the link in your personal computer's browser. The authentication has to use the browser on the HPCC development node.
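Before starting "rclone config", it may help to confirm that the shell on the development node has an X11 display at all. The following is a minimal sketch; the helper name display_ready is hypothetical, and a set DISPLAY variable only suggests that X11 forwarding is active, it does not guarantee that the browser will open.

```shell
# display_ready: succeeds if the DISPLAY variable is set and non-empty.
# (hypothetical helper; ssh -X normally sets DISPLAY on the remote side)
display_ready() {
    [ -n "${DISPLAY:-}" ]
}

if display_ready; then
    echo "DISPLAY=$DISPLAY; a browser window can be opened from 'rclone config'"
else
    echo "DISPLAY is not set; reconnect with 'ssh -X' before running 'rclone config'"
fi
```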