Transfer Your Data


Some pointers about data transfer


  • For larger data transfers, interactive use of scp is discouraged. Large transfers should be performed with tools that can resume if the process or connection fails (for example rsync or Filezilla).
  • For a large number of files, it is better to tar them into a single file before transfer.
  • For transfer processes that may occur routinely, the use of scripts with command-line tools for automating the processes is a great option. WinSCP also allows scripting.
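
As a minimal sketch of the last two pointers, the script below bundles a directory into a single archive and then shows, as a comment, how that archive could be sent with rsync so an interrupted transfer can be resumed. The demo directory and file names are made up for illustration, and the key name, address and destination in the rsync comment are the same placeholders used elsewhere on this page.

```shell
#!/bin/bash
# Sketch: bundle a directory into one archive before transferring it.
set -euo pipefail

# bundle_dir SRC_DIR ARCHIVE: pack SRC_DIR into a gzip-compressed tar archive.
bundle_dir() {
    local src_dir="$1" archive="$2"
    # -C changes into the parent directory so the archive holds relative paths
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Demo on throwaway data (hypothetical names) so the sketch runs anywhere
workdir="$(mktemp -d)"
mkdir -p "$workdir/results"
for i in 1 2 3; do
    echo "sample $i" > "$workdir/results/file_$i.txt"
done

bundle_dir "$workdir/results" "$workdir/results.tar.gz"
tar -tzf "$workdir/results.tar.gz"    # list the archive contents as a check

# Resumable transfer of the archive (placeholders, not run here):
# rsync -av --partial --progress -e "ssh -i ~/.ssh/YOUR_KEYPAIR.pem" \
#     "$workdir/results.tar.gz" ubuntu@###.##.##.##:/path/of/your/nimbus/directory/
```

With --partial, rsync keeps a partially transferred file, so repeating the same command after a dropped connection continues rather than starting again.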

From Your Local Computer


Simple transfers with scp

For small file transfers from your local computer, use the secure copy protocol (scp):

  1. Open a terminal on your local computer
  2. Use the following command

    >scp -i ~/.ssh/YOUR_KEYPAIR.pem /path/to/your/file login_name@###.##.##.##:/path/of/your/nimbus/directory
    • Replace /path/to/your/file with the path of the file that you wish to transfer.
    • Replace YOUR_KEYPAIR.pem with the name of your keypair.
    • Replace login_name@###.##.##.## with your username@ip-address.

      Image               login_name
      Ubuntu              ubuntu
      Centos              centos
      Fedora              fedora
      Scientific Linux    root
      Debian              debian
    • Replace /path/of/your/nimbus/directory with the directory on your instance that you wish to transfer the file to.
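
The same command also works for directories and for copying data back to your local computer. The sketch below prints each scp command instead of running it, so it can be tried safely. The `run` helper, the `DRY_RUN` variable and the file name result.txt are illustrative assumptions, and the key, address and paths are the placeholders from the command above.

```shell
#!/bin/bash
# Sketch: scp in both directions. KEY, REMOTE and all paths are placeholders.
set -euo pipefail

KEY="$HOME/.ssh/YOUR_KEYPAIR.pem"
REMOTE="ubuntu@###.##.##.##"      # login_name@ip-address of your instance
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands (default)

# run CMD...: print the command, and execute it only when DRY_RUN=0
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Upload one file to the instance
run scp -i "$KEY" /path/to/your/file "${REMOTE}:/path/of/your/nimbus/directory"

# Upload a whole directory (-r copies recursively)
run scp -i "$KEY" -r /path/to/your/dir "${REMOTE}:/path/of/your/nimbus/directory"

# Download a file (hypothetical name) into the current local directory
run scp -i "$KEY" "${REMOTE}:/path/of/your/nimbus/directory/result.txt" .
```

Once the placeholders are replaced with real values, setting DRY_RUN=0 in the environment makes the script execute the commands.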


Transfers with Filezilla

Filezilla is a fast and reliable file transfer client with an intuitive GUI. It works on multiple platforms (Windows, macOS and Linux) and supports SFTP, which is one of the supported protocols for transferring files to and from Pawsey systems. Filezilla supports simultaneous transfer of multiple files, transfer of large files (>4GB) and resuming a transfer after a connection failure. It also makes it easy to transfer files back to your local machine.

Avoid spyware

Only download Filezilla from its own website: https://filezilla-project.org/index.php . As with any software, be careful not to fall for "click tricks" that mislead you into downloading or installing undesired software.

Basic setup

  • Step 1: In the File menu, select Site Manager. A window for the "Site Manager" settings will pop up.
  • Step 2: In the lower left corner of the window, select "New Site". Name the site as you want.
  • Step 3: In the Protocol entry, select SFTP as the protocol. In the Host entry, type your Nimbus instance floating IP (e.g. 146.118.64.243).
  • Step 4: Set the Logon Type to "Key file".
  • Step 5: In the User box, type "ubuntu" (or the login name for your image, as listed in the table above).
  • Step 6: In the Key File box, type the path and name of your Nimbus RSA key (e.g. ~/.ssh/nimbus_key.rsa).
  • Step 7: Click Connect.

Do not save your password in Filezilla

Although Filezilla can remember passwords, we do not recommend this practice. The use of ssh-keys is safer. It is also less prone to multiple failed connection attempts, which may eventually block your account at Pawsey (see Account Blocking).


IRDS (UWA) and Acacia - A New Alternative to RClone


UWA made an update in 2022 which means the old method of using RClone with IRDS no longer works. If you are having issues with connection speed, we suggest using Acacia as a workaround. In short, Acacia is an S3 object storage system that can be accessed from anywhere. For example, you might find it useful to set up Acacia on your UWA computer, where you also have IRDS access. Then, rather than transferring large files directly from IRDS to Nimbus or Setonix, you can transfer them to Acacia instead. Once the files are on Acacia, downloading them to Nimbus/Setonix is very fast. The downside of this method is that you need to leave your computer running while the transfer from IRDS to Acacia takes place.

We have created documentation on using Acacia, and YouTube training videos. As always, if you have further questions or need support, get in touch with our Helpdesk help@pawsey.org.au

From IRDS (UWA) with rclone (ARCHIVED)


In many cases, you will be able to transfer from IRDS to your Nimbus instance by using a program such as FileZilla or CyberDuck. However, this will use your local internet bandwidth, which may be an issue if you are not at a University/Institute site and you need to transfer large files. Additionally, if you are on a laptop, moving away from an internet connection may disrupt your transfer. An alternative is setting up the transfer directly on Nimbus via 'screen' (https://linuxize.com/post/how-to-use-linux-screen/). Then, you can close your laptop and not worry about your transfer or your bandwidth.

However, it is important to note that IRDS will not allow transfer of files with unusual extensions, such as .bam, .bai, .md5, and others. One way around this is to zip/tar your files before transfer. It is also possible to change the extension to an acceptable format for transfer (e.g. change .bam to .bam.txt).

Rclone is a command line transfer client with webdav capabilities. By setting up rclone you can improve the ease and speed of your transfer from IRDS. 

Step-by-step guide (once logged in to your Nimbus instance)

Part 1. Download rclone

curl https://rclone.org/install.sh | sudo bash


Part 2. Add IRDS to rclone

You will enter the rclone settings by typing `rclone config`. The program then guides you through the configuration process as shown below. Enter the answers as per this guide to add your IRDS setup to rclone. This looks complicated but you will see it's quite simple as you move through it. More info can be found here: https://rclone.org/webdav/

Rclone configuration
rclone config

**********************************
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> irds [or whatever you want]
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Webdav
   \ "webdav"
[snip]
Storage> webdav [or 34 if there are number options]
URL of http host to connect to
Choose a number from below, or type in your own value
1 / Connect to example.com
   \ "https://example.com"
url> https://unidrive.uwa.edu.au/staff/irds/PERKINS-AA-001/ [swap out to your IRDS name]
 
Name of the Webdav site/service/software you are using
Choose a number from below, or type in your own value
1 / Nextcloud
   \ "nextcloud"
2 / Owncloud
   \ "owncloud"
3 / Sharepoint
   \ "sharepoint"
4 / Other site/service or software
   \ "other"
vendor> 1
User name
user> phemeID#
Password.
y) Yes type in my own password
g) Generate random password
n) No leave this optional password blank
y/g/n> y
Enter the password:
password: [current pheme password, doesn’t matter about special characters]
Confirm the password:
password:
Bearer token instead of user/pass (eg a Macaroon)
bearer_token>
[Just leave this blank by hitting enter, and then do the same when asked about advanced options]
--------------------
[remote]
type = webdav
url = https://example.com/remote.php/webdav/
vendor = nextcloud
user = user
pass = *** ENCRYPTED ***
bearer_token =
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y


Once the config is done, you can test that your setup has worked by typing 

rclone lsd irds:

This should list the top-level directories in your IRDS directory. To list the files as well, use `rclone ls irds:`.


Part 3. Copy files to your instance

The copy process is quite straightforward. In the following example, we change into the Nimbus directory where we want to place the data. Then we give the rclone command, and tell it to transfer to the current working directory by using the dot.

cd /dir/for/data/
rclone copy irds:path/to/my/irds/file .


More information on the full usage of rclone can be found here: https://rclone.org/commands/ 

Copying data back to IRDS

If you would like to use rclone to transfer data back to IRDS, you might find that large files cannot be transferred. Rclone provides a function called chunker, which transparently breaks large files into chunks for transfer, then reassembles them at the other end. The overall documentation can be found here: https://rclone.org/chunker/

No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> overlay
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Transparently chunk/split large files
   \ "chunker"
[snip]
Storage> chunker
Remote to chunk/unchunk.
Normally should contain a ':' and a path, eg "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a string value. Press Enter for the default ("").
remote> remote:path [e.g. irds] 
Files larger than chunk size will be split in chunks.
Enter a size with suffix k,M,G,T. Press Enter for the default ("2G").
chunk_size> 100M
Choose how chunker handles hash sums. All modes but "none" require metadata.
Enter a string value. Press Enter for the default ("md5").
Choose a number from below, or type in your own value
 1 / Pass any hash supported by wrapped remote for non-chunked files, return nothing otherwise
   \ "none"
 2 / MD5 for composite files
   \ "md5"
 3 / SHA1 for composite files
   \ "sha1"
 4 / MD5 for all files
   \ "md5all"
 5 / SHA1 for all files
   \ "sha1all"
 6 / Copying a file to chunker will request MD5 from the source falling back to SHA1 if unsupported
   \ "md5quick"
 7 / Similar to "md5quick" but prefers SHA1 over MD5
   \ "sha1quick"
hash_type> md5
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[overlay]
type = chunker
remote = remote:bucket
chunk_size = 100M
hash_type = md5
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y


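Then you should be able to transfer data back to IRDS in small chunks using something like the following command. The remote name must match the name you gave the chunker remote during configuration (overlay in the example above):

rclone copy sample.fastq.gz overlay: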


From IRDS (UWA) with wget (archived)


The following data transfer steps can be used for automated batch transfers of your data, running up to as many parallel downloads as there are CPUs on your VM. IRDS may close the connection after several hours, so it is important to check your log files to see if your transfer completed successfully. If you use the -c flag as specified below, you will be able to resume a failed transfer by simply repeating your transfer script.

Step-by-step guide (once logged in to your Nimbus instance)

Part 1. Set up wget

You will need to make a ~/.wgetrc file for your user name and password.

##Change into your home directory
cd

##make ~/.wgetrc file
touch .wgetrc 

##change the permissions on the file so only you can view and edit
chmod 600 .wgetrc 

##open .wgetrc for editing
nano .wgetrc 

#Add the following two lines with your credentials, then save the file
--http-user=UWA staff number
--http-passwd=Pheme password

A note on passwords

This does not apply to passwords in your ~/.wgetrc file:

UWA enforces the use of 'special characters' in passwords for security reasons. Special characters generally refers to characters such as [ \ ^ $ . | ? * + ( ) and !. These characters can cause issues on the command line, where the shell may interpret them instead of passing them through, so your password is read incorrectly and rejected. Say your password is Ph3m3Password! To have it read correctly, you would need to 'escape' the special character (in this case the exclamation mark) by placing a backslash before it, e.g. Ph3m3Password\!

If your password has multiple special characters, be sure to place the backslash before each special character.
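
The short sketch below illustrates the escaping described above, using the made-up example password from the text. It also shows single quotes as an alternative, since the shell takes every character inside single quotes literally.

```shell
#!/bin/bash
# Sketch: two equivalent ways to write a password containing "!" in the shell.
# The password is the made-up example from the text, not a real credential.

quoted='Ph3m3Password!'     # single quotes: every character is taken literally
escaped=Ph3m3Password\!     # backslash: escapes only the character that follows

printf 'quoted : %s\n' "$quoted"
printf 'escaped: %s\n' "$escaped"

if [ "$quoted" = "$escaped" ]; then
    echo "both forms give the same string"
fi
```

Single quotes are often the simpler choice when a password contains several special characters, as long as it does not itself contain a single quote.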


Part 2. Set up your transfer scripts

  1. In the directory on Nimbus where you would like to download the files, make a .txt file with the full paths of the files you would like to transfer (e.g. https://unidrive.uwa.edu.au/staff//irds/PATH/TO/YOUR/FILES), with one file path per line. Alternatively, compress the files you would like to transfer into one zip/tar.gz file and use that path instead. Note: the double forward slashes between staff and irds (i.e. staff//irds) are deliberate.

  2. Create a batch script to perform the transfer. A semi-automated example is provided below.
  3. Before running your script, make sure to open a screen session by typing 'screen' into your VM. To detach from a screen session, use the keys 'ctrl + a' and then 'd'. To reconnect to a screen session, use 'screen -r'.
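
The file list from step 1 can be written by hand, or generated with a short loop as in the sketch below. The three file names are hypothetical, and the base URL is the placeholder path from step 1, so both need to be replaced with your own values.

```shell
#!/bin/bash
# Sketch: write one URL per line into the list file that wget -i will read.
set -euo pipefail

# Placeholder path from step 1 (the double slash is deliberate)
base_url="https://unidrive.uwa.edu.au/staff//irds/PATH/TO/YOUR/FILES"
list_file="test_input.txt"

# Hypothetical file names; replace with the files you want to fetch
files=(sample_A.tar.gz sample_B.tar.gz sample_C.tar.gz)

: > "$list_file"                  # start from an empty list
for f in "${files[@]}"; do
    printf '%s/%s\n' "$base_url" "$f" >> "$list_file"
done

wc -l < "$list_file"              # number of URLs written
```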


Example batch script:

#!/bin/bash

#editable variables. Change these to suit your needs and Pawsey account
input_file_list="test_input.txt"
number_of_CPUs="8"

#start the job script with a shebang line (">" overwrites any old temp file)
echo '#!/bin/bash' > irds_trf.sh.tmp

#split the input list into ${number_of_CPUs} parts, without breaking up any text lines.
#Gives prefix of "input-" to the output split files
split -n "l/${number_of_CPUs}" "${input_file_list}" input-

#add one background wget command per split file to the job script
for file in input-*
do
echo "wget -r -np -N -c -i $file &" >> irds_trf.sh.tmp
done

#remove the "&" character from the very last line of the job script,
#so that the last wget runs in the foreground
sed '$s/&$//' irds_trf.sh.tmp > irds_trf.sh

#remove temp file
rm irds_trf.sh.tmp

#start job script
bash irds_trf.sh 2>&1


The -c flag allows failed transfers to be resumed easily. The -i flag reads the list of files you made in Part 2, Step 1. Your user name and password should be read automatically from the .wgetrc file you made earlier.

Note that IRDS tends to disconnect users after several hours, so expect long transfers to be interrupted. You will need to check your log files when your job completes to know if the transfer failed at any point. If it did fail, simply restart the script and it will continue where it left off. Make sure to delete the irds_trf.sh file before restarting your job.


The log files tend to be very long, so you are best to use tail rather than cat or zless.
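
As a minimal sketch of this, the example below writes a stand-in log file and then inspects only its end with tail. The log file name irds_trf.log and the demo lines are assumptions made for illustration. In practice the log would come from redirecting the output of the job script, for example `bash irds_trf.sh > irds_trf.log 2>&1`.

```shell
#!/bin/bash
# Sketch: look at the end of a long log file instead of printing all of it.
set -euo pipefail

log_file="irds_trf.log"           # hypothetical log file name

# Stand-in for real transfer output: 1000 demo lines
for i in $(seq 1 1000); do
    echo "demo log line $i"
done > "$log_file"

tail -n 5 "$log_file"             # show only the last five lines
wc -l < "$log_file"               # total number of lines, for context
```

While a transfer is still running, `tail -f irds_trf.log` follows the log as new lines are written.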


Between Nimbus Projects


If you are a member of multiple Nimbus projects, you may want to share data between those projects. For one-off transfers of data, the simplest option is to transfer a storage volume from one project to the other. This option will move (not copy) an existing storage volume from one project to another, using a shared Transfer ID and Authorization Key. If you wish to use this option, please note the following:

  • The steps given below are done through the Nimbus dashboard. If you prefer to perform the transfer from the command line using the openstack client, instructions can be found here: https://docs.openstack.org/cinder/latest/cli/cli-manage-volumes.html#transfer-a-volume
  • You do not need to be a member of both the source and destination project in order to transfer a storage volume between them. Once you have initiated the transfer from the source project, you can give the Transfer ID and Authorization Key to a member of the destination project to complete the transfer.

Source Project - Initiate the Transfer

  1. Log in to the Nimbus dashboard (of your source project), and go to the "Volumes" section.
  2. If you wish to make a copy of the volume in the source project, do so before you begin the transfer.
    1. You will need to create a snapshot of the volume first, then create a separate volume from that snapshot (be sure to delete the snapshot once you have created the volume copy from it).
    2. Make sure that the status of the volume you wish to transfer is "Available" - i.e. it is not attached to another instance or in use by someone.
    3. Also, check the "Snapshots" page of the dashboard to confirm that there are no snapshots associated with the volume.
  3. Click on the drop-down menu to the right hand side of the volume you wish to transfer, and select "Create Transfer".
    1. This will pop up a prompt asking for a "Transfer Name". Give it a simple name (preferably without any spaces in the name), and click on "Create Volume Transfer".
    2. Another window will pop up, giving you a "Transfer ID" and an "Authorization Key". Make a copy of both of these values somewhere (or click on the "Download transfer credentials" button below it to save those details to a text file).
  4. Once you have done that, the status of the volume will change to "awaiting-transfer" in the dashboard. At this point, if you decide you don't want to proceed with the transfer, you can click on "Cancel Transfer" to the right of the volume to cancel the process.

Destination Project - Complete the Transfer

  1. When you are ready to proceed, click on the down-arrow next to the project name in the top-left corner of the dashboard, to give you a list of all of the Nimbus projects you are a member of. Select the project from the drop-down list that you wish to change to.
  2. From the "Volumes" page in the destination project, click on "Accept Transfer" on the right hand side of the dashboard. This will pop up an "Accept Volume Transfer" window.
  3. Enter the "Transfer ID" and "Authorization Key" that you copied earlier, then click on "Accept Volume Transfer". The volume transfer should be almost instantaneous, as it is essentially a change of ownership of the volume, rather than a copy of the volume from one project to another.
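
For those who prefer the command line, the dashboard steps above roughly correspond to the openstack client commands sketched below. This is an outline under assumptions, not a tested Nimbus procedure: the volume name, transfer name, transfer ID and authorization key are placeholders, the `run` helper and `DRY_RUN` variable are illustrative, and the exact options may differ between client versions, so check the linked OpenStack documentation. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch: volume transfer with the openstack client (all values are placeholders).
set -euo pipefail

VOLUME="my_volume"                # volume to transfer (status "Available")
TRANSFER_NAME="my_transfer"       # simple name without spaces
TRANSFER_ID="TRANSFER_ID"         # reported when the transfer is created
AUTH_KEY="AUTHORIZATION_KEY"      # reported when the transfer is created
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands (default)

# run CMD...: print the command, and execute it only when DRY_RUN=0
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Source project: create the transfer and note the ID and authorization key
run openstack volume transfer request create --name "$TRANSFER_NAME" "$VOLUME"

# Either project: list pending transfers
run openstack volume transfer request list

# Destination project: accept the transfer with the ID and key
run openstack volume transfer request accept --auth-key "$AUTH_KEY" "$TRANSFER_ID"
```

The create and accept commands need to be run with the credentials of the source and destination project respectively.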