...

Note: Use ssh-keys with command-line clients

Pawsey strongly recommends the use of ssh-keys instead of the conventional and less secure method of typing your username and password. For a description of how to set up ssh-keys, see: Logging in with Use of SSH Keys for Authentication.




SCP (Secure Copy Protocol)

The Secure Copy Protocol, or SCP, is a network file transfer protocol used to copy files to and from servers, and it fully supports encryption and authentication. SCP uses Secure Shell (SSH) mechanisms for data transfer and authentication to ensure the confidentiality of the data in transit.



SCP is useful for copying a few small files, but not for many files or very large files. It is not recommended for transferring large amounts of data, because it cannot resume a transfer if the operation or connection is interrupted for any reason.


Syntax

$ scp [options] SourceFileOrDir USER@HOST:DestFileOrDir

$ scp [options] USER@HOST:SourceFileOrDir DestFileOrDir

Here, "SourceFileOrDir" is the path and name of the file or directory to transfer, and "DestFileOrDir" is the path (and, optionally, a new name) of the destination file or directory. "USER" is the username used to connect, and "HOST" is the hostname or IP address of the remote computer. Note the colon (:) that separates HOST from the path on the remote computer.

Examples

To transfer the file "/myDir/initialConditions.tar.gz" from a local computer into your personal directory in "/scratch", where "[username]" and "[project]" are your username and project respectively:

Terminal 1. Example of transferring a file using scp
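The transfer in Terminal 1 can be sketched as the following bash script. It is a minimal sketch, not the original example: the username and project reuse the example values "mickey" and "pawsey9999" from this page, and the host name data-mover.pawsey.org.au is an assumption, so check it against Pawsey's current documentation. The script only prints the command unless it is run with RUN=1.

```shell
#!/usr/bin/env bash
# Sketch: copy one local file into your personal /scratch directory with scp.
set -euo pipefail

username="mickey"                 # hypothetical: your Pawsey username
project="pawsey9999"              # hypothetical: your project code
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname

# The colon separates the remote HOST from the destination path on it.
cmd=(scp /myDir/initialConditions.tar.gz
     "${username}@${host}:/scratch/${project}/${username}/")

echo "${cmd[*]}"                  # show the command that would run
if [[ "${RUN:-0}" == 1 ]]; then   # execute only when RUN=1 is set
  "${cmd[@]}"
fi
```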


To transfer a whole directory tree recursively, use the "-r" option. For example, to back up the directory "/software/pawsey9999/mickey/myScripts", where a user keeps their useful scripts, into the local directory "./PawseyStuff" on the user's own computer:

Terminal 2. Example of transferring a whole directory recursively
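A possible version of Terminal 2, run on your own computer, is sketched below. The user and project come from the example path on this page; the host name is an assumption. Because scp places the copied directory inside "./PawseyStuff" only when that directory already exists, the sketch creates it first (only when RUN=1, so a dry run has no side effects).

```shell
#!/usr/bin/env bash
# Sketch: recursively copy a remote directory tree to the local machine.
set -euo pipefail

username="mickey"                 # hypothetical username (from the example path)
project="pawsey9999"              # hypothetical project code (from the example path)
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname

# -r copies the directory and everything below it.
cmd=(scp -r "${username}@${host}:/software/${project}/${username}/myScripts"
     ./PawseyStuff)

echo "${cmd[*]}"
if [[ "${RUN:-0}" == 1 ]]; then
  mkdir -p ./PawseyStuff          # existing target: myScripts is created inside it
  "${cmd[@]}"
fi
```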


External references

Documentation about the general use of SCP can be found elsewhere.



Rsync

Rsync is a utility for efficiently transferring and synchronising files between computer systems. By default, it compares the size and modification time of files to decide which ones need to be transferred.


Rsync is more robust than SCP because it can resume transfers after a failure, and it can also be used for synchronising and backing up data.
Pawsey recommends using it within scripts that can be reused for transferring data.


Syntax

$ rsync [options] SourceFileOrDir USER@HOST:DestFileOrDir

$ rsync [options] USER@HOST:SourceFileOrDir DestFileOrDir

Here, "SourceFileOrDir" is the path and name of the file or directory to transfer, and "DestFileOrDir" is the path (and, optionally, a new name) of the destination file or directory. "USER" is the username used to connect, and "HOST" is the hostname or IP address of the remote computer. Note the colon (:) that separates HOST from the path on the remote computer.

Rsync has many options, which are documented elsewhere. Pawsey recommends using the following options (to be included in the syntax above):

-vhsrl --chmod=Dg+s -e ssh

The purpose of each recommended option is as follows:

Option         Purpose
-e ssh         Run Rsync over SSH.
--chmod=Dg+s   Set the setgid bit on all directories (the "D" prefix restricts the change to directories), so that they keep the default group setting.
-v             (verbose) Display messages about the progress of the transfer.
-h             Use human-readable numbers for the sizes of the transferred data displayed by the "verbose" option.
-s             Prevent the remote shell from interpreting file names that contain spaces or special characters, so that these characters are treated as part of the name. (Pawsey does not recommend the use of spaces or special characters in filenames.)
-r             Copy directories recursively, so that if a directory is chosen for transfer, all of its contents are transferred.
-l             When symlinks are encountered, recreate the symlink on the destination.

Table 1. Pawsey's recommended options.

Do not use the -a option for transferring files into our systems

Contrary to common recommendations available on the internet, Pawsey does not recommend the "-a" option, especially for transferring files into our systems. This is because "-a" implies, among others, "-p" and "-g", which preserve the source permissions and group and can therefore override the default setgid settings of the "group" property (see "problems with ownership" below).


Use preservation of times with care

Rsync has an option, -t, that preserves the modification times of files at the destination. This can be useful because Rsync then skips files that already exist at the destination with the same size and modification time. However, -t should not be used when transferring files to the /scratch filesystem, where a 30-day purge policy is in place. If files on your own system have not been accessed for more than 30 days and you transfer them into /scratch with the -t option, the purge policy will identify them as old, and the recently copied files can be deleted almost immediately, resulting in data loss. Note that -a also implies -t.

Examples

To transfer the local directory "~/initialConditions" on your host computer, with all its contents, into your personal directory on /scratch, where "[username]" and "[project]" are your username and project respectively:

Terminal 3. Example transferring a whole directory
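Terminal 3 can be sketched as follows, using exactly the options from Table 1 and neither -a nor -t. The username, project and host name are illustrative assumptions. Because the source has no trailing slash, rsync creates "initialConditions" inside the destination directory rather than copying only its contents.

```shell
#!/usr/bin/env bash
# Sketch: copy ~/initialConditions into /scratch with Pawsey's recommended rsync options.
set -euo pipefail

username="mickey"                 # hypothetical username
project="pawsey9999"              # hypothetical project code
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname

# Options from Table 1; -a and -t are deliberately left out.
opts=(-vhsrl --chmod=Dg+s -e ssh)

# No trailing slash on the source: the directory itself is created at the destination.
cmd=(rsync "${opts[@]}" "${HOME}/initialConditions"
     "${username}@${host}:/scratch/${project}/${username}/")

echo "${cmd[*]}"
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```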


External references

Documentation about the general use of Rsync can be found elsewhere.



SFTP (Secure File Transfer Protocol)

SFTP is an interactive transfer client that uses SSH to create a secure connection to the server. Its functionality is very similar to FTP, but not all FTP options are available. SFTP should not be confused with FTPS (FTP over SSL).

Open and close a connection to Pawsey

The following sftp command can be used to establish a connection from your local machine to the Pawsey data-mover nodes:

Terminal 4. SFTP connection to the data-mover nodes
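A minimal sketch of Terminal 4 follows. The username, host name and local directory are illustrative assumptions. Starting sftp from the local directory that holds your data means the local side of the session is already where you want it.

```shell
#!/usr/bin/env bash
# Sketch: open an interactive SFTP session from the local directory holding your data.
set -euo pipefail

username="mickey"                       # hypothetical username
host="data-mover.pawsey.org.au"         # assumption: Pawsey data-mover hostname
local_dir="${HOME}/initialConditions"   # hypothetical local working directory

cmd=(sftp "${username}@${host}")

echo "cd ${local_dir} && ${cmd[*]}"
if [[ "${RUN:-0}" == 1 ]]; then
  cd "$local_dir"                       # lpwd will already be the transfer directory
  "${cmd[@]}"                           # the prompt changes to "sftp>" once connected
fi
```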


Note that after establishing a connection, the prompt will change to "sftp>" indicating that the interactive SFTP session has started.

To close the connection, execute the following:

Terminal 5. Exit the interactive SFTP session


Navigation in the local and remote systems

As in any Linux shell session, the basic command for navigating directories in the remote filesystem is cd:

Terminal 6. Change the current directory from the remote server


You can also print the current directory on the remote server with pwd and list its contents with ls:

Terminal 7. Check the current directory on the remote server


Within the SFTP interactive session, you can also navigate your local computer by prefixing the commands with "l" (for "local"):

Terminal 8. Navigate the local filesystem


It can be tricky to remember to use lcd, lls and lpwd to navigate in the local system. Therefore, it is recommended to navigate in your local system before establishing the SFTP connection. This way, your current directory in the local computer will be the desired local directory for file transfers and you may not need to navigate from there anymore.


Copy files to Pawsey filesystems

Once an interactive SFTP connection to the Pawsey data-mover system has been established, and the current directories on the local and remote systems are the desired ones, you can copy files to the remote system with the put command:

put [options] SourceFileOrDirInLocalSystem [DestFileOrDirInRemoteSystem]

For example:

Terminal 9. Transfer files to the remote filesystem


If you have already navigated to the correct directories on both the local and the remote computer (as explained above), a simple put is enough:

Terminal 10. Transfer myfile.dat


As the general syntax suggests, you are still free to choose the source path and file and the destination path and file name. The following example takes a file from another directory and puts it into your personal directory in /software, even though the current remote directory is in /scratch:

Terminal 11. Transferring files from different source and destination paths


To put an entire directory, use the "-r" option. In this example, the directory "myScripts" is transferred to your personal directory in the /scratch filesystem:

Terminal 12. Transfer of a directory recursively
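The same cd/put steps can also be run without typing them at the prompt, by passing a batch file to sftp with its -b option. The sketch below assumes the example names used on this page (myfile.dat, myScripts) and an assumed host name; batch mode needs non-interactive authentication, such as ssh-keys, and stops if a command such as cd or put fails.

```shell
#!/usr/bin/env bash
# Sketch: run the cd/put steps of an SFTP session non-interactively (sftp -b).
set -euo pipefail

username="mickey"                 # hypothetical username
project="pawsey9999"              # hypothetical project code
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname

batch="$(mktemp)"                 # temporary batch file, removed on exit
trap 'rm -f "$batch"' EXIT

# One command per line, as typed at the sftp> prompt.
# Relative local paths (myfile.dat, myScripts) are resolved against lpwd.
cat > "$batch" <<EOF
cd /scratch/${project}/${username}
pwd
lpwd
put myfile.dat
put -r myScripts
EOF

cmd=(sftp -b "$batch" "${username}@${host}")

echo "${cmd[*]}"
cat "$batch"
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```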


Copy files from Pawsey filesystems

The SFTP command for copying data into the local filesystem is get. Its general syntax is:

get [options] SourceFileOrDirInRemoteSystem [DestFileOrDirInLocalSystem]
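As a sketch of the get syntax, the batch file below copies a file and a whole directory from your /scratch directory into the current local directory (no destination argument is given). The file and directory names reuse this page's examples and, like the host name, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy a file and a directory from /scratch to the current local directory.
set -euo pipefail

username="mickey"                 # hypothetical username
project="pawsey9999"              # hypothetical project code
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname

batch="$(mktemp)"                 # temporary batch file, removed on exit
trap 'rm -f "$batch"' EXIT

# Without a destination argument, get writes into the local working directory.
cat > "$batch" <<EOF
get /scratch/${project}/${username}/myfile.dat
get -r /scratch/${project}/${username}/myScripts
EOF

cmd=(sftp -b "$batch" "${username}@${host}")

echo "${cmd[*]}"
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```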

External references

Documentation about the general use of SFTP can be found elsewhere.


...

Note: Use ssh-keys with GUI transfer tools

Pawsey strongly recommends the use of ssh-keys instead of the conventional and less secure username/password method. In this section we describe how to set up some GUI tools using ssh-keys to authenticate access to Pawsey. For a description of how to set up ssh-keys, see: Logging in with Use of SSH Keys for Authentication.

Also, by avoiding the username/password method, you do not need to save this sensitive data within the tool (which is never recommended). It also helps you avoid a common problem: your account being blocked when the transfer tool repeatedly retries to connect to Pawsey with an old or wrong password.


Tip: Always pay attention to the source and destination directories

Most GUI clients will start in your /home directory when you first connect to a remote server, while some will start in your previously accessed directory. This is almost never where you need to put the data for the new session. In most cases you will need to browse to your own /scratch, /home or /software directories.


...

Transfers between Pawsey filesystems and Acacia object storage are performed by tools/clients compatible with the Amazon S3 protocol. On Pawsey clusters, modules are available for the following S3-compatible clients:

  • mc (minio client):

module load miniocli/<version>


  • rclone:

module load rclone/<version>

...

An in-depth description of the different tools/clients for accessing and transferring data in and out of Acacia is given in the Acacia - User Guide (/wiki/spaces/DATA/pages/54459526).

Best practices for your data and data transfers

...

To automate the connections needed within your script, you can use ssh-keys as described in the "Secure transfers using ssh-keys" section below.

WinSCP also allows scripting.

...

You must generate a key-pair specifically for this purpose, that is, for data transfers to and from Pawsey systems (see Logging in with Use of SSH Keys for Authentication). Let's call this key-pair COPYPAIR. Do not repurpose an existing key-pair used to log in to Pawsey or other systems (which, by the way, should be protected by a passphrase). Using a dedicated key-pair isolates any unauthorised access caused by a compromised key-pair.
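Once COPYPAIR exists, a transfer script can point ssh at it explicitly. The sketch below assumes the private key is stored as ~/.ssh/COPYPAIR and reuses the example user, project and an assumed host name; scp and sftp accept the same -i and -o IdentitiesOnly=yes options directly.

```shell
#!/usr/bin/env bash
# Sketch: use the dedicated COPYPAIR key for an rsync transfer.
set -euo pipefail

username="mickey"                 # hypothetical username
project="pawsey9999"              # hypothetical project code
host="data-mover.pawsey.org.au"   # assumption: Pawsey data-mover hostname
key="${HOME}/.ssh/COPYPAIR"       # assumption: location of the COPYPAIR private key

# -i selects the key; IdentitiesOnly=yes stops ssh from also offering other keys.
ssh_cmd="ssh -i ${key} -o IdentitiesOnly=yes"

cmd=(rsync -vhsrl --chmod=Dg+s -e "$ssh_cmd"
     "${HOME}/initialConditions"
     "${username}@${host}:/scratch/${project}/${username}/")

echo "${cmd[*]}"
if [[ "${RUN:-0}" == 1 ]]; then
  "${cmd[@]}"
fi
```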

...

If the problem has spread to most of your /group directory, you can fix it with fix.group.permission.sh, which is provided by the pawseytools module. For more information about this tool, see "File Permissions and Quota" on the Pawsey Filesystems and their Use page.

Finally, to prevent this problem from happening again, configure your file transfer program to honour the setgid (set-group identification) default, so that newly created files and directories belong to your project group. This is explained for several documented tools in the following subsections.
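To check whether directories have lost the setgid bit, something like the following sketch can be used. It relies only on find and chmod, assumes you own the directories (only the owner or root can change their permissions), and does not change the group of existing files, so it is not a replacement for fix.group.permission.sh. Without an argument it runs on a hypothetical temporary tree for demonstration.

```shell
#!/usr/bin/env bash
# Sketch: find directories that lack the setgid bit and set it again.
set -euo pipefail

if [[ $# -gt 0 ]]; then
  target="$1"                     # directory tree to check, e.g. your project directory
else
  # Demo mode (hypothetical data): a temporary tree whose directories lack setgid.
  target="$(mktemp -d)"
  trap 'rm -rf "$target"' EXIT
  mkdir -p "$target/sub"
  chmod g-s "$target" "$target/sub"
fi

# -perm -2000 matches entries with the setgid bit set; "!" negates the test.
missing="$(find "$target" -type d ! -perm -2000)"
printf 'Directories without setgid:\n%s\n' "$missing"

# Set g+s only on the directories that were missing it.
find "$target" -type d ! -perm -2000 -exec chmod g+s {} +
```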

...

SSH is enabled on Pawsey systems for both incoming and outgoing traffic. However, this may not be the case for firewalls on the client side: most university, business and home internet connections only permit outgoing connections and block incoming SSH at the firewall. This means that SCP should always be run on the client, that is, your laptop or desktop, to copy data to or from the Pawsey supercomputers.

Related Pages