Pawsey Filesystems and their Use


Multiple filesystems are mounted on each of Pawsey's supercomputers. Each of these filesystems is designed for particular use cases. This page provides a detailed description of them.

Overview

The following filesystems are available from one or more Pawsey supercomputing systems:

  • /home - which should be used to store software configuration files that cannot be easily located elsewhere.
  • /software - Lustre filesystem which should contain both Pawsey and researcher software installations and Slurm batch scripts. 
  • /scratch - Lustre filesystem which should contain working data in use by jobs that are actively queued and running on the supercomputer.

These filesystems can be viewed using the df command from the login nodes:

$ df -H

Apart from /home, all are Lustre distributed filesystems. Lustre is an open-source, high performance parallel file system optimised for high throughput. 

The filesystems are different in many ways and are designed to facilitate different activities in supercomputing. The intended usage for each of them is explained below. Use outside of these purposes may cause poor performance for a particular activity and may adversely affect other users.


Pawsey filesystems are not backed up. Ensure you have a backup of your important files.

Home filesystem

The home filesystem should be used to store software configuration files. It is a Networked FileSystem (NFS). Each user has a default login directory in the /home filesystem with a quota of 1 GB and 10,000 individual files.

/home/[username]

The location of the home directory can be displayed using the $HOME environment variable:

$ echo $HOME

The /home filesystem is intended to be used to store relatively small numbers of important system files such as your Linux profile and shell configuration. It is not suitable for keeping working directories or launching jobs, and even less so for storing result files from job executions.

Current usage of the /home filesystem can be viewed by executing the quota command:

$ quota -s

Due to its small quota limit and low performance, the /home filesystem is not suitable for launching or storing production work. Files such as software installations and Slurm batch scripts should be stored on the /software filesystem. Working data, such as job input and output, should be stored on the /scratch file system.
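As a complement to the quota command, the following is a minimal sketch of a helper that compares a directory's file count and size against limits. The function name check_dir_usage and the 80% warning threshold are illustrative assumptions, not Pawsey tooling. The defaults mirror the /home limits quoted above, and the count covers regular files only, so it approximates what the quota system reports.

```shell
# Report file count and disk usage of a directory against given limits.
# Defaults mirror the /home quota quoted above (10,000 files, 1 GB).
# The function name and the 80% threshold are illustrative assumptions.
check_dir_usage() {
    local dir="${1:-$HOME}"
    local max_files="${2:-10000}"
    local max_kb="${3:-1048576}"        # 1 GB expressed in KiB (assumed unit)
    local files kb
    # Count regular files only; -xdev keeps find on the same filesystem
    files=$(( $(find "$dir" -xdev -type f | wc -l) ))
    # du -sk prints "<size in KiB><TAB><dir>"; keep the first field
    kb=$(du -sk "$dir" | cut -f1)
    echo "files: $files / $max_files"
    echo "size_kb: $kb / $max_kb"
    if [ $(( files * 100 )) -ge $(( max_files * 80 )) ] || \
       [ $(( kb * 100 )) -ge $(( max_kb * 80 )) ]; then
        echo "WARNING: above 80% of a limit"
        return 1
    fi
    return 0
}
```

Calling check_dir_usage with no arguments checks $HOME against the default limits; a non-zero return status signals that one of the two figures has passed the warning threshold.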

What to do if you exceeded your quota

The first thing to do is identify the directories that contain a large number of files and the files that are large enough to consume your quota. Then delete the ones you do not need.

Identifying subdirectories with a large number of files

The following command recursively finds the subdirectories and lists them in descending order of the number of files they contain. Execute this command from your $HOME directory:

Terminal X. find command to count the files contained in each subdirectory
$ cd $HOME
$ find . -type d -exec sh -c 'printf "%s: " "$1"; find "$1" -type f | wc -l' sh {} \; | sort -n -k 2 -r | tee $MYSCRATCH/homeSubdirectoriesRanked.out

Then you can check the file $MYSCRATCH/homeSubdirectoriesRanked.out and decide which subdirectories to remove. Note that the output is written to $MYSCRATCH because you may not have enough quota left to write in $HOME.

Identifying large files

Terminal X. find command to list the largest files
$ cd $HOME
$ find . -type f -exec du -h {} + | sort -rh | head -n 10

Then you can decide which files to remove. You can modify the number in the last filter (head -n 10) to display more or fewer results. You could also use this filter in the previous command to avoid a long output, or use the final filters from the previous command here to save the output to a file for a later careful check.

Hidden files 

Home is often used by a variety of programs to store configuration files and directories along with some cached information. These directories can contain many files and use up quite a bit of storage. An example is vscode, which stores quite a bit of data within the .vscode-server directory located in $HOME. This directory can contain upwards of 1000 files and use on the order of 100 MB, which counts against your quota on /home. We recommend moving such directories to a "fakeHome" directory in /software/projects/<project>/<username>/fakeHome, then creating a symbolic link in $HOME that points to the corresponding directory:

Terminal X. Setting .vscode-server directory out of the HOME directory
$ mkdir -p $MYSOFTWARE/fakeHome
$ cp -r $HOME/.vscode-server $MYSOFTWARE/fakeHome   # if the .vscode-server dir initially exists in $HOME
$ rm -r $HOME/.vscode-server                        # if the .vscode-server dir initially exists in $HOME
$ mkdir -p $MYSOFTWARE/fakeHome/.vscode-server      # if the .vscode-server dir did not initially exist in $HOME
$ ln -s $MYSOFTWARE/fakeHome/.vscode-server $HOME/.vscode-server       # generate a symbolic link

Note that we use cp followed by rm, rather than mv, to transfer the .vscode-server directory to another filesystem. Copying gives the new files the default ownership of the destination filesystem, whereas mv preserves the original ownership, so the transferred files would still count towards the $HOME quota.

Further explanation about quotas, permissions and copy (cp) vs move (mv) of files and directories is given in the sections below.
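The same relocation steps can be wrapped in a small function for any hidden directory, not only .vscode-server. This is a sketch under stated assumptions: the function name relocate_to_fakehome and its optional second and third arguments are hypothetical, and it handles directories only.

```shell
# Move a hidden directory out of $HOME into a fakeHome directory and leave a
# symbolic link behind. Hypothetical helper, not part of Pawsey's tooling.
# Usage: relocate_to_fakehome NAME [HOME_DIR] [FAKE_HOME_DIR]
relocate_to_fakehome() {
    local name="$1"                               # e.g. .vscode-server
    local home_dir="${2:-$HOME}"
    local fake_home="${3:-$MYSOFTWARE/fakeHome}"
    local src="$home_dir/$name"
    local dst="$fake_home/$name"

    if [ -L "$src" ]; then
        echo "$src is already a symbolic link; nothing to do"
        return 0
    fi
    mkdir -p "$fake_home" || return 1
    if [ -d "$src" ]; then
        # Plain cp -r (no -a or -p) so copies take the destination's default group
        cp -r "$src" "$fake_home/" || return 1
        rm -r "$src" || return 1
    else
        # Nothing to copy: create an empty target for the link
        mkdir -p "$dst" || return 1
    fi
    ln -s "$dst" "$src"
}
```

For example, relocate_to_fakehome .vscode-server reproduces the commands in the terminal listing above, and running it a second time leaves the existing link untouched.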

Software filesystem

The /software filesystem is a Lustre file system with much higher throughput than /home. It is intended for software installations and Slurm batch script templates. Each project has an associated directory on the filesystem whose path is /software/projects/<project>. Within a project directory, each project member has their own directory, whose full path, /software/projects/<project>/<username>, is contained in the MYSOFTWARE environment variable.

There are two types of quota in place on /software:

  • A project-wide quota of 256GB on the amount of used disk space, and
  • a per-user quota of 100,000 individual files. Notice that files belonging to different projects count towards the same user quota. In other words, a user can have a maximum of 100k files across all the projects they are involved in.


The software filesystem is intended for storage of software installations and Slurm batch scripts for the lifetime of the project.

All members of a project have read and write access to the /software/projects/<project> directory, so it can be used for sharing software installations and batch script templates within a project. Your allocation of space on /software exists for the duration of the project and is not subject to any automatic purging.

Quotas on disk space usage are managed per project group. If any member of the project exceeds the shared project quota on /software, the whole project is affected and no member will be able to save data (you may see a 'quota exceeded' message).

The project-wide quota consumption can be queried using the following command:

Terminal 2. Checking the project quota.
$ lfs quota -g $PAWSEY_PROJECT -h /software
Disk quotas for grp project1234 (gid xxxxx):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
      /software  1.688G    256G    256G       -   99415       0       0       -

where the first set of columns (used, quota, limit) shows that only about 1.7G of the 256G limit has been used.

Similarly, the per-user quota usage can be queried in the following way:

Terminal 3. Checking the per-user quota
$ lfs quota -u $USER -h /software
Disk quotas for usr user1234 (uid xxxx):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
      /software  14.16G      0k      0k       -   49053       0  100000       -

where the second set of columns under files-quota-limit shows that only 49053 i-nodes of the 100000 limit have been used.
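To turn that output into a single figure, the file-count columns can be read with awk. The sketch below assumes the single-line layout shown in the terminal listings above, with file counts printed as plain integers; the function name inode_usage_percent is hypothetical. Other Lustre versions may wrap long filesystem names onto a separate line or abbreviate counts, which this sketch does not handle.

```shell
# Read `lfs quota` output on stdin and print the percentage of the file
# (inode) limit in use, or "no limit" when the limit column is 0.
# Assumed column order: filesystem, used, quota, limit, grace,
#                       files, quota, limit, grace.
inode_usage_percent() {
    awk '
        $1 ~ /^\// && NF >= 8 {
            files = $6
            limit = $8
            gsub(/\*/, "", files)   # assumption: a trailing "*" may flag an exceeded value
            if (limit + 0 == 0)
                print "no limit"
            else
                printf "%d\n", files * 100 / limit
        }'
}
```

For the per-user listing above, lfs quota -u $USER -h /software | inode_usage_percent would print 49, since 49053 of 100000 files are in use. The project listing has a file limit of 0, so the same pipeline with -g prints "no limit".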

Scratch filesystem

The scratch filesystem should be used for working data, which is input and output files actively used by jobs queued or running on the supercomputer.

Each project has a directory /scratch/<project> in which each project member has a subdirectory /scratch/<project>/<username>.

/scratch is a Lustre filesystem, and the path to your own directory on it is available in the environment variable $MYSCRATCH.

$ echo $MYSCRATCH


Use Acacia to store your data

The scratch file system is not intended for long-term storage, is not backed up and is purged on a regular basis. If you wish to retain files, move them to the Acacia object storage.

Warning:

Files which have not been accessed for the purge period of 21 days will be deleted automatically and WILL BE LOST. See Filesystem Policies.
(Note that the 21-day period has applied since Monday 10 June 2024, replacing the previous one-month rule.)

Never use the `touch` command to avoid purge

Using the touch command to avoid purging overloads the metadata servers of the /scratch filesystem. Remember that /scratch is shared among all users, so everyone should follow best practices to avoid overloading the metadata servers at all times. Overloading the metadata servers dramatically degrades performance for all users at the same time. Pawsey reserves the right to revoke access for users who do not respect this best practice. See Filesystem Policies.

As mentioned above, users should incorporate regular file movement from /scratch into Acacia for long-term storage.
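To decide what to move before the purge, it can help to list files whose last access is already some days old. The following is a sketch only: the function name list_stale_files and the 14-day default are illustrative assumptions. Listing files does not change their access times, but walking a large directory tree still generates metadata traffic, so it is best run occasionally and on specific subdirectories.

```shell
# List regular files under a directory whose last access is older than a
# given number of days. Hypothetical helper; the default of 14 days is an
# arbitrary margin ahead of the 21-day purge period.
# Usage: list_stale_files [DIR] [DAYS]
list_stale_files() {
    local dir="${1:-$MYSCRATCH}"
    local days="${2:-14}"
    # find rounds file age down to whole days, so "+14" matches files last
    # accessed at least 15 days ago
    find "$dir" -type f -atime +"$days" -print
}
```

For example, list_stale_files $MYSCRATCH/results 14 prints the files under a (hypothetical) results directory that are candidates for transfer to Acacia.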

The /scratch filesystem has the highest performance of the available filesystems, and allows jobs to temporarily use large amounts of storage while running. However, to maintain high performance for all users, there are limits of 1 PB per project and 1 million files per user.

The project usage of the /scratch filesystem can be checked by using the following command:

$ lfs quota -g $PAWSEY_PROJECT -h /scratch

To ensure that /scratch remains available to support jobs actively running on the system, it is critical to move files off the filesystem to a more permanent storage as workflows complete. The copy partition on Setonix can be used for these data transfer jobs.

Leaving files to be removed by the 21-day purge policy places an unnecessary load on the filesystem as the system is scanned for these files, and leaves less capacity available for other users.


To minimise load on the filesystem, use the munlink command to delete files.

For more details, refer to Deleting large numbers of files.
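A common pattern for this is to feed file names from find to munlink through xargs and then remove the emptied directories. The sketch below is an assumption about how that might look, not a copy of the referenced page: the function name bulk_delete is hypothetical, and the fallback to rm is added only so the sketch also runs on machines without Lustre. It deletes data irreversibly, so check the path before running it.

```shell
# Delete all regular files under a directory, then remove the emptied
# directories. Uses munlink when it is on PATH; otherwise falls back to
# rm -f (fallback added for illustration on non-Lustre systems).
# Usage: bulk_delete DIR
bulk_delete() {
    local dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    local remover="rm -f"
    if command -v munlink >/dev/null 2>&1; then
        remover="munlink"
    fi
    # -print0 / -0 keep file names containing spaces or newlines intact;
    # -r skips running the remover when there is nothing to delete
    find "$dir" -type f -print0 | xargs -0 -r $remover
    # Remove directories bottom-up once they are empty
    find "$dir" -depth -type d -empty -delete
}
```

Symbolic links and other non-regular entries are left in place, so a directory that contains them is not removed.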

Reference datasets

Reference data sets are static data required by software for calibrations or testing or as widely used input data. Reference data sets that are used by several project groups will be provided on /scratch by Pawsey to avoid multiple copies existing. These data sets will be contained in subdirectories of /scratch/references.

Examples include:

  • /scratch/references/askap 
  • /scratch/references/mwa 
  • /scratch/references/blastdb_update  

These reference datasets will be exempt from the /scratch  purge policy.

Some of the bioinformatics reference datasets available are:

  • 10x single cell gene expression
  • 10x spatial gene expression
  • Alphafold
  • Arabidopsis thaliana
  • Blast+ database (regularly updated)
  • Diamond
  • Human Broad bundle hg19, Broad bundle hg38, and GRCh38
  • Interproscan-5.56-89.0
  • Metagenome_atlas_2.9
  • Mouse Broad bundle mm10, NCBI MM10, UCSC GRCm38, RNA M25
  • Qiime
  • Sarek 
  • VEP

For more information, see the Life Science and Bioinformatics Landing Page.

If you would like to request the addition of a new reference dataset, please email the Pawsey Helpdesk at help@pawsey.org.au.

File permissions and quota

The effect of file permissions and ownership on storage quotas varies depending on the filesystem where the data is located. The default behaviour can be summarised as follows:

  • Files created in a user's /home are accessible only to that user.
  • Files created in a user's /software and /scratch directories are accessible only to that user and to members of the same project.

For more detail on these filesystems refer to Filesystem Policies. The filesystem quotas are summarised in Table 1:


Table 1. Pawsey filesystems: capacity, file limit and duration

Filesystem   Capacity Limit       File Limit            Duration
/home        1 GB per user        10k files per user    Active project allocation
/software    256 GB per project   100k files per user   Active project allocation
/scratch     1 PB per project     1M files per user     21 days from last access (see Filesystem Policies)

The default group membership for files and directories that are created in /home is the user's primary group, which has the same name as their username. Files and directories that are created in any of the Lustre filesystems are associated by default with the user's project ID.

For the /software filesystem, Pawsey uses a file's group ownership to calculate its effect on storage quotas. To make use of the group quota for a project, files must be associated with the group corresponding to that project ID.

A user is always a member of their own primary group (which is the same as their own username) and can also be a member of more than one project. This is important to know because files created with a group associated to a username rather than a project are limited to a default quota of 1GB and there can be at most 10,000 of them.

If you encounter a write error, compiler error, or file transfer error on the /scratch or /software filesystems, this is likely because the files are counting against your personal group quota rather than your project's group quota.

You should proactively and regularly monitor both file count and quota usage across the filesystems. This practice will reduce your likelihood of hitting the quota limits; whenever this happens, no files can be written until usage is brought back below quota.

As regards the /home filesystem, regularly check for and clean unneeded files, which may be generated by software as temporary or cache files. These are often stored in hidden directories (their name starts with a dot).

File permissions and ownerships are also important to consider. The default permissions of files created by a user on any of the Setonix filesystems are the same, but the default ownerships are different. An example of the default properties of a file in the /home filesystem is shown below:

Terminal 4. List the default file permissions and ownerships for a file created on Setonix /home
$ ls -ld myscript.sh
-rw-r--r-- 1 username username 2 Nov 30 16:33 myscript.sh

Note that there are 10 characters describing the permissions. The first character is not really a permission, but an indication of the type of entry. A "-" in the first character indicates that myscript.sh is a regular file (a "d" would indicate a directory, an "l" a link, and so on). The remaining nine characters indicate the permissions of the file, broken down into three groups of three:

-

The - in the first character indicates this is a file.

 rw-

The first set of permissions determines what actions can be performed by the owner of the file. In this case username is the owner, and is allowed to read (r) and write (w) the file, but not to execute it (the x position shows -).

    r--

The second set of permissions determines what actions can be performed by other users who belong to the same group as the file. The group here is the primary or default group of the file's owner, which is username. Group members are allowed to read only.

       r--

The final set of permissions applies to all other users. While the permissions are set to read, the top-level user directory is locked to just the user or project, so others are not able to read, write, or execute files in another group's directories.
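The breakdown of the 10-character string can be reproduced with a few lines of shell. This is an illustrative sketch; the function name describe_mode is hypothetical and it only distinguishes the three entry types mentioned above.

```shell
# Split a 10-character mode string from `ls -l` into its four parts.
# Hypothetical helper for illustration.
# Usage: describe_mode MODE_STRING
describe_mode() {
    local mode="$1"
    local kind
    # Character 1 is the entry type, not a permission
    case "$(printf '%s' "$mode" | cut -c1)" in
        -) kind="file" ;;
        d) kind="directory" ;;
        l) kind="link" ;;
        *) kind="other" ;;
    esac
    echo "type:  $kind"
    # Characters 2-4, 5-7 and 8-10 are the owner, group and other permissions
    echo "owner: $(printf '%s' "$mode" | cut -c2-4)"
    echo "group: $(printf '%s' "$mode" | cut -c5-7)"
    echo "other: $(printf '%s' "$mode" | cut -c8-10)"
}
```

Running describe_mode "-rw-r--r--" on the mode string from Terminal 4 prints type "file", owner "rw-", group "r--" and other "r--".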

An example of default properties of a file in /software filesystem is as below:

Terminal X. List the default file permissions for a file created on Setonix /software & /scratch
$ ls -ld otherscript.sh
-rw-r--r-- 1 username projectgroup 2 Nov 30 16:33 otherscript.sh

As noted, the default permissions are the same, but the default ownerships are not. The two words after the permissions are, respectively, the owner-name and the group-name of the file. For files created in /home, both the owner and the group are set to the username by default. For files created in /software and /scratch, the default group-name is the projectgroup for your project.


This is important because the quota limits explained in the section above are counted against the group-name of the files, not the owner-name. The quota limit for all files that have groupname = username (summed across all our filesystems) is 10k files, which is very limited. Users who do not follow our recommendations for installing software frequently hit this limit, because some installation tools and commands override the default group-name on /scratch and /software and assign username instead of projectgroup. (Check the related pages at the bottom of this page.)

For the same reason, when transferring files between filesystems, we do not recommend the use of the mv (move) command, because this command preserves the original group-name. Instead, we recommend the use of the cp command (or other tools that allow the assignment of the default ownerships for the destination filesystem). When using cp, do not use the -a or -p flags. If you want to preserve timestamps, use cp --preserve=timestamps. For example, considering the files in $HOME as above:

Terminal X. Do not use mv when transferring files to another filesystem
$ ls -ld $HOME/myscript.sh
-rw-r--r-- 1 username username 2 Nov 30 16:33 myscript.sh

$ cp $HOME/myscript.sh $MYSOFTWARE/myscript_cp.sh        #good practice
$ mv $HOME/myscript.sh $MYSOFTWARE/myscript_mv.sh        #bad  practice

$ ls -ld $MYSOFTWARE/myscript*
-rw-r--r-- 1 username projectgroup 2 Dec 10 10:03 /software/projects/project/username/myscript_cp.sh
-rw-r--r-- 1 username username     2 Nov 30 16:33 /software/projects/project/username/myscript_mv.sh
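After a transfer like the one above, a quick count of entries that carry the wrong group shows whether anything still needs fixing. The sketch below uses find's -group test; the function name count_wrong_group is hypothetical.

```shell
# Count entries (files and directories) under a directory whose group
# differs from the expected one. Hypothetical helper for illustration.
# Usage: count_wrong_group DIR GROUP
count_wrong_group() {
    local dir="$1"
    local group="$2"    # group name or numeric group ID
    # The arithmetic expansion strips any padding that wc may add
    echo $(( $(find "$dir" ! -group "$group" | wc -l) ))
}
```

For example, count_wrong_group $MYSOFTWARE $PAWSEY_PROJECT would report 1 for the listing above, because myscript_mv.sh kept the username group, assuming everything else under $MYSOFTWARE already belongs to the project group.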


File transfer programs like WinSCP can also cause issues with permissions and groups. You should consult the documentation of your preferred transfer program. rsync users should avoid using the -a and -p flags; these flags will preserve permissions of the source files, which may conflict with the default behaviour on Pawsey systems. Some additional information about file transfer programs is at: Transferring Files in/out Pawsey Filesystems.

Pawsey provides a tool that lets you fix file and directory ownerships on /software. The fix.group.permission.sh script is available in the pawseytools module, which is loaded by default. To use it, enter the script name followed by your group name. For example, if your project ID is projectgroup you would enter this:

$ fix.group.permission.sh projectgroup

Notes:

  • This script might take some time to complete.
  • It will only fix files and directories owned by the user executing the command ($USER).
  • You can only run one instance of the script at a time.


There is a manual way of doing this in your own area using the find command. Replace projectgroup with your project ID and username with your user name.

Terminal 5. Fix file and directory permissions on /software
$ find /software/projects/projectgroup/username ! -group projectgroup -exec chgrp projectgroup \{} \;
$ find /software/projects/projectgroup/username -type d ! -perm /g=s -exec chmod g+s \{} \;

Terminal 6. Fix file and directory permissions on /scratch
$ find /scratch/projectgroup/username ! -group projectgroup -exec chgrp projectgroup \{} \;
$ find /scratch/projectgroup/username -type d ! -perm /g=s -exec chmod g+s \{} \;


Related pages
