Data Workflows

A data workflow describes how data needed or produced by your project is moved in and out of the Pawsey supercomputing systems, and where it lives for the duration of the project.

When planning a supercomputing project, it is important to design the whole data workflow so that it conforms to the filesystem policies at the Pawsey Supercomputing Centre.

Before you begin, review the recommendations and policies in the following pages:

The /scratch filesystem should be used by user programs to write and read files, especially large ones, during a job. This filesystem is tuned to deliver high bandwidth for data access. Datasets should not be stored on /scratch after a job completes. Unnecessary data should be removed from /scratch, and important data should be copied to a more appropriate place, for example the Acacia object storage, local institutional storage or other long-term storage. This process can be automated with the job dependencies feature of the Slurm scheduler, as sketched below.
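As a minimal sketch of this pattern (the job script names, and the behaviour attributed to copy_to_acacia.sh, are placeholders rather than actual Pawsey-provided scripts), a compute job can be submitted first and a separate data-moving job made to wait on its successful completion using the --dependency option of sbatch:

#!/bin/bash
# submit_workflow.sh - chain a compute job and a copy-back job with Slurm job dependencies.

# Submit the compute job; --parsable makes sbatch print only the job ID.
compute_id=$(sbatch --parsable compute_job.sh)

# Submit the staging job so that it starts only if the compute job completes successfully.
# copy_to_acacia.sh would, for example, copy results from /scratch to an Acacia bucket
# (e.g. with a client such as rclone) and then remove the temporary files from /scratch.
sbatch --dependency=afterok:${compute_id} copy_to_acacia.sh

Because the staging step runs as its own job, the compute job releases its resources as soon as the computation finishes, and the copy back to storage happens without manual intervention.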

An example that stages data from Acacia, runs a supercomputing job and stores the results back into Acacia is available at /wiki/spaces/DATA/pages/54459952.

Related content

Pawsey Filesystems and their Use
File Management
Data Storage and Management Policy
Transferring Files
Changes to Allocation Schemes for 2024
Supercomputing workflow example