Data Workflows
A data workflow describes how data needed or produced by your project is moved into and out of the Pawsey supercomputing systems, and where it lives for the duration of the project.
When planning a supercomputing project it is important to design the whole data workflow so that it conforms to the filesystem policies of the Pawsey Supercomputing Centre.
Before you begin, review the recommendations and policies in the following pages:
- Resource Overview (mainly the "Filesystems" and "Object Storage" sections)
- Pawsey Filesystems and their Use
- Filesystem Policies
The /scratch filesystem should be used by user programs to read and write files during a job, especially large ones; this filesystem is tuned to deliver high bandwidth for data access. Datasets should not be stored on /scratch after a job completes. Unnecessary data should be removed from /scratch, and important data should be copied to a more appropriate place, for example the Acacia object storage, local institutional storage or long-term storage. This process can be automated using the job dependency feature of the Slurm scheduler, as in the sketch below.
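As an illustration only, the following sketch chains three batch jobs with Slurm dependencies so that data is staged in, processed, and then copied off /scratch automatically. The script names (stage_in.sh, compute.sh, stage_out.sh) are hypothetical placeholders for your own stage-in, compute and stage-out job scripts.

```bash
#!/bin/bash
# Submit the stage-in job (e.g. copying input data from Acacia to /scratch)
# --parsable makes sbatch print only the job ID, so it can be captured.
jobid_in=$(sbatch --parsable stage_in.sh)

# Run the compute job only if the stage-in job completed successfully.
jobid_run=$(sbatch --parsable --dependency=afterok:${jobid_in} compute.sh)

# Copy results off /scratch (e.g. back to Acacia) only if the compute job succeeded.
sbatch --dependency=afterok:${jobid_run} stage_out.sh
```

With this pattern, results never linger on /scratch longer than necessary: the stage-out job runs as soon as the compute job finishes successfully, and a failure in any step prevents the dependent jobs from starting.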
A complete example that stages data from Acacia, executes a supercomputing job and stores the results back into Acacia is available at /wiki/spaces/DATA/pages/54459952.