This page is intended to help users of previous Pawsey GPU supercomputing infrastructure (such as Topaz) to transition to using the Setonix supercomputer.
This migration guide focuses on the changes to and additional features of the new system, and assumes the reader is familiar with working with supercomputers. The Supercomputing Documentation provides background context to important supercomputing concepts, which may be helpful if you are getting started with using supercomputers for the first time.
Throughout this guide, links are provided to relevant pages in the general Supercomputing Documentation and to the Setonix User Guide, which provides documentation specifically for using GPUs on the Setonix system. The Setonix GPU Partition Quick Start also provides specific details for using the GPUs in Setonix.
This guide has been updated in preparation for the migration of GPU projects on Topaz to Setonix Phase 2.
Setonix is the new petascale supercomputer at the Pawsey Supercomputing Centre, ranking 15th in the world for performance in the Top500 list and 4th in the Green500 list for energy efficiency.
It arrived in two phases:
Setonix Phase 2: An expanded system with 1600 CPU nodes and 192 GPU nodes, available to merit projects in 2023.
Setonix Phase 2 GPUs replace Pawsey's previous generation of GPU infrastructure, specifically the Topaz GPU cluster and associated filesystems. This migration guide has been updated to outline changes for researchers transitioning from Topaz to Setonix Phase 2.
Significant changes to the GPU compute architecture include:
For more details refer to the System overview section of the Setonix User Guide.
The Setonix operating system and environment will be a newer version of the Cray Linux Environment familiar to users of Magnus and Galaxy. It will also include scheduling features previously provided separately on Zeus and Topaz. This will enable the creation of end-to-end workflows running on Setonix, as detailed in the following sections.
There are several new filesystems that will be available with the Setonix supercomputer.
The /scratch filesystem is replaced by a new 14 petabyte /scratch filesystem.
The /home filesystem is replaced by a new /home filesystem.
The /pawsey and /group filesystems are replaced by a new /software filesystem.
The /group filesystem is replaced by the Acacia object store.
These new filesystems will have the following limits:
For more information on Pawsey filesystems refer to the File Management page.
For information specific to Setonix refer to the Filesystems and data management section of the Setonix User Guide.
The software environment on Setonix is provided by a module environment very similar to that of the previous supercomputing systems.
The module environment is provided by Lmod, which was used previously on Topaz.
Setonix has a newer version of the Cray Linux Environment that was present on Magnus and Galaxy, which used programming environment modules to select the compilation environment.
For containers, researchers can continue to use Singularity in a similar way to previous systems. Some system-wide installations (in particular, for bioinformatics) are now provided as container modules using SHPC: these software packages are installed as containers, but the user interface is the same as for compiled applications (load the module, run the executables).
There is a library of GPU-enabled containers that support the AMD MI250X GPUs available from the AMD Infinity Hub. Note that these containers may be limited in parallelism to one node, or one GPU, depending on the particular software.
Key changes to the software environment include:
Refer to the Software Stack pages for more detail on using modules and containers.
For information specific to Setonix refer to the Software Environment section of the Setonix User Guide.
The Setonix supercomputer has a different hardware architecture to previous supercomputing systems, and the compilers and libraries available may have changed or have newer versions. It is strongly recommended that project groups reinstall any necessary domain-specific software. This is also an opportunity for project groups to review the software in use and consider updating to recent versions, which typically contain newer features and improved performance.
Due to the change from NVIDIA GPUs on Topaz to AMD GPUs on Setonix, AMD's ROCm and HIP technologies should be used instead of CUDA, and are provided via the rocm module.
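As a rough illustration of the move from CUDA to HIP, the following sketch writes a small HIP source file and builds it with hipcc after loading the rocm module. The file names (saxpy.hip.cpp, saxpy) are hypothetical, the rocm module version is deliberately left unspecified, and the gfx90a target is an assumption based on the AMD MI250X GPUs mentioned above. The script skips the build on machines where hipcc is not available.

```shell
#!/bin/bash
# Sketch only: build a tiny HIP program after loading the rocm module.
# File names are hypothetical; adapt them to your own project.

cat > saxpy.hip.cpp <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Kernel syntax is the same as in CUDA; the runtime API uses hip* names
// where CUDA code would use cuda* names.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx, *dy;
    hipMalloc(&dx, n * sizeof(float));
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);
    hipFree(dx);
    hipFree(dy);
    return 0;
}
EOF

# The module command only exists where a module environment is set up.
if type module >/dev/null 2>&1; then
    module load rocm   # check "module avail rocm" for the installed versions
fi

# Compile for the MI250X architecture (gfx90a) if hipcc is on the PATH.
if command -v hipcc >/dev/null 2>&1; then
    hipcc --offload-arch=gfx90a -o saxpy saxpy.hip.cpp
else
    echo "hipcc not found; skipping build"
fi
```

For larger CUDA code bases, ROCm also ships hipify tools that automate much of the cuda* to hip* renaming; the Compiling section of the Setonix User Guide remains the reference for the recommended compiler settings.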
Key changes to software installation and maintenance on Setonix include:
Refer to How to Install Software and SHPC (Singularity Registry HPC) in the Supercomputing Documentation for more detail.
For information specific to Setonix refer to the Compiling section of the Setonix User Guide.
Setonix uses Slurm, which is the same job scheduling system used on the previous generation of supercomputing systems. Previously, several specific types of computational use cases were supported on Zeus rather than the main petascale supercomputer, Magnus. Such use cases were often used for pre-processing and post-processing. These specialised use cases are now supported on Setonix alongside large scale computational workloads.
Note that separate GPU allocations are used for GPU jobs. Their names are the usual project name with a -gpu suffix. These GPU allocations are only used for submitting and managing GPU jobs in Slurm. Software installations and working data for GPU jobs still use the directories and file systems associated with the base project name.
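The naming rule for GPU allocations can be sketched as a small script that writes a Slurm batch script. Everything specific here is an assumption for illustration: project123 is a placeholder project name, the partition name gpu and the /scratch/project/user directory layout are not taken from this page, and my_gpu_program is a hypothetical executable. The options that request GPUs are intentionally left out; take those from the Example Slurm Batch Scripts for Setonix on GPU Compute Nodes page.

```shell
#!/bin/bash
# Sketch only: generate a minimal batch script that charges a GPU allocation.
# "project123" is a placeholder, not a real project.
PROJECT="project123"
GPU_ACCOUNT="${PROJECT}-gpu"   # GPU allocation = base project name + "-gpu"

cat > gpu_job.sh <<EOF
#!/bin/bash --login
#SBATCH --account=${GPU_ACCOUNT}
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Working data still lives under the base project name (assumed layout).
cd /scratch/${PROJECT}/\${USER}
srun ./my_gpu_program
EOF

echo "Wrote gpu_job.sh for account ${GPU_ACCOUNT}"
# Submit with: sbatch gpu_job.sh
```

Note that only the --account line carries the -gpu suffix; paths on /scratch and /software keep the base project name.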
Key changes
For more information refer to Job Scheduling in the Supercomputing documentation.
For information specific to Setonix refer to the Running Jobs section of the Setonix User Guide and, for GPU jobs in particular, the Example Slurm Batch Scripts for Setonix on GPU Compute Nodes page.
When using Pawsey's supercomputing infrastructure, there may be project data that needs to be available for longer than the 30 day /scratch purge policy allows. An example is a reference set of data that is reused across many computational workflows.
On previous supercomputing systems, such as Topaz, the /group filesystem was used to provide this functionality.
For Setonix, this functionality is provided by the Acacia object storage system.
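To get a feel for which files are at risk under a 30 day purge policy, a helper along the following lines can list old files as candidates to copy to Acacia. This is a sketch under a stated assumption: it uses file modification time purely for illustration, and the actual purge criteria are those defined in the Pawsey documentation. The function name list_old_files and the example path are hypothetical.

```shell
#!/bin/bash
# Sketch only: list files that have not been modified for more than a given
# number of days. Modification time is an assumption made for illustration;
# check the Pawsey documentation for the real purge criteria.

list_old_files() {
    local dir="$1"
    local days="${2:-30}"
    # -mtime +N matches files whose age, counted in whole 24-hour periods,
    # is greater than N.
    find "$dir" -type f -mtime +"$days" -print
}

# Example (hypothetical path):
#   list_old_files /scratch/project123/username 30
```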
Key changes include:
Stage data from Acacia to /scratch if needed at the start of computational workflows.
Transfer data from /scratch to Acacia if needed following computational jobs.
For more information on using Acacia, refer to the /wiki/spaces/DATA/pages/54459526 in the Data documentation.
For more information on job dependencies, refer to Example Workflows in the Supercomputing documentation.
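The staging pattern described in this section (copy data in, compute, copy results out) can be chained with Slurm job dependencies. The sketch below assumes three hypothetical batch scripts, stage_in.sh, compute.sh and stage_out.sh, which are not provided here. It defaults to a dry run that records the sbatch commands in a file and hands out placeholder job IDs; set DRY_RUN=0 to submit for real.

```shell
#!/bin/bash
# Sketch only: chain stage-in, compute and stage-out jobs so that each one
# starts only after the previous one has completed successfully.
# The three batch scripts are hypothetical.

DRY_RUN="${DRY_RUN:-1}"        # 1 = record commands only, 0 = really submit
PLAN="workflow_plan.log"
: > "$PLAN"                    # start with an empty plan file

# Submit a job and print its job ID on standard output.
submit() {
    if [ "$DRY_RUN" = "0" ]; then
        local out
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        out=$(sbatch --parsable "$@") || return 1
        echo "${out%%;*}"
    else
        echo "sbatch --parsable $*" >> "$PLAN"
        # Placeholder job ID derived from the number of recorded commands.
        echo $(( 1000 + $(wc -l < "$PLAN") ))
    fi
}

stage_id=$(submit stage_in.sh)
compute_id=$(submit --dependency=afterok:"${stage_id}" compute.sh)
out_id=$(submit --dependency=afterok:"${compute_id}" stage_out.sh)

echo "stage-in=${stage_id} compute=${compute_id} stage-out=${out_id}"
```

With afterok, a failed stage-in job prevents the compute job from starting, so compute time is not spent on incomplete input data.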
Consider the following steps when planning the migration of your computational workflow:
... /scratch filesystem.
... /software filesystem.
... /software filesystem, including:
... /scratch to the Acacia object store or other external data repositories