Note: Work in Progress for Phase-2 Documentation

The content of this section is currently being updated to provide material relevant to Phase 2 of Setonix and the use of GPUs, which is expected to be available soon to Pawsey projects with Setonix GPU allocations.




This page is intended to help users of previous Pawsey GPU supercomputing infrastructure (such as Magnus, Galaxy and Topaz) to transition to using the Setonix supercomputer.

...

Throughout this guide links are provided to relevant documentation pages in the general Supercomputing Documentation and to the Setonix User Guide, which provides documentation specifically for using GPUs on the Setonix system.

The guide has been updated now that Magnus and Zeus have been decommissioned, and in preparation for the migration of GPU projects on Topaz to Setonix Phase 2.

Starting with Setonix

Setonix is the new petascale supercomputer at the Pawsey Supercomputing Centre, ranking 15th in the world for performance on the Top500 list and 4th on the Green500 list for energy efficiency.

It arrived in two phases:

  • Setonix Phase 1: An initial 2.7 petaflop HPE Cray EX system with 504 AMD CPU nodes, available for merit projects in 2022.
  • Setonix Phase 2: An expanded 50 petaflop system with 1600 AMD CPU nodes and 192 GPU nodes, available to merit projects in 2023.

Setonix Phase 2 GPUs replace Pawsey's previous generation of GPU infrastructure, specifically the Topaz GPU cluster and associated filesystems. This migration guide has been updated to outline changes for researchers transitioning from Topaz to Setonix.

Significant changes to the GPU compute architecture include:

  • Moving from 16 or 28 cores per node on Intel host CPUs to 64 cores per node on the AMD host CPU.
  • Changing from 192 GB to 256 GB of RAM per node in some cases.
  • Transitioning from NVIDIA V100 and P100 GPUs to AMD MI250X GPUs.

For more details refer to the System overview section of the Setonix User Guide.

The Setonix operating system and environment is a newer version of the Cray Linux Environment familiar to users of Magnus and Galaxy. It also includes scheduling features previously provided separately on Zeus and Topaz. This enables the creation of end-to-end workflows running on Setonix, as detailed in the following sections.

...

There are several new filesystems available with the Setonix supercomputer.

  • The previous 3 petabyte /scratch filesystem is replaced by a new 14 petabyte /scratch filesystem.
  • The previous /home filesystem is replaced by a new /home filesystem.
  • For software and job scripts, the previous /pawsey and /group filesystems are replaced by a new /software filesystem.
  • For project data, the previous /group filesystem is replaced by the Acacia object store (see the sketch after this list).
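
Acacia is object storage rather than a mounted filesystem, so project data is moved with an S3-style client instead of ordinary file copies. As a minimal sketch, assuming an S3 client such as rclone has been configured with a remote named acacia and a bucket named myproject-bucket (the remote, bucket and project names are all placeholders), staging data might look like:

  # Copy results from /scratch to the Acacia object store
  rclone copy /scratch/project1234/results acacia:myproject-bucket/results

  # List the uploaded objects to verify the transfer
  rclone ls acacia:myproject-bucket/results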

...

The module environment is provided by Lmod, which was used previously on Zeus and Topaz, rather than Environment Modules used on Magnus and Galaxy. The usage commands are extremely similar, with some minor differences in syntax and output formats.
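
For example, a typical Lmod session might look like the following (the package names and versions are illustrative only):

  module avail             # list modules available in the current environment
  module spider gromacs    # search the whole module hierarchy for a package
  module load gcc/12.1.0   # load a specific version, as recommended below
  module list              # show the currently loaded modules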

Setonix has a newer version of the Cray Linux Environment than the one present on Magnus and Galaxy; as on those systems, programming environment modules are used to select the compilation environment.

...

  • Lmod is used to provide modules in place of Environment Modules.
  • Module versions should be specified when working with modules.
  • The PrgEnv-gnu programming environment is now the default.

Refer to the Software Stack pages for more detail on using modules and containers.

...

The Setonix supercomputer has a different hardware architecture to previous supercomputing systems, and the compilers and libraries available may have changed or have newer versions. It is strongly recommended that project groups reinstall any necessary domain-specific software. This is also an opportunity for project groups to review the software in use and consider updating to recent versions, which typically contain newer features and improved performance.

Due to the change from NVIDIA GPUs on Topaz to AMD GPUs on Setonix, AMD's ROCm and HIP technologies should be used instead of CUDA, and are provided via the rocm module.
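
As an illustration of what this change can look like in practice (the module version and file names are hypothetical; hipify-perl is one of the porting tools shipped with ROCm):

  module load rocm                      # provides the ROCm/HIP toolchain
  hipify-perl saxpy.cu > saxpy.hip.cpp  # assist in translating CUDA source to HIP
  hipcc -o saxpy saxpy.hip.cpp          # compile the HIP code for the AMD GPUs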

Key changes to software installation and maintenance on Setonix include:

  • The new processor architecture has seen the Intel programming environment (PrgEnv-intel) replaced by an AMD programming environment (PrgEnv-aocc).
  • The GNU programming environment has newer versions of the gcc, g++ and gfortran compilers, and is the default environment on Setonix.
  • The Cray programming environment has newer versions of the Cray C/C++ and Fortran compilers.
  • The newer Cray C/C++ is now based on the Clang back end, and the command line options have changed accordingly.
  • Pawsey has adopted Spack for assisted software installation, which may also be useful for project groups to install their own software.
  • Pawsey has adopted SHPC to deploy some applications (particularly bioinformatics packages) as container modules, which may also be useful for some project groups.
  • ROCm and HIP should be used in place of CUDA for GPU-acceleration.

Refer to How to Install Software and SHPC (Singularity Registry HPC) in the Supercomputing Documentation for more detail.
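
As a sketch of what assisted installation with Spack can look like (the package and version are illustrative, not a recommendation):

  spack spec hdf5@1.12.2     # preview how the package would be built
  spack install hdf5@1.12.2  # build and install the package
  spack load hdf5@1.12.2     # make the installation available in the environment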

For information specific to Setonix refer to the Compiling section of the Setonix User Guide. 
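
For example, the Cray compiler wrappers (cc, CC and ftn) target whichever programming environment is currently loaded, so switching compilers is a matter of swapping modules. A minimal sketch, with placeholder source and output names:

  cc -O2 -o app app.c                 # uses gcc under the default PrgEnv-gnu
  module swap PrgEnv-gnu PrgEnv-aocc  # switch to the AMD programming environment
  cc -O2 -o app app.c                 # the same wrapper now invokes the AOCC compiler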

...

Setonix uses Slurm, the same job scheduling system used on the previous generation of supercomputing systems. Previously, several specific types of computational use cases were supported on Zeus rather than on the main petascale supercomputer, Magnus; these were often pre-processing and post-processing tasks. Such specialised use cases are now supported on Setonix alongside large-scale computational workloads.

Note that separate GPU allocations are used for GPU jobs; these are named like the usual project name with a -gpu suffix. The GPU allocations are only used for submitting and managing GPU jobs in Slurm. Software installations and working data for GPU jobs still use the directories and filesystems associated with the base project name.
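
A minimal GPU job script header might therefore look like the following sketch; the project name is a placeholder, and the partition name and GPU request syntax are assumptions to be checked against the Setonix User Guide:

  #!/bin/bash --login
  #SBATCH --account=project1234-gpu   # note the -gpu suffix on the allocation
  #SBATCH --partition=gpu             # a GPU partition (name is an assumption)
  #SBATCH --nodes=1
  #SBATCH --gres=gpu:1                # request a single GPU (syntax may differ)
  #SBATCH --time=01:00:00

  srun ./my_gpu_program               # placeholder executable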

Key changes

  • Jobs may share nodes, allowing jobs to request a portion of the cores and memory available on the node.
  • Jobs can still specify exclusive node access where necessary (see the sketch after this list).
  • A partition for longer running jobs is available.
  • Nodes with additional memory are available.
  • A partition for data transfer jobs is available.
  • Job dependencies can be used to combine data transfer and computational jobs to create automated workflows.
  • Partitions for GPU jobs are available.
  • Separate GPU allocations are used to schedule GPU jobs.

For more information refer to Job Scheduling in the Supercomputing documentation.

...

On previous supercomputing systems, such as Magnus, Galaxy and Topaz, the /group filesystem was used to provide this functionality.

...

  1. Log in to Setonix for the first time.
  2. Transfer data on to the new filesystems:
    • Working data should be placed in the new /scratch filesystem.
    • Project data should be placed in the Acacia object store.
  3. Familiarise yourself with the modules and versions available on Setonix.
  4. Install any additional required domain-specific software for yourself or your project group using the /software filesystem.
  5. Prepare job scripts for each step of computational workflows by keeping templates in the /software filesystem, including:
    1. Staging of data from the Acacia object storage or external repositories using the data mover nodes
    2. Pre-processing or initial computational jobs using appropriate partitions
    3. Computationally significant jobs using appropriate partitions
    4. Post-processing or visualisation jobs using appropriate partitions
    5. Transfer of data products from /scratch to the Acacia object store or other external data repositories
  6. Submit workflows to the scheduler, either manually using scripts, as sketched below, or through workflow managers.
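
As a hedged sketch of the manual approach, a three-stage workflow could be chained with Slurm job dependencies; the script names are placeholders and the data-transfer partition name is an assumption to verify in the scheduling documentation:

  # Stage input data in from Acacia on the data mover nodes
  jid1=$(sbatch --parsable --partition=copy stage_in.sh)

  # Run the main computation once staging has succeeded
  jid2=$(sbatch --parsable --dependency=afterok:$jid1 compute.sh)

  # Transfer data products back out after the computation completes
  sbatch --dependency=afterok:$jid2 --partition=copy stage_out.sh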

Migration Training

A series of six migration training sessions is also available to provide assistance with migrating:

  • Module 1 - Getting Started with Setonix
  • Module 2 - Supercomputing Filesystems
  • Module 3 - Using Modules and Containers
  • Module 4 - Installing and Maintaining Software
  • Module 5 - Submitting and Monitoring Jobs
  • Module 6 - Using Data throughout the Project Lifecycle

Registration details for these modules will be available on the Pawsey events page.

...


Related pages

...