Changes in Supercomputing Services for 2023
This page outlines the main changes to Pawsey's next-generation supercomputing services.
Setonix will experience downtime at the beginning of January 2023 while final acceptance testing is carried out.
Background
In 2018 the Australian Government awarded $70 million to upgrade Pawsey’s supercomputing infrastructure. As part of the Pawsey Capital Refresh project, Pawsey deployed and made available the Setonix Phase 1 CPU system in June 2022. The full-scale Setonix Phase 2 system, rendered in Figure 1 below, will be made available to researchers for the 2023 allocation round. This page provides a summary of the main changes to Pawsey supercomputing services in 2023, including changes to allocation schemes.
Figure 1. Render of HPE Cray EX system Setonix
The Supercomputer
Setonix will accommodate all new allocations starting from the 2023 allocation round. The table below presents an overview of Setonix Phase 1 and Phase 2 resources.
Table 1. Setonix Phase 1 and Phase 2 resources explained.
Purpose | Nodes | CPU | Cores per node | GPU | RAM per node | Availability |
---|---|---|---|---|---|---|
**Setonix Phase 1** | | | | | | |
Log in | 4 | 2x AMD EPYC 7713 "Milan" | 2x 64 | – | 256 GB | Available from June 2022 |
CPU computing | 504 | 2x AMD EPYC 7763 "Milan" | 2x 64 | – | 256 GB | Available from June 2022 |
CPU high memory | 8 | 2x AMD EPYC 7763 "Milan" | 2x 64 | – | 1 TB | Available from June 2022 |
Data movement | 8 | 2x AMD 7502P | 1x 32 | – | 128 GB | Available from June 2022 |
**Setonix Phase 2 will add the following** | | | | | | |
Log in | 6 (total: 10) | 2x AMD EPYC 7713 "Milan" | 2x 64 | – | 256 GB | Available from the 2023 allocation round |
CPU computing | 1088 (total: 1592) | 2x AMD EPYC 7763 "Milan" | 2x 64 | – | 256 GB | Available from the 2023 allocation round |
GPU computing | 154 | 1x AMD optimised 3rd Gen EPYC "Trento" | 1x 64 | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs), 128 GB HBM2e | 256 GB | Available from the 2023 allocation round |
GPU high memory | 38 | 1x AMD optimised 3rd Gen EPYC "Trento" | 1x 64 | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs), 128 GB HBM2e | 512 GB | Available from the 2023 allocation round |
Data movement | 8 (total: 16) | 2x AMD 7502P | 1x 32 | – | 128 GB | Available from the 2023 allocation round |
The Software
Pawsey has developed new Software Stack Policies which describe the principles behind the configuration, maintenance and support of the scientific software stack on Pawsey systems.
The List of Supported Software provides an overview of software that is centrally installed and supported at Pawsey.
The Accounting Model
With Setonix, Pawsey is moving from an exclusive node usage accounting model to a proportional node usage accounting model. While the Service Unit (SU) is still mapped to the hourly usage of CPU cores, users are no longer charged for whole nodes irrespective of whether they are fully utilised. With the proportional node usage accounting model, users are charged only for the portion of a node they requested.
Each CPU compute node of Setonix can run multiple jobs in parallel, submitted by a single user or by many users from any project. This configuration is sometimes called shared access.
A project that has entirely consumed its Service Units (SUs) for a given quarter of the year will run its jobs in a low-priority mode, called extra, for the remainder of that quarter. Furthermore, once the project's consumption for that same quarter reaches 150% of its allocation, its users will not be able to run any more jobs for that quarter.
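The proportional charging and quarterly priority rules above can be sketched as follows. This is a minimal illustration, assuming the SU charge is simply cores requested times hours run; the exact Setonix charging formula may include additional factors (such as memory) not modelled here.

```python
CORES_PER_NODE = 128  # Setonix CPU compute node (2x 64-core AMD Milan)

def service_units(cores_requested: int, hours: float) -> float:
    """SU charge under proportional node usage: pay only for the cores requested."""
    return cores_requested * hours

def priority_tier(used_su: float, quarterly_allocation_su: float) -> str:
    """Priority tier for a project, per the quarterly usage rules above."""
    usage = used_su / quarterly_allocation_su
    if usage < 1.0:
        return "normal"   # within the quarterly allocation
    elif usage < 1.5:
        return "extra"    # allocation exhausted: low-priority mode
    else:
        return "blocked"  # 150% reached: no further jobs this quarter

# A 32-core job running for 10 hours is charged a quarter of a node,
# not the whole 128-core node:
assert service_units(32, 10) == 320
assert service_units(CORES_PER_NODE, 10) == 1280

assert priority_tier(900_000, 1_000_000) == "normal"
assert priority_tier(1_200_000, 1_000_000) == "extra"
assert priority_tier(1_600_000, 1_000_000) == "blocked"
```

Under the previous exclusive-node model, the 32-core job above would have been charged for all 128 cores.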
The Pawsey accounting model bases the GPU charging rate on energy consumption. This approach, designed for Setonix, has significant advantages over other models: it introduces carbon footprint as a primary driver in determining where computational workflows run on heterogeneous resources.
Pawsey and NCI use slightly different accounting models. Researchers applying for allocations on Setonix and Gadi should refer to Table 2 when calculating their allocation requests.
Table 2. Setonix and Gadi service unit models
Resources used | Gadi Service Units (CPU: 48 Intel Cascade Lake cores per node; GPU: 4 NVIDIA V100 GPUs per node) | Setonix Service Units (CPU: 128 AMD Milan cores per node; GPU: 4 AMD MI250X GPUs per node) |
---|---|---|
1 CPU core / hour | 2 | 1 |
1 CPU / hour | 48 | 64 |
1 CPU node / hour | 96 | 128 |
1 GPU / hour | 36* | 128 |
1 GPU node / hour | 144* | 512 |

\* calculated based on https://opus.nci.org.au/display/Help/2.2+Job+Cost+Examples for the gpuvolta queue
How to estimate a Service Unit request for Setonix-GPU?
Researchers planning their migration from NVIDIA-based GPU systems, such as NCI’s Gadi, to the AMD-based Setonix-GPU can use the following example strategy to calculate their Service Unit request.
- Measured simulation walltime on a single NVIDIA V100 GPU: 1 hour
- Safe estimate of Service Unit usage on a single Setonix AMD MI250X GPU, assuming it completes the job in roughly half the walltime: 1 h × 1/2 × 128 SU per GPU-hour = 64 Service Units
Please see: https://www.amd.com/en/graphics/server-accelerators-benchmarks
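The estimation strategy above can be written as a small helper. The factor of two speedup of an MI250X over a V100 is an assumption used for illustration; benchmark your own code where possible, and the 128 SU per GPU-hour rate comes from Table 2.

```python
SETONIX_SU_PER_GPU_HOUR = 128   # Setonix GPU charging rate (Table 2)
V100_TO_MI250X_SPEEDUP = 2.0    # assumed speedup; verify with benchmarks

def setonix_gpu_su_estimate(v100_walltime_hours: float, n_gpus: int = 1) -> float:
    """Estimate Setonix-GPU Service Units from measured V100 walltime."""
    mi250x_walltime = v100_walltime_hours / V100_TO_MI250X_SPEEDUP
    return mi250x_walltime * n_gpus * SETONIX_SU_PER_GPU_HOUR

# 1 hour on a single V100 -> 64 SU on Setonix, matching the worked example above.
assert setonix_gpu_su_estimate(1.0) == 64
# 100 one-hour V100 runs on 4 GPUs each:
assert setonix_gpu_su_estimate(100.0, n_gpus=4) == 25600
```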
Setonix-GPU migration pathway
Setonix’s AMD MI250X GPUs require a specific migration pathway involving CUDA-to-HIP and OpenACC-to-OpenMP code conversions. Pawsey is working closely with research groups within the PaCER project (https://pawsey.org.au/pacer/) and with vendors to further extend the list of supported codes.
Please see: https://www.amd.com/en/technologies/infinity-hub
The Allocation Schemes
Compute-time merit allocations on Setonix may be obtained through the following schemes:
- The National Computational Merit Allocation Scheme (NCMAS) – This scheme operates annual allocation calls open to the whole Australian research community and provides substantial amounts of compute time for meritorious, computational-research projects.
- The Pawsey Partner Merit Allocation Scheme – This scheme operates annual calls open to researchers in Pawsey Partner institutions and provides significant amounts of compute time for meritorious, computational research projects. The Partner institutions are CSIRO, Curtin University, Edith Cowan University, Murdoch University and The University of Western Australia. There is an out-of-session application process for newly eligible project leaders.
- The new Preparatory Access Scheme is now available for researchers preparing their applications to merit allocation schemes. It is designed to support feasibility studies and benchmarking. More information about this scheme: Preparatory Access Scheme.
A single application to the National Computational Merit Allocation Scheme (NCMAS) or the Pawsey Partner Merit Allocation Scheme can now include both Setonix-CPU and Setonix-GPU requests. Researchers can apply for a Setonix-CPU allocation only, a Setonix-GPU allocation only, or both.
The minimum allocation request for National Computational Merit Allocation Scheme (NCMAS) and Pawsey Partner Merit Allocation Scheme is 1M Service Units.
Pawsey Partner allocation top-ups will not be offered from the 2023 allocation round onwards. Researchers can submit their applications to both schemes separately. The new application form for the Pawsey Partner scheme allows the reuse of documents submitted to NCMAS.
Pawsey has improved its technical review process. The scalability criterion now covers CPU scalability, GPU scalability, and the scalability of data-centric workflows.
Table 3. Resources available on Setonix for the 2023 allocation round
Scheme | Scheme total capacity (full year) | Minimum request size |
---|---|---|
National Computational Merit Allocation Scheme | 455M Service Units | 1M Service Units |
Pawsey Partner Merit Allocation Scheme | 540M Service Units | 1M Service Units |
The Storage
There are a number of changes to File Management on Setonix.
- Pawsey Filesystems and their Usage provides a detailed description of the filesystems available on Setonix.
- Pawsey Object Storage: Acacia provides a detailed description of the research data storage service Acacia.
All Supercomputing, Nimbus and Visualisation projects are granted a 1 TB allocation on Acacia, which is shared amongst all project members. If you require more than 1 TB but less than 10 TB, please email an appropriate request to the Pawsey helpdesk. If you require more than 10 TB, you will need to submit an application for Managed Storage in Data Services.
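The storage-request thresholds above can be summarised as follows. This is an illustrative sketch only; the actual process goes through the Pawsey helpdesk or a Data Services application, not an API.

```python
def acacia_request_route(required_tb: float) -> str:
    """Map a required Acacia capacity (TB) to the request route described above."""
    if required_tb <= 1:
        return "default"          # 1 TB is granted to every project
    elif required_tb < 10:
        return "helpdesk"         # email a request to the Pawsey helpdesk
    else:
        return "managed-storage"  # apply for Managed Storage in Data Services

assert acacia_request_route(0.5) == "default"
assert acacia_request_route(5) == "helpdesk"
assert acacia_request_route(50) == "managed-storage"
```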
External links
- Supercomputing Documentation
- Acacia - User Guide
- Cristian Di Pietrantonio, Christopher Harris, Maciej Cytowski, "Energy-based Accounting Model for Heterogeneous Supercomputers"