Setonix General Information


Work in Progress for Phase-2 Documentation

The content of this section is currently being updated to include material relevant to Phase 2 of Setonix and the use of GPUs.
All existing material related to Phase 1 and the use of CPU compute nodes can be considered valid and up to date.

Setonix is Pawsey's flagship supercomputer, based on the HPE Cray EX architecture. It was commissioned in 2020 and delivered in two phases over the course of 2022 and 2023.

Setonix is the scientific name for the Quokka, a very popular animal found on Rottnest Island, Western Australia.


System Overview

The Setonix supercomputer is a heterogeneous system consisting of AMD CPUs and GPUs, based on the HPE Cray EX architecture. It has more than 200,000 CPU cores and 750 GPUs, interconnected using the Slingshot-10 interconnect with 200 Gb/s of bandwidth per connection. The AMD Infinity Fabric interconnect provides a direct channel of communication among GPUs as well as between CPUs and GPUs.

The system was delivered to the Pawsey Supercomputing Centre by HPE in two phases, conveniently named Phase 1 and Phase 2. Phase 1 included all of the filesystems, one third of the CPU-only compute nodes, and the visualisation and high-memory nodes. The CPU-only nodes are equipped with two AMD Milan CPUs for a total of 128 cores and 256 GB of RAM; the 8 high-memory nodes have 1 TB of RAM each. Phase 2 brings the total number of CPU nodes to 1600, including the high-memory nodes, and adds 192 GPU-enabled nodes, each with one 64-core AMD Trento CPU and 4 AMD MI250X GPU cards providing 8 logical GPUs per node.

Table 1. Setonix Node Overview

Type            | N. Nodes                     | CPU                         | Cores Per Node | RAM Per Node
Login           | 4                            | AMD Milan                   | 2x 64          | 256 GB
CPU computing   | 504 (Phase 1), 1592 (total)  | AMD Milan (2.45 GHz, 280 W) | 2x 64          | 256 GB
CPU high memory | 8                            | AMD Milan (2.45 GHz, 280 W) | 2x 64          | 1 TB
GPU computing   | 154 (Phase 2)                | AMD Trento                  | 1x 64          | 256 GB
GPU high memory | 38 (Phase 2)                 | AMD Trento                  | 1x 64          | 512 GB
Data movement   | 8                            | AMD 7502P                   | 1x 32          | 128 GB

More details on the hardware architecture and filesystems are provided in the sections below.

Hardware architecture

Login and management nodes are housed in air-cooled cabinets, while compute nodes are hosted in liquid-cooled cabinets. Each compute cabinet consists of eight chassis, each containing eight custom compute blades. Each compute cabinet also hosts up to 64 Slingshot switches, each with 64 ports running at 200 Gbps. Compute blades and network switches are connected orthogonally. All Setonix nodes are connected using a dragonfly topology.


Figure 1. Representation of a chassis in a compute cabinet, showing how switches, compute blades, node cards, and nodes relate to each other.

Each compute blade has two independent node cards, each of which hosts two compute nodes.

AMD Zen3 CPU architecture

A CPU compute node has two AMD Milan EPYC CPUs with 64 cores each and 256 GB of RAM. The 64 cores of a Zen3 AMD CPU (shown in Figure 2 below) are evenly distributed across eight Core Chiplet Dies (CCDs), each of which has 32 MB of L3 cache shared among all the cores on that CCD (shown in Figure 3 below). There is no per-core limit on L3 cache usage: a single Zen3 core can use the entire 32 MB of its CCD. The eight CCDs are connected to an additional memory and I/O controller die through the AMD Infinity Fabric. There are eight memory channels, each supporting multiple memory modules (DIMMs). The CPU supports 128 lanes of PCIe Gen4 and up to 32 SATA or NVMe direct-connect devices. Every two CCDs form a NUMA region. For more information about NUMA regions, check the output of the lstopo-no-gui program.




Figure 2. Schematic representation of the Zen3 CPU.



Figure 3. Cores on a Zen3-based AMD CPU are partitioned into groups of eight, each group residing on a Core Chiplet Die (CCD) and sharing the same L3 cache.
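
As a quick alternative to lstopo-no-gui, the NUMA layout of a node can also be inspected programmatically through the Linux sysfs interface. The Python sketch below is a minimal, generic example (not a Pawsey-provided tool) that lists each NUMA node and the CPU cores it contains when run on a compute node, for instance inside an interactive Slurm job.

    # numa_layout.py - a minimal sketch (not a Pawsey tool): list the NUMA
    # nodes of a Linux host and the CPU cores belonging to each, by reading
    # the standard sysfs interface at /sys/devices/system/node.
    import glob
    import os

    def numa_layout():
        """Return a mapping of NUMA node id -> CPU list string (e.g. '0-15')."""
        layout = {}
        for node_dir in glob.glob("/sys/devices/system/node/node[0-9]*"):
            node_id = int(os.path.basename(node_dir)[len("node"):])
            with open(os.path.join(node_dir, "cpulist")) as f:
                layout[node_id] = f.read().strip()
        return layout

    if __name__ == "__main__":
        layout = numa_layout()
        for node_id in sorted(layout):
            print(f"NUMA node {node_id}: CPUs {layout[node_id]}")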


GPU node architecture

Each GPU compute node has one AMD Trento EPYC CPU with 64 cores and 256 GB of RAM. The Trento CPU architecture is similar to that of the Milan CPUs in the CPU nodes, with additional support for Infinity Fabric links to the four AMD MI250X GPU cards in each node. The architecture of the Setonix GPU nodes is pictured in Figure 4 below. Each L3 cache region of the CPU is connected to one logical GPU on the MI250X cards via an Infinity Fabric link. The GPUs are also closely interconnected via numerous Infinity Fabric links, and connect to the Slingshot NICs for data transfer between nodes.

Figure 4. GPU node architecture

Note that each MI250X has two Graphics Compute Dies (GCDs) that are accessible as two logical GPUs, for a total of eight per node. The GCD architecture is shown in Figure 5 below; each GCD consists of 110 compute units (220 per MI250X, or 880 per node) and has 64 GB of GPU memory (128 GB per MI250X, or 512 GB per node).

Figure 5. MI250X GCD architecture


Filesystems and data management

Setonix provides users with the following filesystems:

  • The /home filesystem, where users can store personal configuration files;
  • The /software filesystem, hosting the Pawsey-provided software stack, and where users can install software;
  • The /scratch filesystem, a high-performance parallel filesystem to be used for I/O operations within jobs.

Lustre filesystems are connected to compute nodes through the Slingshot fabric.

Because /scratch is a temporary storage solution, Pawsey provides users with the Acacia storage system to store data for the lifetime of their projects. It is based on the object storage paradigm, as opposed to a file storage system, and users transfer data to and from Acacia using a dedicated command-line tool. Check Pawsey Object Storage: Acacia for more information.
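
Acacia is commonly accessed through S3-compatible clients, so data can also be moved programmatically; the Python sketch below uses boto3 under that assumption. The endpoint URL, bucket name, and file paths are placeholders rather than actual Pawsey values, and credentials are assumed to be available through the standard AWS environment variables; refer to the Acacia documentation for the supported clients and the actual endpoint.

    # acacia_upload.py - hedged sketch of an object upload to an S3-compatible
    # endpoint such as Acacia. The endpoint URL, bucket name, and paths below
    # are placeholders, not actual Pawsey values; credentials are read from the
    # standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://acacia.example.pawsey.org.au",  # placeholder endpoint
    )

    # Upload a result file from /scratch into an object bucket.
    s3.upload_file(
        Filename="/scratch/project123/user/results.tar.gz",  # hypothetical path
        Bucket="my-project-bucket",                           # hypothetical bucket
        Key="results/results.tar.gz",
    )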

Available filesystems on Setonix are summarised in Table 2.

Table 2. Important filesystems mounted on Setonix

Mount point | Variable   | Type              | Size    | Description
/scratch    | MYSCRATCH  | Lustre filesystem | 14.4 PB | A high-performance parallel filesystem for data processing.
/software   | MYSOFTWARE | Lustre filesystem | 393 TB  | Where system and user software are installed.
/home       | HOME       | NFS               | 92 TB   | Stores a relatively small number of important files, such as your Linux profile and shell configuration.
/astro      |            |                   | 2.8 PB  | Filesystem dedicated to astronomy research.
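
The MYSCRATCH and MYSOFTWARE entries in Table 2 are environment variables that point to per-user locations on the corresponding filesystems. The Python sketch below is a minimal illustration that builds a job output directory under $MYSCRATCH; the run_001 subdirectory name is an arbitrary example rather than a required layout.

    # scratch_path.py - minimal sketch: build a per-job output directory under
    # the location pointed to by the MYSCRATCH environment variable (Table 2).
    # The "run_001" subdirectory name is an arbitrary example.
    import os
    from pathlib import Path

    scratch = os.environ.get("MYSCRATCH")
    if scratch is None:
        raise RuntimeError("MYSCRATCH is not set; are you on a Setonix login or compute node?")

    outdir = Path(scratch) / "run_001"
    outdir.mkdir(parents=True, exist_ok=True)
    print(f"Writing job output under {outdir}")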

More information about filesystems and data management can be found in File Management.

Accounting

The cost of running a job on Setonix is expressed in Service Units (SUs) and is given by the following formula:

    Partition Charge Rate ✕ Max(Core Proportion, Memory Proportion) ✕ Number of Nodes Requested ✕ Job Elapsed Time (hours)

where:

  • Partition Charge Rate is a constant value associated with each Slurm partition;
  • Core Proportion is the number of CPU cores requested per node divided by the total number of CPU cores per node;
  • Memory Proportion is the amount of memory requested per node divided by the total amount of memory available per node.

For Setonix CPU nodes, the charge rate is 128 SU per node hour, as each CPU node has 128 cores.

For Setonix GPU nodes, the charge rate is 512 SU per node hour, based on the difference in energy consumption between the CPU and GPU node architectures.
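
As a worked example, the sketch below applies the formula above to a hypothetical job. The requested cores, memory, node count, and elapsed time are illustrative values only; the charge rate (128 SU per node hour), cores per node (128), and memory per node (256 GB) are those of the standard Setonix CPU nodes described on this page.

    # su_cost.py - worked example of the Service Unit formula above.
    # The job parameters are illustrative; the charge rate (128 SU/node-hour),
    # cores per node (128) and memory per node (256 GB) are those of the
    # standard Setonix CPU nodes.

    def su_cost(charge_rate, cores_requested, cores_per_node,
                mem_requested_gb, mem_per_node_gb, nodes, elapsed_hours):
        """Partition Charge Rate x Max(Core Proportion, Memory Proportion) x Nodes x Hours."""
        core_proportion = cores_requested / cores_per_node
        memory_proportion = mem_requested_gb / mem_per_node_gb
        return charge_rate * max(core_proportion, memory_proportion) * nodes * elapsed_hours

    # Hypothetical job: 1 node, 32 of 128 cores, 128 of 256 GB of RAM, 10 hours.
    # The memory proportion (0.5) dominates the core proportion (0.25), so the
    # job is charged 128 * 0.5 * 1 * 10 = 640 SU.
    print(su_cost(charge_rate=128, cores_requested=32, cores_per_node=128,
                  mem_requested_gb=128, mem_per_node_gb=256,
                  nodes=1, elapsed_hours=10))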

Maintenance

Due to the cutting-edge nature of Setonix, regular and frequent updates of the software stack are expected during the first year of Setonix's operation, as further optimisations and improvements become available.


