Setonix General Information

Setonix is Pawsey's flagship supercomputer. Based on the HPE Cray EX architecture, it was commissioned in 2020 and delivered in two phases over the course of 2022 and 2023.

Setonix is the scientific name of the quokka, a very popular animal found on Rottnest Island, Western Australia.

System Overview

The Setonix supercomputer is a heterogeneous system of AMD CPUs and GPUs, based on the HPE Cray EX architecture. It has more than 200,000 CPU cores and 750 GPUs, interconnected via the Slingshot-11 interconnect with 200Gb/s of bandwidth per connection. The AMD Infinity Fabric provides a direct communication channel among GPUs, as well as between CPUs and GPUs.

The CPU-only nodes are each equipped with two AMD Milan CPUs, for a total of 128 cores and 256GB of RAM; 8 of them are high-memory nodes with 1TB of RAM, bringing the total number of CPU nodes to 1,600. There are 192 GPU-enabled nodes, each with one 64-core AMD Trento CPU and 4 AMD MI250X GPU cards providing 8 logical GPUs per node. There are also 4 NVIDIA Grace Hopper GH200 nodes, each containing 4 GH200 superchips, available for access through the Setonix-Q allocation.

Table 1. Setonix Node Overview

| Type | N. Nodes | CPU | Cores Per Node | RAM Per Node | GPUs Per Node |
|---|---|---|---|---|---|
| Login | 9 | AMD Milan | 2x 64 | 256GB | n/a |
| CPU computing | 1592 | AMD Milan (2.45GHz, 280W) | 2x 64 | 256GB | n/a |
| CPU high memory | 8 | AMD Milan (2.45GHz, 280W) | 2x 64 | 1TB | n/a |
| GPU computing | 154 | AMD Trento | 1x 64 | 256GB | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs) |
| GPU high memory | 38 | AMD Trento | 1x 64 | 512GB | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs) |
| GH200 | 4 | Grace | 4x 72 | 857GB | 4 GPUs (one NVIDIA H100 per GH200 superchip) |
| Data movement | 11 | AMD 7502P | 1x 32 | 128GB | n/a |

More details regarding the hardware architecture and filesystems are provided in the sections below.

Important: GPU node versus GH200

Setonix has two types of nodes with GPUs: the AMD MI250X GPU nodes and the NVIDIA Grace Hopper GH200 nodes. Throughout this documentation, "GPU" refers to the AMD GPUs; the GH200 nodes are always named explicitly.

Important: GH200 access

The GH200 nodes are part of the Setonix-Q merit allocation scheme, which offers access to quantum processing units (QPUs) as well as these GH200 nodes. For more information about quantum resources, refer to the Setonix-Q documentation.

Hardware architecture

Login and management nodes are housed in air-cooled cabinets, while compute nodes are hosted in liquid-cooled cabinets. Each compute cabinet contains eight chassis, each holding eight custom compute blades. Each compute cabinet also hosts up to 64 Slingshot switches, each with 64 ports at 200Gb/s. Compute blades and network switches are connected orthogonally. All Setonix nodes are connected in a dragonfly topology.

Figure 1. Representation of a chassis in a compute cabinet, showing how switches, compute blades, node cards, and nodes relate to each other.

Each compute blade has two independent node cards, each of which hosts two compute nodes.

AMD Zen3 CPU architecture

A CPU compute node has 2 AMD Milan EPYC CPUs with 64 cores each and 256GB of RAM. The 64 cores of a Zen3 AMD CPU (shown in Figure 2 below) are evenly distributed across eight Core Chiplet Dies (CCDs), each of which has 32MB of L3 cache shared among all the cores on that CCD (shown in Figure 3 below). There is no per-core limit on the use of the L3 cache: a single Zen3 core can use all of it. The eight CCDs are connected to an additional memory and I/O controller die through the AMD Infinity Fabric. There are 8 memory channels, each supporting up to two RAM modules (DIMMs). The CPU supports 128 lanes of PCIe Gen4 and up to 32 SATA or NVMe direct-connect devices. Every two CCDs form a NUMA region. For more information about NUMA regions, check the output of the lstopo-no-gui program.
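As an illustration of how this NUMA layout matters in practice, the sketch below pins one MPI rank to each of the four NUMA regions of a CPU node (128 cores / 4 regions = 32 cores per rank). The partition name, binding settings, and executable are placeholder assumptions to adapt, not prescriptions from this page.

```shell
#!/bin/bash --login
# Sketch of a NUMA-aware CPU job: 4 ranks per node, one per NUMA region.
# Partition name and executable are placeholders.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4     # one rank per NUMA region
#SBATCH --cpus-per-task=32      # 32 cores in each NUMA region
#SBATCH --partition=work        # placeholder partition name

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# lstopo-no-gui (mentioned above) prints the node's NUMA layout
srun -N 1 -n 1 lstopo-no-gui > topology.txt

srun ./my_app                   # placeholder executable
```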

GPU node architecture

Each GPU compute node has one AMD Trento EPYC CPU with 64 cores and 256GB of RAM. The Trento CPU architecture is similar to that of the Milan CPUs in the CPU nodes, with additional support for Infinity Fabric links to the four AMD MI250X GPU cards. Each MI250X card has two "logical GPUs", for a total of 8 GPUs per node. The node architecture of the Setonix GPU nodes is pictured in Figure 4 below. Each L3 cache region is connected to a logical GPU in the MI250X cards via an Infinity Fabric connection. The GPUs are also closely interconnected via numerous Infinity Fabric links, and connect to the Slingshot NICs for data transfer between nodes.

Note that each MI250X has two Graphics Compute Dies (GCD) that are accessible as two logical GPUs, for a total of eight per node.

Important: GCD vs GPU

An MI250X GPU card has two GCDs. Previous generations of GPU cards had only one GCD each, so the two terms could be used interchangeably; this usage persists even though a card now holds more than one GCD. Slurm, for instance, only uses the GPU terminology when referring to accelerator resources, so a request such as --gpus-per-node is effectively a request for a number of GCDs per node. On Setonix, the maximum is 8.
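As a hedged sketch (the partition name is a placeholder, not taken from this page), a job script requesting every GCD of a GPU node asks Slurm for 8 "GPUs":

```shell
#!/bin/bash --login
# Sketch: request one full Setonix GPU node. In Slurm terms, 8 "GPUs" = 8 GCDs,
# i.e. all four MI250X cards. Partition name is a placeholder.
#SBATCH --nodes=1
#SBATCH --gpus-per-node=8
#SBATCH --partition=gpu         # placeholder partition name

srun rocm-smi                   # each GCD appears as a separate logical GPU
```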

The GCD architecture is shown in Figure 5 below, and consists of 110 Compute Units (CU) (for 220 per MI250X, or 880 per node) with 64GB of GPU memory (for 128GB per MI250X, or 512GB per node).

Each Compute Unit contains 64 Stream Processors and 4 Matrix Cores, as shown below in Figure 6.
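Multiplying out the figures above gives the per-node totals; the shell arithmetic below only restates numbers already quoted in this section.

```shell
# Per-node totals for a Setonix GPU node, derived from the figures above
CUS_PER_GCD=110
GCDS_PER_NODE=8          # 4 MI250X cards x 2 GCDs
SP_PER_CU=64             # Stream Processors per Compute Unit
GB_PER_GCD=64            # GPU memory per GCD

CUS_PER_NODE=$((CUS_PER_GCD * GCDS_PER_NODE))        # 880 compute units
SP_PER_NODE=$((CUS_PER_NODE * SP_PER_CU))            # 56320 stream processors
GPU_MEM_PER_NODE=$((GB_PER_GCD * GCDS_PER_NODE))     # 512GB of GPU memory

echo "${CUS_PER_NODE} CUs, ${SP_PER_NODE} stream processors, ${GPU_MEM_PER_NODE}GB per node"
```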

For more detail regarding the MI250X GPU architecture, refer to the AMD CDNA 2 Architecture Whitepaper.

NVIDIA GH200 node architecture

Each GH200 compute node has four NVIDIA Grace Hopper GH200 superchips. Each GH200 superchip combines a 72-core Grace Arm Neoverse V2 CPU with 120GB of LPDDR memory and a Hopper H100 GPU with 96GB of memory. Due to the unified memory architecture of the superchip, each node reports the combined total of 4x120GB + 4x96GB = 864GB of RAM, of which 857GB is available. The Grace CPU architecture is completely different from that of all other nodes, which are x86_64. These nodes are connected to the Slingshot fabric by two NIC cards for data transfer between nodes, and are available on the quantum partition associated with the Setonix-Q Pilot merit allocation scheme.

Important: Grace Arm vs x86

A GH200 node has a Grace Arm CPU, a completely different architecture from the rest of the CPUs on Setonix. Software compiled for the x86_64 nodes will not run on the GH200 nodes; software intended for the GH200 nodes must be compiled on a GH200 node.
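A build script can guard against this mismatch by checking the machine architecture first. The `arch_label` helper below is a hypothetical sketch, not a Pawsey-provided tool; it only wraps the standard `uname -m` command.

```shell
# Map `uname -m` output to a Setonix node family (hypothetical helper)
arch_label() {
    case "$1" in
        aarch64) echo "Arm (GH200 Grace node)" ;;
        x86_64)  echo "x86_64 (Milan/Trento node)" ;;
        *)       echo "unknown architecture: $1" ;;
    esac
}

echo "Building on: $(arch_label "$(uname -m)")"
```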

 

Figure 7. GH200 Node architecture

For more details regarding the GH200 superchip architecture, refer to https://www.nvidia.com/en-au/data-center/grace-hopper-superchip/.

Filesystems and data management

Setonix provides the following filesystems:

  • The /home filesystem, where users can save personal configuration files;

  • The /software filesystem, hosting the Pawsey-provided software stack, and where users can install software;

  • The /scratch filesystem, a high-performance parallel filesystem to be used for I/O operations within jobs.
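A hedged sketch of how these filesystems are typically used from a job script: MYSCRATCH and MYSOFTWARE are the environment variables listed in Table 2, while the partition name, paths, and executable are illustrative placeholders.

```shell
#!/bin/bash --login
# Sketch: use each filesystem for its intended role. Partition, paths,
# and executable are placeholders.
#SBATCH --partition=work        # placeholder partition name

# Run I/O-heavy work from /scratch, never from /home
cd "${MYSCRATCH}/my_run"        # illustrative run directory under /scratch

# Pick up user-installed software from /software
export PATH="${MYSOFTWARE}/bin:${PATH}"

srun ./my_app                   # placeholder executable
```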

Lustre filesystems are connected to compute nodes through the Slingshot fabric.

Because /scratch is a temporary storage solution, Pawsey provides users with the Acacia storage system to store data for the lifetime of their projects. It is based on the object storage paradigm, as opposed to a file storage system, and users transfer data to and from Acacia using a dedicated command-line tool. Check Pawsey Object Storage: Acacia for more information.

Available filesystems on Setonix are summarised in Table 2.

Table 2. Important filesystems mounted on Setonix

| Mount point | Variable | Type | Size | Description |
|---|---|---|---|---|
| /scratch | MYSCRATCH | Lustre filesystem | 14.4PB | A high-performance parallel filesystem for data processing. |
| /software | MYSOFTWARE | Lustre filesystem | 393TB | Where system and user software are installed. |
| /home | HOME | NFS | 92TB | Storage for relatively small numbers of important files, such as your Linux profile and shell configuration. |
| /astro |  |  | 2.8PB | Filesystem dedicated to astronomy research. |