**Note:** The content of this section is currently being updated to include material relevant for Phase-2 of Setonix and the use of GPUs.
Setonix is Pawsey's flagship supercomputer. Based on the HPE Cray EX architecture, it was commissioned in 2020 and delivered in two phases over the course of 2022 and 2023.
...
Type | N. Nodes | CPU | Cores Per Node | RAM Per Node | GPUs Per Node |
---|---|---|---|---|---|
Login | 49 | AMD Milan | 2x 64 | 256GB | n/a |
CPU computing | 504 (Phase 1) 1592 (Total) | AMD Milan (2.45GHz, 280W) | 2x 64 | 256GB | n/a |
CPU high memory | 8 | AMD Milan (2.45GHz, 280W) | 2x 64 | 1TB | n/a |
GPU computing | 154 (Phase 2) | AMD Trento | 1 x 64 | 256GB | 8 GCDs (from 4x "AMD MI250X" cards, each card with 2 GCDs) |
GPU high memory | 38 (Phase 2) | AMD Trento | 1 x 64 | 512GB | 8 GCDs (from 4x "AMD MI250X" cards, each card with 2 GCDs) |
Data movement | 811 | AMD 7502P | 1x 32 | 128GB | n/a |
More details regarding the hardware architecture and filesystems are available in the sections below.
...
A CPU compute node has 2 AMD Milan EPYC CPUs with 64 cores each and 256GB of RAM. The 64 cores of a Zen3 AMD CPU (shown in Figure 2 below) are evenly distributed across eight Core Chiplet Dies (CCDs), each of which has 32MB of L3 cache shared among all the cores on that CCD (shown in Figure 3 below). There is no limitation on the use of the L3 cache by a single Zen3 core, which can use all of it. The Zen3 CPU is composed of 8 such CCDs, all connected to an additional memory and I/O controller die through the AMD Infinity Fabric. There are 8 memory channels, each supporting up to two RAM modules (DIMMs). The CPU supports 128 lanes of PCIe Gen4 and up to 32 SATA or NVMe direct-connect devices. Every two CCDs form a NUMA region. For more information about NUMA regions, check the output of the lstopo-no-gui program.
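The CCD and NUMA layout described above can be sketched in a few lines of Python. This is an illustration only, assuming cores on a socket are numbered contiguously per CCD (use lstopo-no-gui to confirm the actual numbering on a node):

```python
# Illustrative mapping of a core index on one Zen3 (Milan) socket to its
# CCD and NUMA region, per the topology described above:
# 8 CCDs x 8 cores per CCD, 2 CCDs per NUMA region -> 4 NUMA regions.
# Assumes cores are numbered contiguously per CCD (verify with lstopo).
CORES_PER_CCD = 8
CCDS_PER_NUMA = 2

def core_location(core: int) -> tuple[int, int]:
    """Return (CCD index, NUMA region index) for a core on one socket."""
    ccd = core // CORES_PER_CCD
    numa = ccd // CCDS_PER_NUMA
    return ccd, numa

print(core_location(0))   # first core: CCD 0, NUMA region 0
print(core_location(63))  # last core: CCD 7, NUMA region 3
```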
...
Each GPU compute node has one AMD Trento EPYC CPU with 64 cores and 256GB of RAM. The Trento CPU architecture is similar to that of the Milan CPUs in the CPU nodes, with additional support for Infinity Fabric links to the four AMD MI250X GPU cards. Each MI250X card has two "logical GPUs", for a total of 8 GPUs per node. The node architecture of the Setonix GPU nodes is pictured in Figure 4 below. Each L3 cache region is connected to a logical GPU in the MI250X GPU cards via Infinity Fabric connections. These GPUs are also closely inter-connected via numerous Infinity Fabric links, and connect to the Slingshot NIC cards for data transfer between nodes.
Note that each MI250X has two Graphics Compute Dies (GCDs) that are accessible as two logical GPUs, for a total of eight per node.
**Note:** An MI250X GPU card has two GCDs. Previous generations of GPUs had only one GCD per GPU card, so the two terms could be used interchangeably, and this interchangeable usage continues even though GPUs now have more than one GCD. Slurm, for instance, uses only the GPU terminology when referring to accelerator resources, so Slurm GPU requests are in fact requests for GCDs.
The GCD architecture is shown in Figure 5 below, and consists of 110 Compute Units (CUs) (for 220 per MI250X, or 880 per node) with 64GB of GPU memory (for 128GB per MI250X, or 512GB per node).
...
Each Compute Unit contains 64 Stream Processors and 4 Matrix Cores, as shown below in Figure 6.
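The per-node totals quoted above follow from the per-GCD figures; a quick arithmetic check (counts are taken from the text, variable names are ours):

```python
# Arithmetic check of the per-node GPU resource totals described above.
CUS_PER_GCD = 110          # Compute Units per GCD
GCDS_PER_CARD = 2          # two GCDs per MI250X card
CARDS_PER_NODE = 4         # four MI250X cards per GPU node
MEM_GB_PER_GCD = 64        # GPU memory per GCD, in GB
SPS_PER_CU = 64            # Stream Processors per Compute Unit
MATRIX_CORES_PER_CU = 4    # Matrix Cores per Compute Unit

gcds_per_node = GCDS_PER_CARD * CARDS_PER_NODE    # 8 logical GPUs per node
cus_per_node = CUS_PER_GCD * gcds_per_node        # 880 CUs per node
mem_gb_per_node = MEM_GB_PER_GCD * gcds_per_node  # 512 GB GPU memory per node

print(gcds_per_node, cus_per_node, mem_gb_per_node)
print(cus_per_node * SPS_PER_CU,                  # Stream Processors per node
      cus_per_node * MATRIX_CORES_PER_CU)         # Matrix Cores per node
```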
For more detail regarding the MI250X GPU architecture, refer to https://www.amd.com/en/technologies/cdna2.
...
For further detail, see the AMD CDNA 2 Architecture Whitepaper.
...
Mount point | Variable | Type | Size | Description |
---|---|---|---|---|
| | Lustre filesystem | 14.4PB | A high-performance parallel filesystem for data processing. |
| | Lustre filesystem | 393TB | Where system and user software are installed. |
| | NFS | 92TB | Storage for relatively small numbers of important files, such as your Linux profile and shell configuration. |
| | | 2.8PB | Filesystem dedicated to astronomy research. |
...
Partition Charge Rate ✕ Max(Cores Proportion, Memory Proportion, GPU Proportion) ✕ N. of nodes requested ✕ Job Elapsed Time (Hours).
...
- Partition Charge Rate is a constant value associated with each Slurm partition,
- Cores Proportion is the number of CPU cores requested per node divided by the total number of CPU cores per node,
- Memory Proportion is the amount of memory requested per node divided by the total amount of memory available per node,
- GPU Proportion is the number of GPUs requested divided by the total number of GPUs available per node (remember that for Slurm, each GPU is equivalent to a GCD, so each GPU node has 8 available GPUs to be requested).
For Setonix CPU nodes, the charge rate is 128 SU per node hour, as each CPU node has 128 cores.
For Setonix GPU nodes, the charge rate is 512 SU per node hour, reflecting the difference in energy consumption between the CPU and GPU node architectures. Since there are fewer GPU nodes than CPU nodes, the GPU nodes are reserved for GPU-enabled codes. Resource requests on GPU nodes therefore differ slightly from those on CPU nodes: all requests are in units of GCDs, with 1 GCD = 1 Slurm GPU, and requests cannot be made based on memory but must be based on the number of GPUs to be used.
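The charging formula above can be sketched as a small Python function. This is an illustration of the formula as stated, not an official Pawsey tool; the function name and examples are ours, using the 128 and 512 SU per node-hour charge rates from the text:

```python
# Sketch of the SU charging formula:
# SU = Partition Charge Rate x max(proportions) x nodes x elapsed hours.
def su_charge(charge_rate, nodes, hours,
              cores_prop=0.0, mem_prop=0.0, gpu_prop=0.0):
    """Service Units charged for a job, per the formula above."""
    return charge_rate * max(cores_prop, mem_prop, gpu_prop) * nodes * hours

# 64 of 128 cores on one CPU node for 2 hours: 128 x 0.5 x 1 x 2 = 128 SU
print(su_charge(128, nodes=1, hours=2, cores_prop=64 / 128))

# 2 of 8 GPUs (GCDs) on one GPU node for 1 hour: 512 x 0.25 x 1 x 1 = 128 SU
print(su_charge(512, nodes=1, hours=1, gpu_prop=2 / 8))
```

Note that the `max(...)` term means a job is charged for the largest proportion of any node resource it holds, so requesting half a node's memory costs the same as requesting half its cores.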
Maintenance
Due to the cutting-edge nature of Setonix, regular and frequent updates of the software stack are expected during the first year of Setonix's operation, as further optimisations and improvements become available.
...