
Note: Work in Progress for Phase-2 Documentation

The content of this section is being updated to include material relevant to Phase-2 of Setonix and the use of GPUs.
All existing material related to Phase-1 and the use of CPU compute nodes remains valid and up to date.


Setonix is Pawsey's flagship supercomputer based on the HPE Cray EX architecture that was commissioned in 2020 and delivered in two phases over the course of 2022 and 2023.

...

| Type | N. Nodes | CPU | Cores Per Node | RAM Per Node | GPUs Per Node |
|---|---|---|---|---|---|
| Login | 9 | AMD Milan | 2x 64 | 256GB | n/a |
| CPU computing | 1592 | AMD Milan (2.45GHz, 280W) | 2x 64 | 256GB | n/a |
| CPU high memory | 8 | AMD Milan (2.45GHz, 280W) | 2x 64 | 1TB | n/a |
| GPU computing | 154 | AMD Trento | 1x 64 | 256GB | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs) |
| GPU high memory | 38 | AMD Trento | 1x 64 | 512GB | 8 GCDs (from 4x AMD MI250X cards, each card with 2 GCDs) |
| Data movement | 11 | AMD 7502P | 1x 32 | 128GB | n/a |

More details regarding the hardware architecture and filesystems are available in the sections below.

...

Each GPU compute node has one AMD Trento EPYC CPU with 64 cores and 256GB of RAM. The Trento CPU architecture is similar to that of the Milan CPUs in the CPU nodes, with additional support for Infinity Fabric links to the four AMD MI250X GPU cards. Each MI250X card contains two "logical GPUs", for a total of 8 GPUs per node. The architecture of the Setonix GPU nodes is pictured in Figure 4 below. Each L3 cache region of the CPU is connected to a logical GPU in the MI250X cards via Infinity Fabric. The GPUs are also closely interconnected via numerous Infinity Fabric links, and connect to the Slingshot NICs for data transfer between nodes.

Figure 4. GPU node architecture. Note that the GPUs shown here are each equivalent to a GCD.


Note that each MI250X has two Graphics Compute Dies (GCDs), accessible as two logical GPUs, for a total of eight per node.

Note: Important: GCD vs GPU
Anchor: gcdgpu

An MI250X GPU card has two GCDs. Previous generations of GPUs had only one GCD per card, so the two terms could be used interchangeably; that usage continues even though GPUs now have more than one GCD. Slurm, for instance, uses only the GPU terminology when referring to accelerator resources, so a request such as --gpus-per-node is equivalent to a request for that number of GCDs per node. On Setonix, the maximum is 8.

The GCD architecture is shown in Figure 5 below. Each GCD consists of 110 Compute Units (CUs) (for 220 per MI250X, or 880 per node) and 64GB of GPU memory (for 128GB per MI250X, or 512GB per node).
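The per-node totals above follow directly from the per-GCD figures; a quick arithmetic check (variable names are illustrative):

```python
# Sanity-check of the per-node totals quoted above; constants come from the text.
CUS_PER_GCD = 110        # Compute Units per GCD
MEM_PER_GCD_GB = 64      # GPU memory per GCD (GB)
GCDS_PER_CARD = 2        # GCDs per MI250X card
CARDS_PER_NODE = 4       # MI250X cards per GPU node

cus_per_card = CUS_PER_GCD * GCDS_PER_CARD            # 220 per MI250X
cus_per_node = cus_per_card * CARDS_PER_NODE          # 880 per node
mem_per_card_gb = MEM_PER_GCD_GB * GCDS_PER_CARD      # 128GB per MI250X
mem_per_node_gb = mem_per_card_gb * CARDS_PER_NODE    # 512GB per node

print(cus_per_node, mem_per_node_gb)  # 880 512
```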

...

Each Compute Unit contains 64 Stream Processors and 4 Matrix Cores, as shown below in Figure 6.

...

For more detail regarding the MI250X GPU architecture, refer to the AMD CDNA 2 Architecture Whitepaper (https://www.amd.com/en/technologies/cdna2).

Anchor: Filesystem

...

| Mount point | Variable | Type | Size | Description |
|---|---|---|---|---|
| /scratch | MYSCRATCH | Lustre filesystem | 14.4PB | A high-performance parallel filesystem for data processing. |
| /software | MYSOFTWARE | Lustre filesystem | 393TB | Where system and user software are installed. |
| /home | HOME | NFS | 92TB | Stores a relatively small number of important files, such as your Linux profile and shell configuration. |
| /astro | | | 2.8PB | Filesystem dedicated to astronomy research. |

...

    Partition Charge Rate ✕ Max(Cores Proportion, Memory Proportion, GPU Proportion) ✕ N. of nodes requested ✕ Job Elapsed Time (Hours).

...

  • Partition Charge Rate is a constant value associated with each Slurm partition,
  • Cores Proportion is the number of CPU cores per node requested divided by the total number of CPU cores per node,
  • Memory Proportion is the amount of memory per node requested divided by the total amount of memory available per node,
  • GPU Proportion is the number of GPUs requested divided by the total number of GPUs available per node (remember that for Slurm each GPU is equivalent to a GCD, so each GPU node has 8 GPUs available to request).
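As a worked illustration of the charging formula above, a minimal sketch (the function and argument names are illustrative, not a Pawsey tool):

```python
def su_charge(rate, cores_req, cores_total, mem_req, mem_total,
              gpus_req, gpus_total, nodes, hours):
    """Sketch of the SU formula above: rate x max(proportions) x nodes x hours."""
    proportions = [cores_req / cores_total, mem_req / mem_total]
    if gpus_total:  # GPU proportion only applies on GPU nodes
        proportions.append(gpus_req / gpus_total)
    return rate * max(proportions) * nodes * hours

# 64 of 128 cores and 128GB of 256GB on a CPU node (rate 128 SU/node-hour), 1 node, 2 hours
print(su_charge(128, 64, 128, 128, 256, 0, 0, 1, 2))   # 128.0 SU
# 2 of 8 GCDs on a GPU node (rate 512 SU/node-hour), 1 node, 1 hour
print(su_charge(512, 0, 64, 0, 256, 2, 8, 1, 1))       # 128.0 SU
```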

For Setonix CPU nodes, the charge rate is 128 SU per node hour, as each CPU node has 128 cores.

For Setonix GPU nodes, the charge rate is 512 SU per node hour, based on the difference in energy consumption between the CPU and GPU node architectures. Since there are fewer GPU nodes than CPU nodes, the GPU nodes are to be used solely for GPU-enabled codes. Resource requests on GPU nodes therefore differ slightly from those on CPU nodes: all requests are in units of GCDs, with 1 GCD = 1 Slurm GPU. Requests cannot be made based on memory; they must be based on the number of GPUs to be used.
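For example, a GPU job request is expressed in GPUs (GCDs) rather than in cores or memory. A minimal sketch of a batch-script header, assuming a hypothetical project code and a partition named gpu (check the scheduler documentation for the actual partition names on your system):

```shell
#!/bin/bash
#SBATCH --account=projectx-gpu   # projectx is a hypothetical project code
#SBATCH --partition=gpu          # GPU partition (assumed name)
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2        # 2 GCDs (Slurm "GPUs") of the 8 available per node
#SBATCH --time=01:00:00

srun ./my_gpu_program            # my_gpu_program is a placeholder executable
```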

Maintenance

Due to the cutting-edge nature of Setonix, regular and frequent updates of the software stack are expected during the first year of Setonix's operation, as further optimisations and improvements become available.

...