Setonix GPU Partition Quick Start

Check this page regularly, as it will be updated frequently over the coming months as the deployment of the software progresses.

All you need to know to get started on Setonix Phase 2 - GPU partition.

Overview

The GPU partition of Setonix is made up of 192 nodes, 38 of which are high-memory nodes (512 GB of RAM instead of 256 GB). Each GPU node features 4 AMD MI250X GPUs, as depicted in Figure 1. Each MI250X comprises 2 Graphics Compute Dies (GCDs), each effectively seen as a standalone GPU by the system, so a node exposes 8 GPUs in total. A 64-core AMD Trento CPU is connected to the four MI250X cards with the AMD Infinity Fabric interconnect, the same interconnect that links the GPU cards to each other, with a peak bandwidth of 200 Gb/s. More information is available at Setonix General Information.

Setonix-GPU-Node.png

Figure 1. A GPU node of Setonix

Supported Applications

Several scientific applications can already offload computations to the MI250X, and many others are in the process of being ported to AMD GPUs. Here is a list of the main ones and their current status.

Name          Support status                  Module on Setonix
Amber         Supported
Gromacs       Supported
LAMMPS        Supported
NAMD          Supported
NekRS         Supported
PyTorch       Supported
ROMS          Not supported
TensorFlow    Supported
VASP          Porting in progress, no ETA

Table 1. List of popular applications 

Module names of GPU applications end with the suffix amd-mi250x. The most accurate list is given by the module command:

$ module avail mi250x

Supported Numerical Libraries

Popular numerical routines and functions have been implemented by AMD to run on their GPU hardware. All of the following are available when loading the rocm/5.0.2 module.

Name         Description
rocFFT       Fast Fourier Transform. Documentation pages (external site).
rocBLAS      rocBLAS is the AMD library for Basic Linear Algebra Subprograms (BLAS) on the ROCm platform. Documentation pages (external site).
rocSOLVER    rocSOLVER is a work-in-progress implementation of a subset of LAPACK functionality on the ROCm platform. Documentation pages (external site).

Table 2. Popular GPU numerical libraries.

Each of the above libraries has an equivalent HIP wrapper (hipFFT, hipBLAS and hipSOLVER, respectively) that enables compilation on both ROCm and NVIDIA platforms.

A complete list of available libraries can be found on this page (external site).

AMD ROCm installations

The main ROCm installation is rocm/5.0.2, provided by HPE Cray. In addition, Pawsey staff installed rocm/5.4.3 from source using ROCm-from-source. It is an experimental installation, and users might encounter compilation or linking errors. You are encouraged to explore it during development and to report any issues. For production jobs, however, we currently recommend sticking to rocm/5.0.2.

Submitting Jobs

You can submit GPU jobs to the gpu and gpu-highmem Slurm partitions.

Note that you will need to use a different project code for the --account/-A option. More specifically, it is your project code followed by the -gpu suffix. For instance, if your project code is project1234, then you will have to use project1234-gpu.

GPUs must be explicitly requested from Slurm using the --gres=gpu:<num_gpus>, --gpus-per-task=<num_gpus> or --gpus-per-node=<num_gpus> options.
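As an illustration of how the partition, account and GPU request fit together, here is a minimal sketch of a batch script. The partition name, the project1234-gpu account and the --gres option come from this page; the job name, time limit, file name gpu_job.sh and the executable my_hip_app are hypothetical placeholders to replace with your own.

```shell
# Write a minimal batch script to gpu_job.sh (illustrative file name).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --account=project1234-gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00

# Make the ROCm runtime available to the job.
module load rocm/5.0.2

# my_hip_app is a hypothetical executable; replace it with your own program.
srun ./my_hip_app
EOF

# Submit it from a Setonix login node:
# sbatch gpu_job.sh
```

Here a Slurm GPU is a single MI250X GCD, so --gres=gpu:1 requests one eighth of a node. For a high-memory node, the partition would be gpu-highmem instead.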

Accounting

Each MI250X GCD, which corresponds to a Slurm GPU, is charged 64 SU per hour. This means the use of an entire GPU node (8 GCDs) is charged 512 SU per hour. In general, a job is charged for the largest proportion of cores, memory, or GPUs it uses, rounded up to the nearest 1/8 of a node (corresponding to an individual MI250X GCD).
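The charging rule can be made concrete with a small calculation. The following is only a sketch of the arithmetic, not the accounting system's actual implementation. It assumes that one eighth of a standard node corresponds to 8 cores and 32 GB of memory (64 cores and 256 GB divided by 8); the memory Slurm can actually allocate may be lower. The function names are hypothetical, and only whole hours are handled.

```shell
#!/bin/bash
# Sketch of the charging rule: bill the largest share of cores, memory or
# GCDs requested, rounded up to whole eighths of a node.

SU_PER_GCD_HOUR=64      # from this page: 64 SU per GCD per hour
CORES_PER_EIGHTH=8      # assumption: 64 cores / 8
MEM_GB_PER_EIGHTH=32    # assumption: 256 GB / 8 on a standard node

# ceil_div A B: integer division of A by B, rounded up.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# su_charge CORES MEM_GB GCDS HOURS: print the estimated charge in SU.
su_charge() {
    local cores=$1 mem_gb=$2 gcds=$3 hours=$4
    local eighths=$gcds
    local by_cores by_mem
    by_cores=$(ceil_div "$cores" "$CORES_PER_EIGHTH")
    by_mem=$(ceil_div "$mem_gb" "$MEM_GB_PER_EIGHTH")
    if (( by_cores > eighths )); then eighths=$by_cores; fi
    if (( by_mem > eighths )); then eighths=$by_mem; fi
    echo $(( eighths * SU_PER_GCD_HOUR * hours ))
}

su_charge 8 32 1 1      # one GCD for one hour         -> 64
su_charge 64 256 8 1    # a whole node for one hour    -> 512
su_charge 8 100 1 2     # 100 GB rounds up to 4 eighths -> 512
```

The last call shows why a single-GPU job can be charged for more than one GCD: requesting 100 GB of memory occupies four eighths of the node under the stated assumption, so the job is billed as if it used four GCDs.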

Programming AMD GPUs

You can program AMD MI250X GPUs using HIP, AMD's programming framework and the counterpart to NVIDIA's CUDA. The HIP platform is available once the rocm module is loaded.
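To give a feel for the workflow, the sketch below writes a small HIP vector-addition program to disk and compiles it with hipcc for the MI250X architecture (gfx90a). The file name vector_add.cpp, the kernel and the helper check are illustrative and not part of any Pawsey-provided material; the compile step only runs where hipcc is on the PATH, for example after module load rocm/5.0.2.

```shell
# Write a minimal HIP program (illustrative example, not Pawsey-provided code).
cat > vector_add.cpp <<'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a HIP call fails.
static void check(hipError_t err) {
    if (err != hipSuccess) {
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *d_a, *d_b, *d_c;
    check(hipMalloc(&d_a, bytes));
    check(hipMalloc(&d_b, bytes));
    check(hipMalloc(&d_c, bytes));
    check(hipMemcpy(d_a, a.data(), bytes, hipMemcpyHostToDevice));
    check(hipMemcpy(d_b, b.data(), bytes, hipMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(threads), 0, 0,
                       d_a, d_b, d_c, n);
    check(hipGetLastError());

    check(hipMemcpy(c.data(), d_c, bytes, hipMemcpyDeviceToHost));
    std::printf("c[0] = %f\n", c[0]);

    check(hipFree(d_a));
    check(hipFree(d_b));
    check(hipFree(d_c));
    return 0;
}
EOF

# Compile for the MI250X (gfx90a) if the ROCm toolchain is available.
if command -v hipcc >/dev/null 2>&1; then
    hipcc --offload-arch=gfx90a -o vector_add vector_add.cpp
else
    echo "hipcc not found: load the rocm module first" >&2
fi
```

The resulting vector_add binary needs a GPU to run, so it should be launched with srun inside a job on the gpu partition rather than on a login node.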

Links to recorded training sessions will be added here.


The complete AMD documentation on how to program with HIP can be found here (external site).

Uptake Projects

Uptake projects are collaborations between researchers and Pawsey staff to optimize scientific code for Setonix and advance research, faster. Contact the helpdesk for more information: help@pawsey.org.au.

