Setonix GPU Partition Quick Start
This page summarises the information needed to start using the Setonix GPU partitions.
Overview
The GPU partition of Setonix is made up of 192 nodes, 38 of which are high-memory nodes (512 GB of RAM instead of 256 GB). Each GPU node features 4 AMD MI250X GPUs, as depicted in Figure 1. Each MI250X comprises 2 Graphics Compute Dies (GCDs), each of which is effectively seen by the system as a standalone GPU. A 64-core AMD Trento CPU is connected to the four MI250X cards through the AMD Infinity Fabric interconnect, which also links the GPU cards to each other, with a peak bandwidth of 200Gb/s. Each GCD can access 64 GB of GPU memory, which totals 128 GB per MI250X and 512 GB of GPU memory per node. Each Setonix GPU node also has an attached NVMe device with 3575 GB of usable space. For more information, refer to Setonix General Information.
Figure 1. A GPU node of Setonix
Supported Applications
Several scientific applications are already able to offload computations to the MI250X, and many others are in the process of being ported to AMD GPUs. The main ones are listed below with their current status.
Name | AMD GPU Acceleration | Module on Setonix |
|---|---|---|
Amber | Yes | Yes |
Gromacs | Yes | Yes |
LAMMPS | Yes | Yes |
NAMD | Yes | |
NekRS | Yes | |
PyTorch | Yes | Yes* |
ROMS | No | |
Tensorflow | Yes | Yes* |
Table 1. List of popular applications. * indicates that the module provides the application as a container.
Module names of AMD GPU applications end with the suffix amd-gfx90a. The most accurate list is given by the module command:
$ module avail gfx90a
Tensorflow
Tensorflow is available as a container at the following location,
/software/setonix/2022.11/containers/sif/amdih/tensorflow/rocm5.0-tf2.7-dev/tensorflow-rocm5.0-tf2.7-dev.sif
but no module has been created for it yet.
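Because there is no module for this image, it has to be started through the container runtime directly. The following is a minimal sketch, not a tested recipe. It assumes that Singularity is already available in your environment (the module that provides it is not named here), that the image ships a python3 interpreter, and that the command runs inside a job on a GPU node. The --rocm option is the standard Singularity flag for exposing AMD GPUs to a container.

```shell
# Path to the Tensorflow container image quoted above.
SIF=/software/setonix/2022.11/containers/sif/amdih/tensorflow/rocm5.0-tf2.7-dev/tensorflow-rocm5.0-tf2.7-dev.sif

# One-line Python check that lists the GPUs Tensorflow can see.
PY_CHECK='import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'

if command -v singularity >/dev/null 2>&1 && [ -f "$SIF" ]; then
    # --rocm asks Singularity to expose the host AMD GPU devices to the container.
    singularity exec --rocm "$SIF" python3 -c "$PY_CHECK"
else
    echo "singularity or the container image is not available on this machine" >&2
fi
```

On a GPU node, the printed list would be expected to contain one entry per GCD allocated to the job.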
Supported Numerical Libraries
Popular numerical routines and functions have been implemented by AMD to run on their GPU hardware. All of the following are available when loading the rocm modules.
Name | Description |
|---|---|
rocFFT | Fast Fourier Transform. Documentation pages (external site). |
rocBLAS | rocBLAS is the AMD library for Basic Linear Algebra Subprograms (BLAS) on the ROCm platform. Documentation pages (external site). |
rocSOLVER | rocSOLVER is a work-in-progress implementation of a subset of LAPACK functionality on the ROCm platform. Documentation pages (external site). |
Table 2. Popular GPU numerical libraries.
Each of the above libraries has an equivalent HIP wrapper that enables compilation on both ROCm and NVIDIA platforms.
A complete list of available libraries can be found on this page (external site).
AMD ROCm installations
The default ROCm installation is rocm/5.2.3, provided by HPE Cray. In addition, Pawsey staff have installed more recent versions, up to rocm/5.7.3. We recommend using the latest available version unless it causes problems for your code. Available versions can be checked with the command:
module avail rocm
Submitting Jobs
You can submit GPU jobs to the gpu, gpu-dev and gpu-highmem Slurm partitions using your GPU allocation.
Note that you will need to use a different project code for the --account/-A option. More specifically, it is your project code followed by the -gpu suffix. For instance, if your project code is project1234, then you will have to use project1234-gpu.
Unlike on the CPU partitions, the /tmp and /var/tmp directories on the GPU nodes are backed by NVMe storage, not the tmpfs RAM disk. (The tmpfs filesystem is still available under /dev/shm.) Request a specific amount of NVMe storage in your job script by adding tmp:<some-value>G to the --gres option. Unless you are using all of the GCDs on the node, please limit your NVMe request to no more than 2679 GiB. You should not be able to use more NVMe space than has been allocated to you. By default, without any explicit NVMe request, a job should be allocated 128 GiB of the NVMe device. The NVMe device (or the portion used by a job) is cleaned up after the job completes. IMPORTANT: Copy any valuable results off the NVMe device before the job completes.
A detailed explanation of the use of the GPU nodes (including requests by "allocation packs" and "manual" binding) is given in Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.
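The account and NVMe rules described in this section can be sketched as two small shell helpers. This is an illustration only: the function names gpu_account and gpu_gres and the script name job.sh are hypothetical, project1234 is the example project code used above, and the gpu:<count> form of the GPU request is an assumption that should be checked against the example batch scripts page. The sketch prints the sbatch command instead of running it.

```shell
# Append the -gpu suffix to a project code.
gpu_account() {
    printf '%s-gpu\n' "$1"
}

# Build a --gres value. Usage: gpu_gres <number_of_GCDs> <NVMe_GiB>
gpu_gres() {
    # Requests above 2679 GiB are only appropriate when using all 8 GCDs of a node.
    if [ "$1" -lt 8 ] && [ "$2" -gt 2679 ]; then
        echo "NVMe request of $2 GiB is too large for a job using $1 GCD(s)" >&2
        return 1
    fi
    printf 'gpu:%s,tmp:%sG\n' "$1" "$2"
}

# Print (rather than run) the resulting command; job.sh is a hypothetical script.
echo sbatch --partition=gpu --nodes=1 \
    --account="$(gpu_account project1234)" \
    --gres="$(gpu_gres 1 200)" job.sh
```

Keeping the 2679 GiB check next to the place where the --gres string is built makes an oversized partial-node request fail before the job is submitted.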
Compiling software
If you are using ROCm libraries, such as rocFFT, to offload computations to GPUs, you should be able to use any compiler to link those to your code.
For HIP code, use hipcc. For code making use of OpenMP offloading, you must use hipcc for C/C++ and ftn (the wrapper for cray-fortran from PrgEnv-cray) for Fortran. The ftn compiler also allows GPU offloading with OpenACC.
When using hipcc, note that the locations of the MPI headers and libraries are not included automatically (unlike with the Cray wrapper scripts). Therefore, if your code also requires MPI, the locations of the MPI headers and libraries must be provided to hipcc, together with the GPU Transport Layer libraries:
MPI include and library flags for hipcc
-I${MPICH_DIR}/include
-L${MPICH_DIR}/lib -lmpi
-L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa
Also, to ensure proper use of GPU-GPU MPI communication, codes must be compiled and run with the following environment variable set:
MPI environment variable for GPU-GPU communication
export MPICH_GPU_SUPPORT_ENABLED=1
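Putting the flags and the environment variable together, a compile step for a HIP code that also uses MPI might look like the sketch below. The file names mpi_gpu_app.cpp and mpi_gpu_app are hypothetical, and MPICH_DIR and CRAY_MPICH_ROOTDIR are assumed to have been set by the loaded Cray MPICH module. The --offload-arch=gfx90a option selects the GPU architecture named in the module suffix mentioned earlier. If hipcc or the source file is missing, the sketch only prints the command it would have run.

```shell
# Hypothetical file names, used only for illustration.
SRC=mpi_gpu_app.cpp
EXE=mpi_gpu_app

# Needed at both compile time and run time for GPU-GPU MPI communication.
export MPICH_GPU_SUPPORT_ENABLED=1

# MPI and GPU Transport Layer flags listed above.
MPI_FLAGS="-I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa"

if command -v hipcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # MPI_FLAGS is left unquoted on purpose so that it splits into separate arguments.
    hipcc --offload-arch=gfx90a "$SRC" $MPI_FLAGS -o "$EXE"
else
    echo "would run: hipcc --offload-arch=gfx90a $SRC $MPI_FLAGS -o $EXE"
fi
```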
Accounting
Each MI250X GCD, which corresponds to a Slurm GPU, is charged 64 SU per hour. This means the use of an entire GPU node is charged 512 SU per hour. In general, a job is charged the largest proportion of core, memory, or GPU usage rounded up to 1/8ths of a node (corresponding to an individual MI250X GCD). Note that GPU node usage is accounted against GPU allocations with the -gpu suffix, which are separate to CPU allocations.
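The charging rule can be worked through with a short shell function. This is a sketch under stated assumptions rather than the scheduler's actual accounting code: the function name su_per_hour is hypothetical, and the node totals (64 cores, 256 GB of memory, 8 GCDs) are the nominal figures from the Overview, which may differ from the amounts Slurm actually exposes.

```shell
# Usage: su_per_hour <cores> <memory_GB> <GCDs>
# Prints the hourly charge in SU for a job on a standard GPU node.
su_per_hour() {
    # Integer ceiling of a/b is (a + b - 1) / b.
    core_eighths=$(( ($1 + 8 - 1) / 8 ))     # 64 cores / 8 = 8 cores per eighth
    mem_eighths=$(( ($2 + 32 - 1) / 32 ))    # 256 GB / 8 = 32 GB per eighth (nominal)
    gpu_eighths=$3                           # 1 GCD = one eighth of a node

    # The job is charged for the largest of the three proportions.
    eighths=$core_eighths
    if [ "$mem_eighths" -gt "$eighths" ]; then eighths=$mem_eighths; fi
    if [ "$gpu_eighths" -gt "$eighths" ]; then eighths=$gpu_eighths; fi

    echo $(( eighths * 64 ))                 # 64 SU per eighth of a node per hour
}

su_per_hour 8 32 1      # one GCD with its share of cores and memory: 64
su_per_hour 16 32 1     # 16 cores push the charge to two eighths: 128
su_per_hour 64 256 8    # a whole node: 512
```

Multiplying the result by the job's duration in hours gives the total charge against the -gpu allocation.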