...
Popular numerical routines and functions have been implemented by AMD to run on their GPU hardware. All of the following are available when loading the rocm/5.0.2 module.
...
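As a minimal sketch of how one of these libraries might be used in practice (the source file name, the rocBLAS choice and the link flags below are illustrative assumptions rather than a prescribed build line), a HIP code can be compiled with hipcc after loading the ROCm module:
module load rocm                      # or a specific version, e.g. rocm/5.2.3
# Compile a HIP source file and link against rocBLAS
# (file and executable names are placeholders; adapt the flags to your code)
hipcc -O2 -o my_solver my_solver.cpp -lrocblas
# If the linker cannot find the library, add -L${ROCM_PATH}/lib
# (ROCM_PATH is normally set when the rocm module is loaded)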
The default ROCm installation is rocm/5.2.3, provided by HPE Cray. In addition, Pawsey staff have installed more recent versions, up to rocm/5.47.3, built from source using ROCm-from-source. This is an experimental installation and users might encounter compilation or linking errors. You are encouraged to explore it during development and to report any issues; for production jobs, however, we currently recommend using rocm/5.2.3. In general, we recommend the latest available version unless it creates problems in your code. Available versions can be checked with the command:
module avail rocm
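For example (the version string below is illustrative; use one of the versions actually reported on the system), a specific ROCm installation can be loaded explicitly and checked before compiling:
module avail rocm          # list the installed ROCm versions
module load rocm/5.2.3     # load a specific version explicitly
rocminfo | head            # quick sanity check of the runtime (run on a GPU node)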
Submitting Jobs
You can submit GPU jobs to the gpu, gpu-dev and gpu-highmem Slurm partitions using your GPU allocation.
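As a quick illustration (the script name is a placeholder), batch scripts like the examples below are submitted with sbatch, and the target partition can also be overridden at submission time:
sbatch gpu_job.sh                        # submit to the partition set inside the script
sbatch --partition=gpu-dev gpu_job.sh    # e.g. use the development partition for short tests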
...
Example 1 (bash): One process with a single GPU using shared node access
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --gres=gpu:1 #1 GPU per node (1 "allocation-pack" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 8 --gres=gpu:1 ./program
Example 2 (bash): Single CPU process that uses the eight GPUs of the node
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --exclusive #All resources of the node are exclusive to this job
# #8 GPUs per node (8 "allocation-packs" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 64 --gres=gpu:8 --gpus-per-task=8 ./program
Example 3 (bash): Eight MPI processes, each with a single GPU (exclusive node access)
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --exclusive #All resources of the node are exclusive to this job
# #8 GPUs per node (8 "allocation packs" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export MPICH_GPU_SUPPORT_ENABLED=1 #This allows for GPU-aware MPI communication among GPUs
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 8 -c 8 --gres=gpu:8 --gpus-per-task=1 --gpu-bind=closest ./program
Note: Method 1 may fail for some applications.
The use of --gpu-bind=closest may not work for all codes. For those codes, "manual" binding may be the only reliable method, particularly if they rely on OpenMP or OpenACC pragmas to move data between host and GPU while also attempting to use GPU-to-GPU enabled MPI communication. Some codes, like OpenMM, also make use of the runtime environment variables and require ROCR_VISIBLE_DEVICES to be set explicitly:
Setting visible devices manually (bash)
export ROCR_VISIBLE_DEVICES=0,1 # selects the first two GCDs, i.e. both GCDs of the first MI250X GPU
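One common form of "manual" binding (sketched below; the wrapper file name and the simple one-GCD-per-rank mapping are assumptions for illustration, not the only possible scheme) is to launch each MPI rank through a small wrapper script that sets ROCR_VISIBLE_DEVICES from the rank's node-local ID before the real executable starts:
#!/bin/bash
# select_gpu.sh (hypothetical name): give each MPI rank one GCD based on
# its node-local rank ID, then replace this shell with the real program
export ROCR_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"
The wrapper is made executable (chmod +x select_gpu.sh) and placed in front of the executable in the srun line, for example:
srun -N 1 -n 8 -c 8 --gres=gpu:8 ./select_gpu.sh ./program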
...