...
Popular numerical routines and functions have been implemented by AMD to run on their GPU hardware. All of the following are available when loading the rocm/5.0.2 module.
...
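As a minimal sketch of how one of these libraries might be used in practice (the source file name, the rocBLAS choice and the link flags below are illustrative assumptions rather than a prescribed build line), a HIP code can be compiled with hipcc after loading the ROCm module:
module load rocm                      # or a specific version, e.g. rocm/5.2.3
# Compile a HIP source file and link against rocBLAS
# (file and executable names are placeholders; adapt the flags to your code)
hipcc -O2 -o my_solver my_solver.cpp -lrocblas
# If the linker cannot find the library, add -L${ROCM_PATH}/lib
# (ROCM_PATH is normally set when the rocm module is loaded)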
The default ROCm installation is rocm/5.2.3, provided by HPE Cray. In addition, Pawsey staff have installed more recent versions, up to rocm/5.47.3, built from source using ROCm-from-source. This is an experimental installation and users might encounter compilation or linking errors. You are encouraged to explore it during development and to report any issues; for production jobs, however, we currently recommend using rocm/5.2.3. In general, we recommend the latest available version unless it creates problems in your code. Available versions can be checked with the command:
module avail rocm
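For example (the version string below is illustrative; use one of the versions actually reported on the system), a specific ROCm installation can be loaded explicitly and checked before compiling:
module avail rocm          # list the installed ROCm versions
module load rocm/5.2.3     # load a specific version explicitly
rocminfo | head            # quick sanity check of the runtime (run on a GPU node)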
Submitting Jobs
You can submit GPU jobs to the gpu, gpu-dev and gpu-highmem Slurm partitions using your GPU allocation.
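As a quick illustration (the script name is a placeholder), batch scripts like the examples below are submitted with sbatch, and the target partition can also be overridden at submission time:
sbatch gpu_job.sh                        # submit to the partition set inside the script
sbatch --partition=gpu-dev gpu_job.sh    # e.g. use the development partition for short tests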
...
Example 1 (bash): One process with a single GPU using shared node access
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --gres=gpu:1 #1 GPU per node (1 "allocation-pack" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 8 --gres=gpu:1 ./program
Example 2 (bash): Single CPU process that uses the eight GPUs of the node
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --exclusive #All resources of the node are exclusive to this job
# #8 GPUs per node (8 "allocation-packs" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 64 --gres=gpu:8 --gpus-per-task=8 ./program
Example 3 (bash): Eight MPI processes, each with a single GPU (exclusive node access)
| #!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1 #1 node in this example
#SBATCH --exclusive #All resources of the node are exclusive to this job
# #8 GPUs per node (8 "allocation packs" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export MPICH_GPU_SUPPORT_ENABLED=1 #This allows for GPU-aware MPI communication among GPUs
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 8 -c 8 --gres=gpu:8 --gpus-per-task=1 --gpu-bind=closest ./program
Note: Method 1 may fail for some applications.
The use of --gpu-bind=closest may not work for all codes. For those codes, "manual" binding may be the only reliable method, particularly if they rely on OpenMP or OpenACC pragmas to move data between host and GPU while also attempting to use GPU-to-GPU enabled MPI communication. Some codes, like OpenMM, also make use of the runtime environment variables and require ROCR_VISIBLE_DEVICES to be set explicitly:
Setting visible devices manually (bash)
export ROCR_VISIBLE_DEVICES=0,1 # selects the first two GCDs, i.e. both GCDs of the first MI250X GPU
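One common form of "manual" binding (sketched below; the wrapper file name and the simple one-GCD-per-rank mapping are assumptions for illustration, not the only possible scheme) is to launch each MPI rank through a small wrapper script that sets ROCR_VISIBLE_DEVICES from the rank's node-local ID before the real executable starts:
#!/bin/bash
# select_gpu.sh (hypothetical name): give each MPI rank one GCD based on
# its node-local rank ID, then replace this shell with the real program
export ROCR_VISIBLE_DEVICES=$SLURM_LOCALID
exec "$@"
The wrapper is made executable (chmod +x select_gpu.sh) and placed in front of the executable in the srun line, for example:
srun -N 1 -n 8 -c 8 --gres=gpu:8 ./select_gpu.sh ./program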
...