...
Note that you will need to use a different project code for the --account/-A option: your project code followed by the -gpu suffix. For instance, if your project code is project1234, then you must use project1234-gpu.
GPUs must be explicitly requested from Slurm using the --gres=gpu:<num_gpus>, --gpus-per-task=<num_gpus> or --gpus-per-node=<num_gpus> options. The --gpus-per-node option is recommended.
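For example, a minimal interactive request combining these options might look like the following (project1234 is a placeholder; adjust the resources and walltime to your needs):

salloc --account=project1234-gpu --partition=gpu --nodes=1 --gpus-per-node=1 --time=00:10:00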
See: Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.
Compiling software
If you are using ROCm libraries, such as rocFFT, to offload computations to GPUs, you should be able to use any compiler to link those to your code.
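A minimal sketch of such a link step, assuming the rocm module defines ROCM_PATH and that my_code.c (a hypothetical source file) calls rocFFT directly, could look like:

module load PrgEnv-cray rocm
#cc is the Cray compiler wrapper; link against librocfft from the ROCm installation
cc -O2 my_code.c -o my_program -I${ROCM_PATH}/include -L${ROCM_PATH}/lib -lrocfft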
...
Example 1: One process with a single GPU using shared node access
#!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1              #1 node in this example
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --sockets-per-node=1
#SBATCH --gpus-per-node=1      #1 GPU per node (1 "allocation pack" in total for the job)
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 8 --gpus-per-node=1 ./program
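Assuming the script above is saved as example1.sh (a hypothetical filename), it is submitted in the usual way:

sbatch example1.sh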
Example 2: One CPU process that uses the eight GPUs of the node (exclusive node access)
#!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1              #1 node in this example
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --sockets-per-node=8
#SBATCH --gpus-per-node=8      #8 GPUs per node (8 "allocation packs" in total for the job)
#SBATCH --exclusive            #All resources of the node are exclusive to this job
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 1 -c 64 --gpus-per-node=8 --gpus-per-task=8 ./program
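Optionally (a sketch, not part of the original example), one can confirm that all eight GCDs are indeed visible to the single task by replacing the program with rocm-smi in the same srun line:

srun -N 1 -n 1 -c 64 --gpus-per-node=8 --gpus-per-task=8 rocm-smi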
Example 3: Eight MPI processes, each with a single GPU (exclusive node access)
Method 1: using --gpu-bind=closest

#!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1              #1 node in this example
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --sockets-per-node=8
#SBATCH --gpus-per-node=8      #8 GPUs per node (8 "allocation packs" in total for the job)
#SBATCH --exclusive            #All resources of the node are exclusive to this job
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export MPICH_GPU_SUPPORT_ENABLED=1 #This allows for GPU-aware MPI communication among GPUs
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#Execution
srun -N 1 -n 8 -c 8 --gpus-per-node=8 --gpus-per-task=1 --gpu-bind=closest ./program

Method 2: "manual" binding of one GPU to each task

#!/bin/bash --login
#SBATCH --account=project-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1              #1 node in this example
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --sockets-per-node=8
#SBATCH --gpus-per-node=8      #8 GPUs per node (8 "allocation packs" in total for the job)
#SBATCH --exclusive            #All resources of the node are exclusive to this job
#SBATCH --time=00:05:00
#----
#Loading needed modules (adapt this for your own purposes):
module load PrgEnv-cray
module load rocm craype-accel-amd-gfx90a
module list
#----
#MPI & OpenMP settings
export OMP_NUM_THREADS=1 #This controls the real number of threads per task
#----
#First preliminary "hack": create a selectGPU wrapper to be used for
# binding only 1 GPU to each task spawned by srun
wrapper="selectGPU_${SLURM_JOBID}.sh"
cat << EOF > $wrapper
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./$wrapper
#----
#Second preliminary "hack": generate an ordered list of CPU-cores (each on a different slurm-socket)
# to be matched with the correct GPU in the srun command using the --cpu-bind option.
CPU_BIND="map_cpu:48,56,16,24,0,8,32,40"
#----
#Execution
srun -c 8 --cpu-bind=${CPU_BIND} ./$wrapper ./program
#----
#Deleting the wrapper
rm -f ./$wrapper
Note: Method 1 may fail for some applications.
The use of --gpu-bind=closest may not work for all codes. For codes that rely on OpenMP or OpenACC pragmas to move data between host and GPU while also attempting to use GPU-to-GPU enabled MPI communication, "manual" binding (Method 2) may be the only reliable method.
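A quick way to check which GPU each task actually receives (a sketch; the bash/echo command is illustrative and assumes Slurm exports ROCR_VISIBLE_DEVICES to each task, as the manual wrapper does explicitly) is to print the visible GPU per task before running the real program:

#Each task should report a single, distinct device
srun -N 1 -n 8 -c 8 --gpus-per-node=8 --gpus-per-task=1 --gpu-bind=closest \
     bash -c 'echo "task ${SLURM_LOCALID}: ROCR_VISIBLE_DEVICES=${ROCR_VISIBLE_DEVICES}"'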