GROMACS
GROMACS is a versatile package for performing molecular dynamics, that is, simulating the Newtonian equations of motion for systems with hundreds to millions of particles.
GROMACS is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions. Because GROMACS is extremely fast at calculating the nonbonded interactions that usually dominate simulations, many groups are also using it for research on non-biological systems, for example, polymers.
Versions installed in Pawsey systems
To check the currently installed versions, use the module avail command (installed versions may differ from those shown here):
$ module avail gromacs
------------------------- /software/setonix/2024.05/modules/zen3/gcc/12.2.0/applications --------------------------
gromacs-amd-gfx90a/2023    gromacs/2022.5-mixed    gromacs/2023-mixed (D)
gromacs/2022.5-double      gromacs/2023-double
Modules with the -amd-gfx90a suffix support GPU offloading and are meant to be used within the gpu partition. The -mixed suffix means mixed precision and -double means double precision installations.
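For example, to use the double-precision build in a job, load the corresponding module; the binaries follow the usual GROMACS naming, with gmx_mpi for mixed precision and gmx_mpi_d for double precision, as used in the examples below:

$ module load gromacs/2023-double    # double-precision CPU build
$ module list                        # confirm the loaded modules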
GROMACS is compiled with the GNU programming environment.
All GROMACS installations on Setonix have been patched with PLUMED.
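Because the installations are PLUMED-patched, mdrun accepts a PLUMED input file through its -plumed option. A minimal sketch, where topol.tpr and plumed.dat are placeholder file names:

srun gmx_mpi mdrun -s topol.tpr -plumed plumed.dat    # topol.tpr and plumed.dat are placeholders for your own input files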
Example: Running GROMACS on CPU
This is an example of a GROMACS job queueing script. It uses the benchMEM.tpr benchmark case, which can be found at the following page: A free GROMACS benchmark set (external site).
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --exclusive
#SBATCH --time=00:05:00
#SBATCH --account=[your-project]

module load gromacs/2023-double

export OMP_NUM_THREADS=1

srun -N 1 -n 128 gmx_mpi_d mdrun -s benchMEM.tpr
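The script above uses 128 single-threaded MPI ranks. GROMACS can also run in hybrid MPI+OpenMP mode; the sketch below assumes an illustrative layout of 32 MPI ranks with 4 OpenMP threads each on the same 128 cores, and replaces the corresponding lines of the script above. The best rank/thread balance is system dependent and worth benchmarking.

#SBATCH --ntasks=32
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=4
# 32 ranks x 4 threads = 128 cores
srun -N 1 -n 32 -c 4 gmx_mpi_d mdrun -ntomp 4 -s benchMEM.tpr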
For more information on how to run jobs on the CPU partitions see: Example Slurm Batch Scripts for Setonix on CPU Compute Nodes.
Running GROMACS on GPUs
GROMACS supports offloading some of its operations to GPUs. GPU acceleration is officially supported through the SYCL standard. Additionally, AMD staff maintain their own GROMACS GPU implementation using HIP. The AMD HIP port is often faster than the SYCL version, but it is not officially supported by the GROMACS core developers and lags behind the official release in features. The version currently installed on Setonix, gromacs-amd-gfx90a/2023, is the AMD port.
GPU offloading can be enabled with the following options:
- -pme gpu: compute long-range (PME) interactions on the GPU.
- -npme 1: dedicate a single MPI rank to PME calculations, which can improve performance when running with two GCDs. Note that the module on Setonix does not support more than one PME rank when offloading to GPUs; this can negatively impact performance of multi-GPU calculations with more than two GCDs due to load imbalance between the PP and PME components of the calculation.
- -nb gpu: compute non-bonded interactions on the GPU.
- -bonded gpu: compute bonded interactions on the GPU. This option is not always available; GROMACS will print an error message when the operation cannot be performed on the GPU.
- -update gpu: compute constraints and coordinate updates on the GPU. This option is not always available; GROMACS will print an error message when the operation cannot be performed on the GPU.
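Putting these options together, a fully offloaded single-GCD run might look like the following sketch (drop -bonded gpu or -update gpu if GROMACS reports they are not available for your system; -npme is not needed with a single GCD):

srun gmx_mpi mdrun -nb gpu -pme gpu -bonded gpu -update gpu -ntomp 8 -s benchMEM.tpr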
More information can be found in the following page in the GROMACS documentation: Running mdrun with GPUs (external site).
As an example, we again use the benchMEM.tpr benchmark case that can be found at the following page: A free GROMACS benchmark set (external site). Here is a simple submission script.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
#SBATCH --account=[your-project]-gpu

module load craype-accel-amd-gfx90a
module load gromacs-amd-gfx90a/2023

srun gmx_mpi mdrun -nb gpu -bonded gpu -ntomp 8 -s benchMEM.tpr
For more information on how to run jobs on the GPU partitions see Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.
Multi-GPU calculations
Multi-GPU calculations are supported by GROMACS on Setonix. Running with one MPI rank and multiple OpenMP threads per GCD typically gives good performance. GROMACS is able to take advantage of the srun options --gpus-per-task=1 and --gpu-bind=closest, which ensure optimal binding of each GCD to its directly connected chiplet on the CPU, as described in Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.
There are some general limitations to multi-GPU performance:
- GROMACS's hardware report (printed in the log file) has a known limitation where it only reports a single GCD as being "visible" to the main MPI rank, even though the calculation will actually be utilising multiple GCDs. This means that the log file will report "1 GPU selected for this run", even though it is actually utilising all the GCDs requested in the job script.
- The gromacs-amd-gfx90a/2023 module on Setonix is limited to one dedicated PME rank when offloading PME to the GPU. This means that the time taken by the PME component of the simulation will stay constant as more GPUs are added, degrading performance when running with more than 2 GCDs. GROMACS will print a warning to the log file if the PP–PME workload imbalance becomes significant.
- GROMACS's dynamic load-balancing has limited support for fully-GPU-resident calculations. GROMACS will print a warning to the log file if there is a substantial workload imbalance between MPI ranks, which can be a good starting point when configuring future calculations.
While it is possible to use multiple GPUs in parallel, the above limitations mean there may be limited benefit to running with more than two GCDs.
Here is a sample multi-GPU submission script for the benchMEM.tpr benchmark, using two GCDs and 16 CPU cores. One MPI rank is dedicated to the PME component, while the other calculates the particle-particle (PP) and non-bonded (NB) interactions and updates the particle positions:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:2
#SBATCH --partition=gpu
#SBATCH --account=[your-project]-gpu

module load craype-accel-amd-gfx90a
module load gromacs-amd-gfx90a/2023

export MPICH_GPU_SUPPORT_ENABLED=1 # This allows for GPU-aware MPI communication among GCDs

srun -N 1 -n 2 -c 8 --gpus-per-task=1 --gpu-bind=closest gmx_mpi mdrun -pme gpu -npme 1 -nb gpu -bonded gpu -ntomp 8 -s benchMEM.tpr
For more information on how to run jobs on the GPU partitions see Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.