Using ESMFold on AMD GPUs

Overview

This guide explains how to run ESMFold batch jobs on Pawsey's Setonix supercomputer. ESMFold can only use the memory of a single GCD (Graphics Compute Die), which puts an upper limit on the sequence length it can process before running out of memory, roughly 2000 amino acids (aa). Reducing the --chunk-size parameter lowers peak GPU memory use at the cost of runtime, and may allow longer sequences to complete.
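For example, if a long sequence runs out of memory with the default chunk size used in the job script below, a smaller value can be passed to esm-fold. The values here are illustrative only; the best setting depends on your sequence:

    # Default used in the template below
    esm-fold -i $INPUT -o ${RESULTS_DIR} --chunk-size 16

    # Smaller chunk size for sequences approaching the ~2000aa limit (slower, but uses less memory)
    esm-fold -i $INPUT -o ${RESULTS_DIR} --chunk-size 4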

Prerequisites

  • A Pawsey account with GPU allocation

  • Basic familiarity with SLURM job submission

Job Script Template

Below is a template SLURM script for running ESMFold. Save it as run_esmfold.slurm:

#!/bin/bash -l
#SBATCH --account=${PAWSEYPROJECT}-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=4:00:00
#SBATCH --job-name=esmfold

# Load required module
module load singularity/3.11.4-nompi

# Set input
INPUT=/path/to/your/input

# Create results directory
RESULTS_DIR="$MYSCRATCH/esmfold/${SLURM_JOB_ID}/results/"
mkdir -p $RESULTS_DIR

# Set reference directory (not used by the esm-fold command below)
REF_DIR='/scratch/references/alphafold_feb2024/databases'

# Set container
CONTAINER=docker://quay.io/pawsey/esmfold_openfold:rocm6.3.3

# Create cache directory
CACHE_DIR="${MYSCRATCH}/esmfold/esmfold_cache"
export TRANSFORMERS_CACHE="${CACHE_DIR}"
export HF_HOME="${CACHE_DIR}"
export TORCH_HOME="${CACHE_DIR}/torch"
mkdir -p $TORCH_HOME

# Run the container
srun -N 1 -n 1 -c 8 --gres=gpu:1 \
    singularity exec \
    -B ${CACHE_DIR} \
    $CONTAINER \
    esm-fold -i $INPUT -o ${RESULTS_DIR} --chunk-size 16
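Depending on how the singularity module is configured, paths under /scratch may already be bind-mounted into the container automatically. If your input file lives somewhere that is not bound by default, add an extra -B flag to the singularity exec line. The path below is a placeholder for illustration:

    singularity exec \
        -B ${CACHE_DIR} \
        -B /path/to/your/input_dir \
        $CONTAINER \
        esm-fold -i $INPUT -o ${RESULTS_DIR} --chunk-size 16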

Key Parameters and Settings

  1. Resource Allocation:

    • Uses the gpu partition

    • Requests 1 GPU

    • Requests a walltime of 4 hours (--time=4:00:00)

    • Uses 8 CPU cores

  2. Environment Variables:

    CACHE_DIR="${MYSCRATCH}/esmfold/esmfold_cache"
    export TRANSFORMERS_CACHE="${CACHE_DIR}"
    export HF_HOME="${CACHE_DIR}"
    export TORCH_HOME="${CACHE_DIR}/torch"

    ESMFold downloads its model weights and other assets into these cache directories on first use. Pointing them at a single location on $MYSCRATCH means subsequent jobs reuse the cached files instead of re-downloading them.
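An optional sanity check that the cache has been populated after the first run (the exact layout under the cache directory may vary between container versions):

    # Total size of the cache so far
    du -sh ${MYSCRATCH}/esmfold/esmfold_cache
    # Torch hub downloads, if any, usually end up under TORCH_HOME/hub/checkpoints
    ls -lh ${MYSCRATCH}/esmfold/esmfold_cache/torch/hub/checkpoints/ 2>/dev/null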

Before Running

  1. Modify the input path to point to your input FASTA file (see the example input after this list):

    INPUT=/path/to/your/input
  2. Optional: Adjust the output (results) directory:

    RESULTS_DIR="$MYSCRATCH/esmfold/${SLURM_JOB_ID}/results/"
  3. Replace ${PAWSEYPROJECT} with your project code.
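esm-fold reads protein sequences from a FASTA file. A minimal example input file is shown below; the header and sequence are placeholders for illustration only, and real inputs are typically much longer (up to the ~2000aa limit noted above):

    >example_protein
    MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ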

Running Your Job

  1. Submit your job:

    sbatch run_esmfold.slurm
  2. Monitor your job:

    squeue -u $USER
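Once the job finishes, the predicted structures are written to the results directory created by the script, typically as one PDB file per input sequence (the exact file naming depends on the esm-fold version). For example, assuming the directory layout from the template above:

    # Replace <jobid> with the job ID reported by sbatch
    ls $MYSCRATCH/esmfold/<jobid>/results/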

Further Reading

For more details on running GPU workflows on Setonix, refer to the Setonix GPU Partition Quick Start guide.