Using AlphaFold2 on AMD GPUs

Running AlphaFold2 on Setonix

Overview

This guide explains how to run AlphaFold2 (AF2) on the GPU nodes of Pawsey's Setonix supercomputer in monomer, monomer_ptm, and multimer modes. A GPU memory limitation currently restricts predictions to proteins of roughly 3000 amino acids or fewer. Larger proteins will fail with out-of-memory errors.

If no GPUs are available to the job (for example, because you do not have a GPU allocation), AlphaFold2 falls back to running on CPUs. The run will still work, but it will take longer to complete.
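Because jobs that exceed the size limit only fail after queueing and running, it can be worth checking the input length before submitting. The following is a minimal sketch, not part of the Pawsey tooling: the function names and the MAX_RESIDUES threshold are assumptions for illustration, and the count is the total across all sequences in the file.

```shell
# Hypothetical helper: total residue count across all sequences in a FASTA file.
# Header lines (starting with ">") are skipped and whitespace is stripped
# before the remaining characters are counted.
fasta_residue_count() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); total += length($0) }
         END   { print total + 0 }' "$1"
}

# Assumed threshold, taken from the approximate limit quoted in this guide.
MAX_RESIDUES=3000

# Returns 0 if the file is within the limit, 1 if it is too large,
# and 2 if the file could not be read.
check_fasta_length() {
    local n
    n=$(fasta_residue_count "$1") || return 2
    if [ "$n" -gt "$MAX_RESIDUES" ]; then
        echo "WARNING: $1 has $n residues (limit is about $MAX_RESIDUES)" >&2
        return 1
    fi
    echo "$1: $n residues"
}
```

For example, check_fasta_length /path/to/your/sequence.fasta prints the residue count, or a warning if the input is likely to run out of GPU memory.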

Reference Data Location

AlphaFold2 requires several reference databases. On Setonix, these are located at:

/scratch/references/alphafold_feb2024/databases/

The following databases are available:

  • UniRef90 (uniref90.fasta)

  • MGnify (mgy_clusters_2022_05.fa)

  • PDB70

  • Small BFD

  • PDB mmCIF files
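To confirm that the reference data is visible from your session before submitting a job, a check along these lines may help. This is a sketch only: the function name is hypothetical, and the relative paths are the ones used in the job script templates in this guide.

```shell
# Hypothetical helper: report which expected database paths are missing
# under the given reference directory. Returns 0 only if all are present.
check_af2_databases() {
    local ref_dir="$1" missing=0 rel
    for rel in \
        uniref90/uniref90.fasta \
        mgnify/mgy_clusters_2022_05.fa \
        pdb70 \
        small_bfd/bfd-first_non_consensus_sequences.fasta \
        pdb_mmcif/mmcif_files
    do
        if [ ! -e "$ref_dir/$rel" ]; then
            echo "MISSING: $ref_dir/$rel" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

A typical call would be check_af2_databases /scratch/references/alphafold_feb2024/databases && echo "all databases found".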

Job Script Template for Monomer/Monomer_ptm Mode

Below is a template SLURM script for running AlphaFold2 in monomer mode. Save this as run_af2_monomer.slurm. To run in monomer_ptm mode instead, change the preset flag to --model_preset=monomer_ptm.

#!/bin/bash -l
#SBATCH -A ${PAWSEY_PROJECT}-gpu
#SBATCH --nodes=1
#SBATCH --partition=gpu-highmem
#SBATCH --time=10:00:00
#SBATCH --gres=gpu:2
# Load required module
module load singularity/3.11.4-nompi
REF_DIR='/scratch/references/alphafold_feb2024/databases'
# Set essential XLA flag. Do not remove this flag.
export XLA_FLAGS="--xla_gpu_autotune_level=0"
# Run AlphaFold2
srun -N 1 -n 1 -c 16 --gres=gpu:2 \
    singularity exec docker://quay.io/pawsey/alphafold2-amd-gpu:v2.3.2_rocm6.2.4 \
    python /app/alphafold/run_alphafold.py \
    --fasta_paths=/path/to/your/sequence.fasta \
    --model_preset=monomer \
    --use_gpu_relax=True \
    --benchmark=False \
    --uniref90_database_path=${REF_DIR}/uniref90/uniref90.fasta \
    --mgnify_database_path=${REF_DIR}/mgnify/mgy_clusters_2022_05.fa \
    --pdb70_database_path=${REF_DIR}/pdb70/pdb70 \
    --data_dir=${REF_DIR} \
    --template_mmcif_dir=${REF_DIR}/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${REF_DIR}/pdb_mmcif/obsolete.dat \
    --small_bfd_database_path=${REF_DIR}/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --output_dir=${MYSCRATCH}/alphafold2/output/${SLURM_JOB_ID} \
    --max_template_date=2023-05-14 \
    --db_preset=reduced_dbs \
    --logtostderr \
    --hhsearch_binary_path=/opt/miniforge3/bin/hhsearch \
    --hhblits_binary_path=/opt/miniforge3/bin/hhblits

This example script requests two GCDs (--gres=gpu:2) on the high-memory GPU partition so that enough host (CPU) memory is available for the jackhmmer steps.
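If you want to keep separate monomer and monomer_ptm scripts, the preset can be switched with a one-line substitution rather than by hand. The sketch below assumes the script contains the --model_preset=monomer flag as it appears in the template. The function name and the output file name in the usage note are hypothetical.

```shell
# Hypothetical helper: write a copy of a monomer job script with the
# preset changed to monomer_ptm. The pattern only matches "monomer"
# followed by whitespace or end of line, so a script that already uses
# monomer_ptm is left unchanged.
make_ptm_script() {
    sed -E 's/--model_preset=monomer([[:space:]]|$)/--model_preset=monomer_ptm\1/' "$1" > "$2"
}
```

For example, make_ptm_script run_af2_monomer.slurm run_af2_monomer_ptm.slurm writes the modified copy and leaves the original untouched.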

Job Script Template for Multimer Mode

Below is a template SLURM script for running AlphaFold2 in multimer mode. Save this as run_af2_multimer.slurm.

#!/bin/bash -l
#SBATCH -A ${PAWSEY_PROJECT}-gpu
#SBATCH --nodes=1
#SBATCH --partition=gpu-highmem
#SBATCH --time=10:00:00
#SBATCH --gres=gpu:2
# Load required module
module load singularity/3.11.4-nompi
REF_DIR='/scratch/references/alphafold_feb2024/databases'
# Set essential XLA flag. Do not remove this flag.
export XLA_FLAGS="--xla_gpu_autotune_level=0"
# Run AlphaFold2
srun -N 1 -n 1 -c 16 --gres=gpu:2 \
    singularity exec docker://quay.io/pawsey/alphafold2-amd-gpu:v2.3.2_rocm6.2.4 \
    python /app/alphafold/run_alphafold.py \
    --fasta_paths=/path/to/your/sequence.fasta \
    --model_preset=multimer \
    --use_gpu_relax=True \
    --benchmark=False \
    --uniref90_database_path=${REF_DIR}/uniref90/uniref90.fasta \
    --mgnify_database_path=${REF_DIR}/mgnify/mgy_clusters_2022_05.fa \
    --data_dir=${REF_DIR} \
    --template_mmcif_dir=${REF_DIR}/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${REF_DIR}/pdb_mmcif/obsolete.dat \
    --small_bfd_database_path=${REF_DIR}/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --pdb_seqres_database_path=${REF_DIR}/pdb_seqres/pdb_seqres.txt \
    --uniprot_database_path=${REF_DIR}/uniprot \
    --output_dir=${MYSCRATCH}/alphafold2_jax/output/${SLURM_JOB_ID} \
    --max_template_date=2023-05-14 \
    --db_preset=reduced_dbs \
    --logtostderr \
    --hhsearch_binary_path=/opt/miniforge3/bin/hhsearch \
    --hhblits_binary_path=/opt/miniforge3/bin/hhblits
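Multimer input is normally a single FASTA file with one record per chain (for a homomer, the same sequence repeated once per copy). A quick sanity check on the number of records can catch a monomer file passed to the multimer script by mistake. This is a sketch with a hypothetical function name, not a required step.

```shell
# Hypothetical helper: count FASTA records (header lines starting with ">")
# and warn if fewer than two chains are present.
# Returns 0 for two or more records, 1 for fewer, 2 if the file is unreadable.
check_multimer_fasta() {
    local n
    [ -r "$1" ] || { echo "Cannot read $1" >&2; return 2; }
    # grep -c exits non-zero when the count is 0, so ignore its status.
    n=$(grep -c '^>' "$1" || true)
    if [ "$n" -lt 2 ]; then
        echo "WARNING: $1 contains $n sequence record(s); multimer input usually has one per chain" >&2
        return 1
    fi
    echo "$1: $n chains"
}
```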

Important Notes and Limitations

  1. FASTA input: Change --fasta_paths=/path/to/your/sequence.fasta in the template script to point to your own FASTA file.

  2. Template date: Change --max_template_date to suit your analysis.

  3. Output directory: Modify the --output_dir path if the default location does not suit your needs.

  4. Database preset: The scripts use the reduced_dbs preset for faster processing. For higher accuracy you can change this to full_dbs, but this will increase runtime. Note that in AlphaFold 2.3 the full_dbs preset expects --bfd_database_path and --uniref30_database_path in place of --small_bfd_database_path, and the full BFD and UniRef30 databases are not among those listed above.
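A mistyped --max_template_date is easy to miss until the job has already started. The sketch below checks only the YYYY-MM-DD format used in the templates. The function name is hypothetical, and the check does not catch impossible dates such as 2023-02-31.

```shell
# Hypothetical helper: succeed only if the argument looks like YYYY-MM-DD
# with a month of 01-12 and a day of 01-31. This is a format check only.
valid_template_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$'
}
```

For example, valid_template_date 2023-05-14 succeeds, while valid_template_date 14-05-2023 fails.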

Running Your Job

  1. Submit your job:

    sbatch run_af2_monomer.slurm    # monomer or monomer_ptm mode
    sbatch run_af2_multimer.slurm   # multimer mode
  2. Monitor your job:

    squeue -u $USER
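If you submit several predictions, it can help to capture the job ID at submission time so that you can query one job rather than your whole queue. The sketch below parses the default "Submitted batch job N" line that sbatch prints. The function name is hypothetical, and the sacct field list in the comments is only one reasonable choice.

```shell
# Hypothetical helper: extract the numeric job ID from sbatch's default
# output line, e.g. "Submitted batch job 1234567".
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Example usage on a login node (not run here):
#   jobid=$(job_id_from_sbatch "$(sbatch run_af2_monomer.slurm)")
#   squeue -j "$jobid"
#   sacct -j "$jobid" --format=JobID,State,Elapsed,MaxRSS
```

With the templates in this guide, the same job ID is also the name of the output directory, because --output_dir ends in ${SLURM_JOB_ID}.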

Output Files

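The template scripts write results to a directory named after the Slurm job ID under the specified output path. Inside it, AlphaFold2 creates a subdirectory named after the input FASTA file. This will contain: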

  • Predicted structures in PDB format

  • Confidence scores (pLDDT and PAE)

  • Log files

  • MSA visualization files
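AlphaFold2 writes its final models as ranked_0.pdb, ranked_1.pdb and so on, with ranked_0.pdb being the highest-confidence prediction. A short sketch for locating them under a job's output directory follows. The function name is hypothetical, and the example path assumes the monomer template's --output_dir.

```shell
# Hypothetical helper: list the ranked model files anywhere below the
# given output directory. The sort is lexicographic, so ranked_10.pdb
# would be listed before ranked_2.pdb.
list_ranked_models() {
    find "$1" -type f -name 'ranked_*.pdb' | sort
}

# Example usage (not run here), with <jobid> replaced by your job ID:
#   list_ranked_models "${MYSCRATCH}/alphafold2/output/<jobid>"
```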

Support

For issues or questions, please contact the Pawsey Help Desk at help@pawsey.org.au

Further Reading

For more details on running GPU workflows on Setonix, refer to https://pawsey.atlassian.net/wiki/spaces/US/pages/51928618