Machine Learning workloads are supported on Setonix through a custom TensorFlow container developed by Pawsey. This page illustrates its usage.

...

$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0 


Note

This page is still a work in progress, and support for machine learning workloads has only just begun. Please check back frequently for updates.


...

Listing 1. An example batch script to run a TensorFlow distributed training job.
#!/bin/bash

#SBATCH --account=pawsey12345-gpu
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1

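# Load the TensorFlow module.
module load tensorflow/rocm5.6-tf2.12

# Example paths: replace the project code, username, and script with your own.
VENV_PATH=/software/projects/pawsey12345/matilda/myenv/bin/activate
PYTHON_SCRIPT=/software/projects/pawsey12345/matilda/matilda-machinelearning/models/01_horovod_mnist.py

# Launch one task on each of the two nodes. Each task activates the virtual
# environment and then runs the training script.
srun --tasks-per-node=1 --nodes=2 bash -c "source $VENV_PATH && python3 $PYTHON_SCRIPT"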
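
A wrong path in a script like Listing 1 only shows up once the job has started, so it may help to check the activation script and the training script before submitting. The following is a minimal sketch under that assumption. The helper names `check_readable` and `build_task_command` are hypothetical and are not part of Slurm or of Pawsey's tooling.

```shell
#!/bin/bash
# Sketch only: both helper functions are hypothetical illustrations.

# Return 0 if every path given as an argument is readable. Otherwise report
# the first unreadable path on stderr and return 1.
check_readable() {
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "not readable: $path" >&2
            return 1
        fi
    done
    return 0
}

# Print the command string that each task would run: activate the virtual
# environment ($1), then start the training script ($2). Each path is wrapped
# in double quotes so that paths containing spaces survive the bash -c step.
build_task_command() {
    printf 'source "%s" && python3 "%s"\n' "$1" "$2"
}
```

With these definitions in the batch script, one possible use is to run `check_readable "$VENV_PATH" "$PYTHON_SCRIPT" || exit 1` before the `srun` line, and to pass `"$(build_task_command "$VENV_PATH" "$PYTHON_SCRIPT")"` as the argument to `bash -c`.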


...