Machine Learning workloads are supported on Setonix through a custom TensorFlow container developed by Pawsey. This page illustrates its usage.

...

$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0 


Note: This page is still a work in progress, and support for Machine Learning workloads has only just started. Please check back frequently for updates.


...

Listing 1. distribute_tf.sh: An example batch script to run a TensorFlow distributed training job.

#!/bin/bash

#SBATCH --account=pawsey12345-gpu
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --nodes=2

module load tensorflow/rocm5.6-tf2.12

# Optional: activate script of the user's own virtual environment, if additional Python packages have been installed there
VENV_PATH=/software/projects/pawsey12345/matilda/myenv/bin/activate

# Path to the Python script containing the TensorFlow training case
PYTHON_SCRIPT=/software/projects/pawsey12345/matilda/matilda-machinelearning/models/01_horovod_mnist.py

# Launch the job, passing the requested resources to the srun command
srun -N 2 -n 16 -c 8 --ntasks-per-node=8 --gres=gpu:8 bash -c "source $VENV_PATH && python3 $PYTHON_SCRIPT"
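
The resource flags passed to srun in Listing 1 have to agree with one another: 2 nodes with 8 tasks per node gives the 16 tasks requested in total, and each task asks for 8 CPU cores. The following is a minimal sketch of that arithmetic, not part of the Pawsey script. The variable names are hypothetical, and the check assumes that one task per GPU is the intended layout.

```shell
#!/bin/bash
# Hypothetical variable names; the values mirror the srun flags in Listing 1.
NODES=2             # -N 2
TASKS_PER_NODE=8    # --ntasks-per-node=8
GPUS_PER_NODE=8     # --gres=gpu:8
CPUS_PER_TASK=8     # -c 8

# Total number of tasks across the job; should match the value given to -n.
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))

# CPU cores requested on each node.
CPUS_PER_NODE=$((TASKS_PER_NODE * CPUS_PER_TASK))

echo "total tasks: $TOTAL_TASKS"
echo "CPU cores requested per node: $CPUS_PER_NODE"

# Assumption: one task per GPU is intended, so the two counts should be equal.
if [ "$TASKS_PER_NODE" -ne "$GPUS_PER_NODE" ]; then
    echo "warning: tasks per node and GPUs per node differ" >&2
fi
```

If the number of nodes is changed in the batch script, the value given to -n needs to be updated to match the recomputed total.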


...