Excerpt |
Machine Learning workloads are supported on Setonix through a custom TensorFlow container developed by Pawsey. This page illustrates its usage. |
$ docker pull quay.io/pawsey/tensorflow:
Note |
This page is still a work in progress, and support for Machine Learning workloads is at an early stage. Check back regularly for updates. |
Code Block |
language | bash |
title | Listing 1. distribute_tf.sh: An example batch script to run a distributed TensorFlow training job. |
| #!/bin/bash
#SBATCH --account=pawsey12345-gpu
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --nodes=2
module load tensorflow/rocm5.6-tf2.12
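# Activation script of your own virtual environment, if additional Python packages were installed there.
# Placeholder; replace with your own path.
VENV_PATH=/path/to/venv/bin/activate
# Python script containing the TensorFlow training case.
# Placeholder; replace with your own script.
PYTHON_SCRIPT=/path/to/training_script.py
# Launch the job, passing the resource request to srun.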
srun -N 2 -n 16 -c 8 --ntasks-per-node=8 --gres=gpu:8 bash -c "source $VENV_PATH && python3 $PYTHON_SCRIPT"
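
The resource flags on the srun line in Listing 1 have to agree with each other: the total task count given by -n should equal the node count multiplied by the tasks per node. The following is a minimal sketch of that arithmetic, not part of the Pawsey listing. The variable names are illustrative assumptions, and the values are copied from the srun line.

```shell
#!/bin/bash
# Sketch: sanity-check the resource request used in Listing 1.
# Variable names are illustrative only; values mirror the srun flags.
NODES=2               # -N 2
TASKS_PER_NODE=8      # --ntasks-per-node=8
CPUS_PER_TASK=8       # -c 8
GPUS_PER_NODE=8       # --gres=gpu:8

TOTAL_TASKS=$((NODES * TASKS_PER_NODE))             # should match -n
CORES_PER_NODE=$((TASKS_PER_NODE * CPUS_PER_TASK))  # CPU cores requested on each node
TASKS_PER_GPU=$((TASKS_PER_NODE / GPUS_PER_NODE))   # 1 means one task per GPU

echo "total tasks: ${TOTAL_TASKS}"
echo "cores per node: ${CORES_PER_NODE}"
echo "tasks per GPU: ${TASKS_PER_GPU}"
```

With the values from Listing 1 this gives 16 tasks in total, which matches -n 16, and one task per requested GPU. If the node count changes, -n needs to be updated to keep the request consistent.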