Excerpt |
---|
Machine learning workloads are supported on Setonix through a custom TensorFlow container developed by Pawsey. This page shows how to use it. |
...
$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
Column |
---|
Note |
---|
This page is still a work in progress, and support for machine learning workloads has only just started. Please check back frequently for updates. |
|
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 1. distribute_tf.sh: An example batch script to run a distributed TensorFlow training job. |
---|
| #!/bin/bash
#SBATCH --account=pawsey12345-gpu
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --nodes=2
module load tensorflow/rocm5.6-tf2.12
# Activation script of the user's own virtual environment (needed only if additional Python packages were installed there)
VENV_PATH=/software/projects/pawsey12345/matilda/myenv/bin/activate
# Path to the Python script containing the TensorFlow training case
PYTHON_SCRIPT=/software/projects/pawsey12345/matilda/matilda-machinelearning/models/01_horovod_mnist.py
# Launch the job, passing the resource request to srun: 2 nodes, 16 tasks (8 per node), 8 CPU cores per task, 8 GPUs per node
srun -N 2 -n 16 -c 8 --ntasks-per-node=8 --gres=gpu:8 bash -c "source $VENV_PATH && python3 $PYTHON_SCRIPT"
|
|
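The resource request in Listing 1 follows simple arithmetic: the total task count is the number of nodes times the tasks per node, and the cores used on each node are the tasks per node times the CPUs per task. The sketch below recomputes those values so the srun flags can be checked for consistency before submitting. The variable names are illustrative and are not part of the Pawsey module. Reading the flags as one task per GPU is an assumption drawn from Listing 1, not a documented requirement.

```bash
#!/bin/bash
# Hypothetical helper: recompute the task geometry used in Listing 1.
# Variable names are illustrative; the values mirror the srun flags.
NODES=2            # matches --nodes=2 / -N 2
TASKS_PER_NODE=8   # matches --ntasks-per-node=8
CPUS_PER_TASK=8    # matches -c 8
GPUS_PER_NODE=8    # matches --gres=gpu:8

# Total tasks passed to srun with -n
TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))
# CPU cores requested on each node
CORES_PER_NODE=$(( TASKS_PER_NODE * CPUS_PER_TASK ))
# Tasks sharing each GPU (assumed to be one task per GPU in Listing 1)
TASKS_PER_GPU=$(( TASKS_PER_NODE / GPUS_PER_NODE ))

echo "total tasks (-n): $TOTAL_TASKS"
echo "CPU cores per node: $CORES_PER_NODE"
echo "tasks per GPU: $TASKS_PER_GPU"
```

If the node count or tasks per node change, the value given to `-n` has to change with them, otherwise the flags passed to srun contradict each other.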
...