Machine Learning workloads are supported on Setonix through a custom TensorFlow container developed by Pawsey. This page illustrates its usage.

$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0


Note
This page is still a work in progress, and support for Machine Learning workloads has only just started. Please check it frequently for updates.

...

The TensorFlow module

Currently, TensorFlow is available on Setonix via modules that make use of containers:

Terminal 1. Look for the TensorFlow module

$ module avail tensorflow
--------------------------------------------------------- /software/setonix/2023.08/containers/views/modules -------------------------------------------------------------
tensorflow/rocm5.6-tf2.12 (D)

The container is deployed as a module using SHPC (Singularity Registry HPC). SHPC generates a module (listed above) that provides convenient aliases for key programs within the container. This means you can run the container's python3 executable without explicitly loading Singularity or invoking Singularity commands yourself. The Singularity module is loaded as a dependency when the TensorFlow module is loaded, and all Singularity commands are handled by wrappers. Here is a very simple example.

...
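To make the wrapper mechanism described above more concrete, the following is a conceptual sketch of what a python3 wrapper might expand to. It is not the wrapper that SHPC actually generates on Setonix: the image path and the function name wrapped_python3_command are hypothetical placeholders, and the real wrapper may pass additional Singularity options. The sketch only prints the command line rather than running it.

```shell
#!/bin/bash
# Conceptual sketch only: NOT the actual wrapper generated by SHPC.
# The image path is a hypothetical placeholder (assumption).
TF_IMAGE="${TF_IMAGE:-/path/to/tensorflow.sif}"

# Print the Singularity command line that a python3 wrapper would
# plausibly run on the user's behalf (hypothetical helper).
wrapped_python3_command() {
    printf '%s\n' "singularity exec $TF_IMAGE python3 $*"
}

# Show what "python3 01_horovod_mnist.py" would expand to.
wrapped_python3_command 01_horovod_mnist.py
```

With the module loaded, typing python3 is therefore enough: the module takes care of the equivalent of the printed command.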

Terminal 3. Running an ML Python script interactively on a compute node

$ salloc -p gpu --nodes=1 --gres=gpu:1 -A yourProjectName-gpu --time=00:20:00
salloc: Granted job allocation 4360927
salloc: Waiting for resource configuration
salloc: Nodes nid002828 are ready for job
$ module load tensorflow/rocm5.6-tf2.12  
$ python3 01_horovod_mnist.py 
2023-09-07 14:32:18.907641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-09-07 14:32:23.886297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 134200961 MB memory:  -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
[...]
INFO:root:This is process with rank 0 and local rank 0: my prediction is [[ -3.5764134  -6.1231604  -1.5476028   2.1744065 -14.56255    -5.4938045
  -20.374353   12.388017   -3.1701622  -1.0773858]]
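Before launching a longer script, it can be useful to confirm that TensorFlow sees the allocated GPU, as the log above reports for rank 0. The following is a minimal sketch, assuming it is run inside a GPU allocation after the TensorFlow module has been loaded, so that python3 resolves to the container's interpreter. The helper name check_gpus is hypothetical; the only TensorFlow call used is tf.config.list_physical_devices.

```shell
#!/bin/bash
# Hypothetical helper: list the GPUs visible to TensorFlow.
# Assumes python3 on PATH is the one provided by the TensorFlow module.
check_gpus() {
    # TensorFlow's informational logs go to stderr and are discarded here,
    # so only the device list (or the fallback message) is shown.
    python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))' 2>/dev/null \
        || { echo "TensorFlow could not be imported from python3" >&2; return 1; }
}
```

Calling check_gpus prints the list of physical GPU devices; an empty list suggests that no GPU was allocated to the session.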

...

Version	Old Version 8	New Version 9
Changes made by	Cristian Di Pietrantonio	Alexis Espinosa
Saved on	Jan 10, 2024	Jan 30, 2024
