Note
This page is still a work in progress, and support for Machine Learning workloads has only just started. Please check it frequently for updates.

...

Terminal 3. Running an ML Python script interactively on a compute node

$ salloc -p gpu --nodes=1 --gres=gpu:1 -A yourProjectName-gpu --time=00:20:00
salloc: Granted job allocation 4360927
salloc: Waiting for resource configuration
salloc: Nodes nid002828 are ready for job

$ module load tensorflow/rocm5.6-tf2.12  

$ python3 01_horovod_mnist.py 
2023-09-07 14:32:18.907641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-09-07 14:32:23.886297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 134200961 MB memory:  -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
[...]
INFO:root:This is process with rank 0 and local rank 0: my prediction is [[ -3.5764134  -6.1231604  -1.5476028   2.1744065 -14.56255    -5.4938045
  -20.374353   12.388017   -3.1701622  -1.0773858]]

(Note that when requesting the interactive allocation, users should replace the "yourProjectName" placeholder used in the example with their own project name. Also note the "-gpu" suffix appended to the project name: it is required to access any partition with GPU nodes. Finally, resource requests for GPU nodes, as well as the parameters given to the srun command, differ from the usual Slurm allocation requests. Please refer to the page Example Slurm Batch Scripts for Setonix on GPU Compute Nodes for a detailed explanation of resource allocation on GPU nodes.)
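The account naming rule above can be illustrated with a small helper that only prints the salloc command from Terminal 3 for a given project, so it can be reviewed before being run. This is a minimal sketch: the function name gpu_salloc_cmd is hypothetical, and its defaults (one GPU, 20 minutes) simply mirror the example above rather than any recommended values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print (not run) the salloc command used in Terminal 3.
# Usage: gpu_salloc_cmd PROJECT [NGPUS] [WALLTIME]
gpu_salloc_cmd() {
  local project="${1:?usage: gpu_salloc_cmd PROJECT [NGPUS] [WALLTIME]}"
  local ngpus="${2:-1}"             # default mirrors --gres=gpu:1 in the example
  local walltime="${3:-00:20:00}"   # default mirrors --time=00:20:00 in the example
  # GPU partitions require the "-gpu" suffix on the project name;
  # strip it first if already present so it is never doubled.
  local account="${project%-gpu}-gpu"
  printf 'salloc -p gpu --nodes=1 --gres=gpu:%s -A %s --time=%s\n' \
    "$ngpus" "$account" "$walltime"
}

# Example: preview the command for the placeholder project name
gpu_salloc_cmd "yourProjectName"
```

Printing rather than executing keeps the helper safe to try on a login node; the printed line can then be copied into the terminal.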

Installing additional Python packages

...

FROM quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0

To pull the image to your local desktop with Docker, you can use:

$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
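As a sketch of how additional Python packages could be added on top of this image, the script below writes a minimal Dockerfile that uses the Pawsey TensorFlow image as its base. The package list, the build directory name and the use of python3 -m pip inside the image are assumptions for illustration only; the Containers Documentation describes the recommended build procedure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base image taken from this page; the remaining values are hypothetical examples.
base_image="quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0"
extra_packages="pandas scikit-learn"   # hypothetical: list the packages you need
build_dir="${BUILD_DIR:-tf-custom}"    # hypothetical build context directory

mkdir -p "$build_dir"

# Write a minimal Dockerfile extending the Pawsey TensorFlow image
# (assumes pip is available for python3 inside the base image).
cat > "$build_dir/Dockerfile" <<EOF
FROM ${base_image}
RUN python3 -m pip install --no-cache-dir ${extra_packages}
EOF

echo "Dockerfile written to $build_dir/Dockerfile"
# On a machine with Docker, the image could then be built with, e.g.:
#   docker build -t tf-custom:latest "$build_dir"
```

Keeping the extensions in a separate layer on top of the unchanged base image means the ROCm and TensorFlow stack provided by Pawsey is reused as-is.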

To learn more about our recommendations for building containers with Docker and then converting them to Singularity format for use on Setonix, please refer to the Containers Documentation.
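One possible Docker-to-Singularity workflow is sketched below: build the image with Docker, save it to a tar archive, and build a SIF file from that archive through Singularity's docker-archive source. This is an illustrative assumption, not necessarily the workflow the Containers Documentation recommends (pushing to a registry and pulling with Singularity is a common alternative). The function docker_to_sif, the run wrapper and the DRY_RUN variable are hypothetical; with DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (to preview the steps).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build a Docker image and convert it to a Singularity SIF file.
# Usage: docker_to_sif CONTEXT_DIR IMAGE:TAG OUTPUT.sif
docker_to_sif() {
  local context="$1" image="$2" sif="$3"
  local archive="${sif%.sif}.tar"   # intermediate archive next to the SIF file
  run docker build -t "$image" "$context"
  run docker save -o "$archive" "$image"
  run singularity build "$sif" "docker-archive://$archive"
}

# Preview the steps without executing them
DRY_RUN=1 docker_to_sif tf-custom tf-custom:latest tf-custom.sif
```

The Docker steps need a machine with Docker (for example, a local desktop), while the final singularity build step needs Singularity available; the resulting .sif file can then be copied to Setonix.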

...

Version	Old Version 34	New Version 35
Changes made by	Cristian Di Pietrantonio	Cristian Di Pietrantonio
Saved on	May 30, 2024	May 30, 2024
