...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 3. Running a ML Python script interactively on a compute node |
---|
| $ salloc -p gpu --nodes=1 --gres=gpu:1 -A yourProjectName-gpu --time=00:20:00
salloc: Granted job allocation 4360927
salloc: Waiting for resource configuration
salloc: Nodes nid002828 are ready for job
$ module load tensorflow/rocm5.6-tf2.12
$ module list
Currently Loaded Modules:
1) craype-x86-milan 7) pawsey 13) cray-libsci/23.09.1.1
2) libfabric/1.15.2.0 8) pawseytools 14) PrgEnv-gnu/8.4.0
3) craype-network-ofi 9) gcc/12.2.0 15) singularity/4.1.0-mpi
4) perftools-base/23.09.0 10) craype/2.7.23 16) tensorflow/rocm5.6-tf2.12
5) xpmem/2.8.4-1.0_7.3__ga37cbd9.shasta 11) cray-dsmml/0.2.2
6) pawseyenv/2024.05 12) cray-mpich/8.1.27
$ python3 01_horovod_mnist.py
2023-09-07 14:32:18.907641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-09-07 14:32:23.886297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 134200961 MB memory: -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
[...]
Epoch 1/40187540
1875/1875 [==============================] - 7s 1ms/step - loss: 0.3005 - accuracy: 0.9133
Epoch 2/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1417 - accuracy: 0.9575
Epoch 3/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1066 - accuracy: 0.9681
[...]
Epoch 39/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0191 - accuracy: 0.9935
Epoch 40/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0186 - accuracy: 0.9938
[...]
INFO:root:This is process with rank 0 and local rank 0: my prediction is [[ -3.5764134 -6.1231604 -1.5476028 2.1744065 -14.56255 -5.4938045
-20.374353 12.388017 -3.1701622 -1.0773858]]
|
|
...
FROM quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
To pull the image to your local desktop with Docker you can use:
$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
To know more about our recommendations of container builds with Docker and later translation into Singularity format for their use in Setonix please refer to the Containers Documentation.
...