...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 1. Look for the TensorFlow module |
---|
| $ module avail tensorflow
--------------------------------------------------------- /software/setonix/2023.08/containers/views/modules -------------------------------------------------------------
tensorflow/rocm5.5-tf2.11-dev tensorflow/rocm5.6-tf2.12 (D)
|
|
The tensorflow/rocm5.5-tf2.11-dev module is deprecated and should not be used. It will be removed at the next maintenance.
The container is deployed using SHPC (Singularity Registry HPC). SHPC generates a module (listed above) that provides convenient aliases for key programs within the container. This means you can run the container's python3 executable without explicitly loading and executing Singularity. Here is a very simple example.
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 2. A simple interaction with the TensorFlow module. |
---|
| $ module load tensorflow/rocm5.6-tf2.12
$ python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-09-07 14:29:15.551224: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> tf.__version__
'2.12.0'
>>> exit()
$ |
|
Here is another example of running a simple training script on a GPU node:
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 3. Running an ML Python script interactively on a compute node |
---|
| $ salloc -p gpu --nodes=1 --gres=gpu:1 -A yourProjectName-gpu --time=00:20:00
salloc: Granted job allocation 4360927
salloc: Waiting for resource configuration
salloc: Nodes nid002828 are ready for job
$ module load tensorflow/rocm5.6-tf2.12
$ python3 01_horovod_mnist.py
2023-09-07 14:32:18.907641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-09-07 14:32:23.886297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 134200961 MB memory: -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0
[...]
INFO:root:This is process with rank 0 and local rank 0: my prediction is [[ -3.5764134 -6.1231604 -1.5476028 2.1744065 -14.56255 -5.4938045
-20.374353 12.388017 -3.1701622 -1.0773858]]
|
|
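The account passed to -A in Terminal 3 combines a project name with a -gpu suffix. As a minimal sketch of how that string can be built in a script, the following uses a hypothetical helper named gpu_account (not part of Slurm or of any Pawsey tooling) and the same placeholder project name:

```shell
# Hypothetical helper: print the GPU account name for a project,
# adding the "-gpu" suffix only when it is not already present.
gpu_account() {
    case "$1" in
        *-gpu) printf '%s\n' "$1" ;;
        *)     printf '%s\n' "$1-gpu" ;;
    esac
}

# "yourProjectName" is a placeholder, as in Terminal 3.
echo "-A $(gpu_account yourProjectName)"
```

The check for an existing suffix is only there so the helper can be called twice on the same value without producing a doubled suffix.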
(Note that when requesting the interactive allocation, users should use their correct project name instead of the "yourProjectName" placeholder. Also notice the "-gpu" suffix added to the project name, which is needed to access any partition with GPU nodes.)
Installing additional Python packages
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 4. Installing additional Python packages using virtual environments |
---|
| $ module load tensorflow/rocm5.6-tf2.12
$ bash
Singularity> python3 -m venv --system-site-packages myenv
Singularity> source myenv/bin/activate
(myenv) Singularity> python3 -m pip install xarray
Collecting xarray
Using cached xarray-2023.8.0-py3-none-any.whl (1.0 MB)
Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.10/dist-packages (from xarray) (23.1)
Collecting pandas>=1.4
Using cached pandas-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
Requirement already satisfied: numpy>=1.21 in /usr/local/lib/python3.10/dist-packages (from xarray) (1.23.5)
Collecting pytz>=2020.1
Using cached pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
Collecting python-dateutil>=2.8.2
Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting tzdata>=2022.1
Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas>=1.4->xarray) (1.16.0)
Installing collected packages: pytz, tzdata, python-dateutil, pandas, xarray
Successfully installed pandas-2.1.0 python-dateutil-2.8.2 pytz-2023.3.post1 tzdata-2023.3 xarray-2023.8.0
(myenv) Singularity> python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2023-09-07 14:59:00.339696: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> import xarray
>>> exit()
(myenv) Singularity> exit
/software/projects/pawsey12345/matilda> ls
myenv |
|
As you can see, the environment stays on the filesystem and can be used in later runs.
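Because the environment persists on disk, a later session can test for it before deciding whether it needs to be created again. The following is a minimal sketch, assuming the environment directory is named myenv as in Terminal 4; the helper name venv_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if directory $1 contains the
# activation script that "python3 -m venv" creates.
venv_ready() {
    [ -f "$1/bin/activate" ]
}

if venv_ready myenv; then
    echo "myenv found: activate it with 'source myenv/bin/activate'"
else
    echo "myenv not found: create it first as shown in Terminal 4"
fi
```

The check only looks for the activation script; it does not verify which packages are installed inside the environment.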
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 1. An example batch script to run a TensorFlow distributed training job. |
---|
| #!/bin/bash
#SBATCH --account=pawsey12345-gpu
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --nodes=2
#SBATCH --tasks-per-node=1
# Load the TensorFlow container module.
module load tensorflow/rocm5.6-tf2.12
# Activation script of the virtual environment created earlier.
VENV_PATH=/software/projects/pawsey12345/matilda/myenv/bin/activate
# Training script to run on every task.
PYTHON_SCRIPT=/software/projects/pawsey12345/matilda/matilda-machinelearning/models/01_horovod_mnist.py
# Start one task per node; each task activates the environment, then runs the script.
srun --tasks-per-node=1 --nodes=2 bash -c "source $VENV_PATH && python3 $PYTHON_SCRIPT"
|
|
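In Listing 1 the command given to bash -c is written in double quotes, so the batch shell substitutes $VENV_PATH and $PYTHON_SCRIPT before srun launches the tasks. The sketch below illustrates that expansion with placeholder paths, which are assumptions for illustration and not real locations.

```shell
# Placeholder paths for illustration only.
VENV_PATH=/path/to/myenv/bin/activate
PYTHON_SCRIPT=/path/to/train.py

# Double quotes: the variables are expanded now, by the current shell.
EXPANDED="source $VENV_PATH && python3 $PYTHON_SCRIPT"

# Single quotes: the text is kept literally, so the inner shell would have to
# expand the variables itself, which requires them to be exported.
LITERAL='source $VENV_PATH && python3 $PYTHON_SCRIPT'

echo "$EXPANDED"
echo "$LITERAL"
```

With the double-quoted form used in the listing, the variables do not need to be exported, because each task receives the command with the paths already filled in.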
...