
...

Listing 1. 01_horovod_mnist.py: initialisation (first part of the script)

import tensorflow as tf
import horovod.tensorflow as hvd
import logging
 
# Show our log messages
logging.basicConfig(level=logging.INFO)

# ...but disable TensorFlow's ones except for errors
logging.getLogger("tensorflow").setLevel(logging.ERROR)

# Initialise Horovod: this call must always be made at the beginning of our scripts.
hvd.init()
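
Listing 1 relies on Python's logger hierarchy: basicConfig attaches a handler to the root logger, while setLevel on the logger named "tensorflow" (the logger TensorFlow's Python code writes to) raises the threshold for that logger only. The stand-alone sketch below illustrates this with the standard library alone. ListHandler is a hypothetical helper that collects messages instead of printing them, and TensorFlow is not imported, so "tensorflow" here is simply a logger with that name.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collect 'logger:LEVEL:message' strings instead of printing."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}:{record.levelname}:{record.getMessage()}")


# Attach the collecting handler to the root logger (basicConfig would attach a
# console handler instead) and let INFO messages through.
handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Same call as in Listing 1: the "tensorflow" logger only passes ERROR and above.
logging.getLogger("tensorflow").setLevel(logging.ERROR)

logging.info("rank 0 is ready")                                # kept: root logger at INFO
logging.getLogger("tensorflow").warning("deprecation notice")  # dropped: below ERROR
logging.getLogger("tensorflow").error("something failed")      # kept: propagates to root's handler

print(handler.messages)
```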

...

Listing 2. 01_horovod_mnist.py: assigning a different GPU to each process (second part of the script)

# retrieve and print the process's global and local ranks
rank = hvd.rank()
local_rank = hvd.local_rank()
size = hvd.size()
local_size = hvd.local_size()
logging.info(f"This is process with rank {rank} and local rank {local_rank}")

# each process retrieves the list of GPUs available on its node
gpus = tf.config.list_physical_devices('GPU')
if local_rank == 0:
    logging.info(f"This is process with rank {rank} and local rank {local_rank}: gpus available are: {gpus}")

# each process selects one GPU, provided the node has enough GPUs for all its processes
if local_rank >= len(gpus):
    raise RuntimeError(f"Not enough GPUs: local rank {local_rank} but only {len(gpus)} GPU(s) on this node.")
tf.config.set_visible_devices(gpus[local_rank], 'GPU')
 
# From now on each process has its own gpu to use...

...

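The last function call (set_visible_devices) makes the GPU whose index equals the process's local rank the only GPU TensorFlow can see in that process, so each process on a node works on a different GPU.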
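
As a sketch of how ranks map to GPUs for the 2-node, 2-GPUs-per-node job used on this page, the code below lists the global rank, node and local rank of each process, assuming Slurm's default block distribution of tasks (consecutive ranks fill one node before the next). The helper names rank_layout and gpu_for are hypothetical; gpu_for mirrors the check and the choice made in Listing 2.

```python
def rank_layout(nodes, tasks_per_node):
    """List (global_rank, node, local_rank) for every process, assuming block
    distribution: consecutive ranks fill one node before moving to the next."""
    layout = []
    for rank in range(nodes * tasks_per_node):
        layout.append((rank, rank // tasks_per_node, rank % tasks_per_node))
    return layout


def gpu_for(local_rank, gpus_on_node):
    """Index of the GPU a process keeps visible, following Listing 2's rule."""
    if local_rank >= gpus_on_node:
        raise RuntimeError("not enough GPUs on this node")
    return local_rank


# 2 nodes x 2 GPUs per node, one process per GPU.
for rank, node, local_rank in rank_layout(nodes=2, tasks_per_node=2):
    print(f"rank {rank}: node {node}, local rank {local_rank}, GPU {gpu_for(local_rank, 2)}")
```

If you start more tasks per node than there are GPUs, gpu_for(2, 2) raises, which is the situation the check in Listing 2 guards against.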

Try running the Python script proposed here, 01_horovod_mnist.py, using as a template the batch job script based on runTensorflow.sh provided in the previous page. You will need to adapt it to the number of GPUs per node and to use this Python script. The adapted lines should look like:

#SBATCH --nodes=2 #2 nodes in this example
#SBATCH --gres=gpu:2 #2 GPUs per node
.
.
PYTHON_SCRIPT=$PYTHON_SCRIPT_DIR/01_horovod_mnist.py
.
.
srun -N 2 -n 4 -c 8 --gres=gpu:2 python3 $PYTHON_SCRIPT
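
The task count in the srun line is simply the number of nodes times the number of GPUs per node, because we run one process per GPU. The hypothetical helper below composes that line from those two numbers. The -c value is copied from the example above and is an assumption here, since the appropriate number of CPUs per task depends on the node type (see the note that follows).

```python
def build_srun_line(nodes, gpus_per_node, cpus_per_task, script):
    """Hypothetical helper: an srun line that starts one task per GPU on every node."""
    tasks = nodes * gpus_per_node  # one Horovod process per GPU
    return (f"srun -N {nodes} -n {tasks} -c {cpus_per_task} "
            f"--gres=gpu:{gpus_per_node} python3 {script}")


print(build_srun_line(nodes=2, gpus_per_node=2, cpus_per_task=8, script="$PYTHON_SCRIPT"))
```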

(Note that GPU nodes are requested differently from the usual Slurm allocations, and the parameters given to the srun command also differ. Please refer to the page Example Slurm Batch Scripts for Setonix on GPU Compute Nodes for a detailed explanation of resource allocation on GPU nodes.)
You should see output similar to the following:

Listing 3. Example job output.

INFO:root:This is process with rank 1 and local rank 1
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 3 and local rank 1
INFO:root:This is process with rank 2 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
INFO:root:This is process with rank 2 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
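
Because all processes write to the same output concurrently, the order of the lines changes from run to run. A quick way to check a job's output is to collect the (rank, local rank) pairs and compare them with the expected layout. The sketch below does this with a regular expression; check_ranks is a hypothetical helper, and the expected local rank (rank % tasks_per_node) again assumes Slurm's block distribution of tasks.

```python
import re

# Matches both kinds of line printed by Listing 2.
RANK_LINE = re.compile(r"This is process with rank (\d+) and local rank (\d+)")


def check_ranks(log_text, nodes, tasks_per_node):
    """Return the sorted (rank, local_rank) pairs found in the job output, raising
    ValueError if they differ from the layout expected under block distribution."""
    found = sorted({(int(r), int(l)) for r, l in RANK_LINE.findall(log_text)})
    expected = [(r, r % tasks_per_node) for r in range(nodes * tasks_per_node)]
    if found != expected:
        raise ValueError(f"expected {expected}, found {found}")
    return found


sample = """\
INFO:root:This is process with rank 1 and local rank 1
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 3 and local rank 1
INFO:root:This is process with rank 2 and local rank 0
"""
print(check_ranks(sample, nodes=2, tasks_per_node=2))
```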

...

Listing 4. 01_horovod_mnist.py: MNIST classification example (third part of the script)

# From now on each process has its own GPU to use.
# We will now train the same model independently on each GPU, each copy
# on a different portion of the training data.
from math import ceil

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

# Partition the training set evenly among processes so that each process
# trains its copy of the model on different data.
dataset_size = len(x_train)
# samples per model: number of samples used to train each model
spm = ceil(dataset_size / size)

model.fit(x_train[rank*spm:(rank+1)*spm], y_train[rank*spm:(rank+1)*spm], epochs=15)
# each process evaluates its own model on the full test set
print(model.evaluate(x_test, y_test, verbose=2))
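
To see how the slicing in Listing 4 splits the data, the sketch below computes the (start, stop) indices each rank trains on. partition is a hypothetical helper that reproduces the expression rank*spm:(rank+1)*spm and clips it to the dataset length, as NumPy slicing does. MNIST's training set has 60,000 images, so with 4 processes each model sees 15,000 of them.

```python
from math import ceil


def partition(dataset_size, size, rank):
    """Hypothetical helper: (start, stop) of the slice rank*spm:(rank+1)*spm,
    clipped to the dataset length exactly as array slicing clips it."""
    spm = ceil(dataset_size / size)  # samples per model
    start = min(rank * spm, dataset_size)
    stop = min((rank + 1) * spm, dataset_size)
    return start, stop


# MNIST training set: 60,000 images split over 4 processes.
print([partition(60000, 4, r) for r in range(4)])
# A size that does not divide the dataset evenly.
print([partition(10, 4, r) for r in range(4)])
# A small dataset spread over many processes.
print([partition(10, 6, r) for r in range(6)])
```

Because spm is rounded up, the last process receives fewer samples whenever size does not divide the dataset size, and for small datasets a trailing process can be left with an empty slice (rank 5 in the last call above).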

...
