In this tutorial, you will see how to write a Horovod-powered distributed TensorFlow computation. More specifically, the goal is to train different models in parallel by assigning each of them to a different GPU. The discussion is organised in two sections: the first illustrates Horovod's basic concepts and its use with TensorFlow; the second uses the MNIST classification task as a test case.
...
The first two lines are convenience function calls that retrieve the rank and local rank of the process, which are then logged for demonstration purposes. Next, each process retrieves the list of GPUs available on the node it is running on. Processes on the same node retrieve the same list, whereas any two processes running on different nodes have different, non-overlapping sets of GPUs. In the latter case, resource contention is structurally impossible; it is in the former case that the local rank concept comes in handy. Each process uses its local rank as an index to select a GPU in the gpus list and will not share it with any other process because:
...
The last function call sets the GPU TensorFlow will use for each process. Try the script using distributed_tf.sh.
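The steps described above can be sketched as follows. This is a minimal illustration and not necessarily the exact listing used in this tutorial: the function names pick_gpu and pin_process_to_gpu are hypothetical, and the sketch assumes that Horovod and TensorFlow 2 are installed and that each node runs no more processes than it has visible GPUs.

```python
def pick_gpu(gpus, local_rank):
    """Return the device a process should use, indexed by its local rank.

    Hypothetical helper: processes on the same node have distinct local
    ranks, so they end up with distinct entries of the same list.
    """
    if local_rank >= len(gpus):
        raise ValueError(
            f"local rank {local_rank} but only {len(gpus)} GPU(s) visible on this node"
        )
    return gpus[local_rank]


def pin_process_to_gpu():
    """Initialise Horovod and make a single GPU visible to this process."""
    # Imported here so that pick_gpu can be used without the GPU software stack.
    import horovod.tensorflow as hvd
    import tensorflow as tf

    hvd.init()

    # Rank is unique across the whole job; local rank is unique within a node.
    rank, local_rank = hvd.rank(), hvd.local_rank()
    print(f"rank {rank}, local rank {local_rank}")

    # GPUs available on the node this process is running on.
    gpus = tf.config.experimental.list_physical_devices("GPU")

    # Restrict TensorFlow to the GPU selected through the local rank.
    tf.config.experimental.set_visible_devices(pick_gpu(gpus, local_rank), "GPU")
    return rank, local_rank
```

Calling pin_process_to_gpu() once at the top of the Python script, before any other TensorFlow operation, would give each process its own device.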
To test what we have written so far, use the batch job script runTensorflow.sh provided in the previous page as a template for submitting the job. Adapt it as follows: remove the exclusive option, request 2 GPUs per node, update the srun command accordingly, and point it to the Python script (01_horovod_mnist.py) containing the two parts described above. The adapted lines of the batch job script should look like:
#SBATCH --nodes=2 #2 nodes in this example
#SBATCH --gres=gpu:2 #2 GPUS per node
.
.
PYTHON_SCRIPT=$PYTHON_SCRIPT_DIR/01_horovod_mnist.py
.
.
srun -N 2 -n 4 -c 8 --gres=gpu:2 python3 $PYTHON_SCRIPT
(Note that resource requests for GPU nodes differ from the usual Slurm allocation requests, as do the parameters passed to the srun command. Please refer to the page Example Slurm Batch Scripts for Setonix on GPU Compute Nodes for a detailed explanation of resource allocation on GPU nodes.)
You should see output similar to the following:
...
First, the dataset is loaded from the TensorFlow module (MNIST being a standard dataset for test cases, TensorFlow provides a convenient function to retrieve it) and then split in two parts, one for training and the other for testing. What follows is the definition of the model and the loss function. Up to this point, every process executes the same code. They diverge when the model.fit function is called: the training dataset is implicitly partitioned using the size of the computation and the rank of the process. Each process gets a different portion of the samples because the rank is unique among all processes, so each trained model differs from the others. To show this, each model is evaluated on the same test set through the model.evaluate call. If you run the Python program with this last part added, you should see that the accuracy reported by each task is slightly different. You can use the rank and size values in if statements to train completely different models and, in general, to make each process follow a different execution path.
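The paragraph above can be sketched as follows. This is an illustration under stated assumptions, not necessarily the listing used in this tutorial: the names shard_indices and train_and_evaluate are hypothetical, the network architecture and the single training epoch are placeholders, and the data is split explicitly with a round-robin slice instead of relying on an implicit partitioning. The sketch also assumes that each process has already been restricted to its own GPU.

```python
def shard_indices(num_samples, rank, size):
    """Indices of the samples assigned to `rank` out of `size` processes.

    Round-robin split: process r keeps samples r, r + size, r + 2*size, ...
    Shards are disjoint because every rank is unique.
    """
    return list(range(rank, num_samples, size))


def train_and_evaluate():
    """Train one independent model per process on its own shard of MNIST."""
    # Imported here so that shard_indices can be used without the GPU software stack.
    import horovod.tensorflow as hvd
    import tensorflow as tf

    hvd.init()
    rank, size = hvd.rank(), hvd.size()

    # Every process loads the full dataset and scales pixels to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Keep only this process's portion of the training samples
    # (same selection as shard_indices, expressed as a slice).
    x_local, y_local = x_train[rank::size], y_train[rank::size]

    # Placeholder model; no gradients are exchanged between processes,
    # so each process trains its own model.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

    model.fit(x_local, y_local, epochs=1, verbose=0)

    # All processes evaluate on the same test set.
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    print(f"rank {rank}: test accuracy {accuracy:.4f}")
    return accuracy
```

Because the optimizer is not wrapped in any Horovod distributed optimizer, the processes never synchronise their weights, which is what keeps the models independent. An if statement on rank inside train_and_evaluate could select a different architecture per process.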
Related pages
- How to Set Up the TensorFlow Environment
- Running TensorFlow on Setonix
- How to Use Horovod for Distributed Training in Parallel using TensorFlow
- Example Slurm Batch Scripts for Setonix on GPU Compute Nodes