Warning: TensorFlow module has known issues

Since the June 2024 maintenance, the module tensorflow/rocm5.6-tf2.12 has shown some problems. For a temporary fix, see: June 2024 Software Update - Important Information


...

Terminal 3. Running an ML Python script interactively on a compute node
$ salloc -p gpu --nodes=1 --gres=gpu:1 -A yourProjectName-gpu --time=00:20:00
salloc: Granted job allocation 4360927
salloc: Waiting for resource configuration
salloc: Nodes nid002828 are ready for job

$ module load tensorflow/rocm5.6-tf2.12

$ module list
Currently Loaded Modules:
1) craype-x86-milan 7) pawsey 13) cray-libsci/23.09.1.1
2) libfabric/1.15.2.0 8) pawseytools 14) PrgEnv-gnu/8.4.0
3) craype-network-ofi 9) gcc/12.2.0 15) singularity/4.1.0-mpi
4) perftools-base/23.09.0 10) craype/2.7.23 16) tensorflow/rocm5.6-tf2.12
5) xpmem/2.8.4-1.0_7.3__ga37cbd9.shasta 11) cray-dsmml/0.2.2
6) pawseyenv/2024.05 12) cray-mpich/8.1.27

$ python3 01_horovod_mnist.py 
2023-09-07 14:32:18.907641: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:This is process with rank 0 and local rank 0
INFO:root:This is process with rank 0 and local rank 0: gpus available are: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-09-07 14:32:23.886297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 134200961 MB memory:  -> device: 0, name: AMD Instinct MI250X, pci bus id: 0000:d1:00.0.0
[...]
Epoch 1/40
1875/1875 [==============================] - 7s 1ms/step - loss: 0.3005 - accuracy: 0.9133
Epoch 2/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1417 - accuracy: 0.9575
Epoch 3/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1066 - accuracy: 0.9681
[...]
Epoch 39/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0191 - accuracy: 0.9935
Epoch 40/40
1875/1875 [==============================] - 2s 1ms/step - loss: 0.0186 - accuracy: 0.9938
[...]
INFO:root:This is process with rank 0 and local rank 0: my prediction is [[ -3.5764134  -6.1231604  -1.5476028   2.1744065 -14.56255    -5.4938045
  -20.374353   12.388017   -3.1701622  -1.0773858]]



...

To do so, you will need to open a Bash shell within the container. Because the TensorFlow container is installed as a module, there is no need to call the singularity command explicitly. Instead, the containerised installation provides a bash wrapper that does all the work and then starts an interactive Bash session inside the Singularity container. Here is a practical example that installs the xarray package into a virtual environment named myenv (it is key to use the --system-site-packages option to preserve access to the Python installation and packages present in the container):

Terminal 4. Installing additional Python packages using virtual environments
$ module load tensorflow/rocm5.6-tf2.12 
$ mkdir -p $MYSOFTWARE/manual/software/pythonEnvironments/tensorflowContainer-environments
$ cd $MYSOFTWARE/manual/software/pythonEnvironments/tensorflowContainer-environments
$ bash

Singularity> python3 -m venv --system-site-packages myenv  
Singularity> source myenv/bin/activate

(myenv) Singularity> python3 -m pip install xarray
Collecting xarray
  Using cached xarray-2023.8.0-py3-none-any.whl (1.0 MB)
Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.10/dist-packages (from xarray) (23.1)
Collecting pandas>=1.4
  Using cached pandas-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
Requirement already satisfied: numpy>=1.21 in /usr/local/lib/python3.10/dist-packages (from xarray) (1.23.5)
Collecting pytz>=2020.1
  Using cached pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
Collecting python-dateutil>=2.8.2
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting tzdata>=2022.1
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas>=1.4->xarray) (1.16.0)
Installing collected packages: pytz, tzdata, python-dateutil, pandas, xarray
Successfully installed pandas-2.1.0 python-dateutil-2.8.2 pytz-2023.3.post1 tzdata-2023.3 xarray-2023.8.0

# Now test the use of the installed package
(myenv) Singularity> python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
2023-09-07 14:59:00.339696: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> import xarray
>>> exit()

(myenv) Singularity> exit

$ ls -l
drwxr-sr-x 5 matilda pawsey12345 4096 Apr 22 16:33 myenv
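
As a quick sanity check on the step above, the sketch below tests whether an existing virtual environment was created with --system-site-packages. It relies on the fact that python3 -m venv records this choice in the pyvenv.cfg file at the top of the environment directory. The function name venv_uses_system_site is hypothetical and not part of the module; the default directory name myenv is taken from the example above.

```shell
#!/bin/bash
# Return 0 if the virtual environment can see the packages of its parent
# Python (here, those shipped in the container), 1 if it cannot, and 2 if
# the directory does not look like a virtual environment.
# NOTE: venv_uses_system_site is a hypothetical helper name.
venv_uses_system_site() {
    local cfg="$1/pyvenv.cfg"
    [ -f "$cfg" ] || return 2
    grep -Eq '^include-system-site-packages[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$cfg"
}

# Check the environment given as first argument, or "myenv" by default.
VENV_DIR="${1:-myenv}"
if venv_uses_system_site "$VENV_DIR"; then
    echo "$VENV_DIR: container packages are visible"
else
    echo "$VENV_DIR: container packages are NOT visible (or not a virtual environment)"
fi
```

If the check reports that container packages are not visible, the environment would most likely need to be recreated with the --system-site-packages option, since otherwise the TensorFlow installation of the container is hidden from it.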


...

FROM quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0

To pull the image to your local desktop with Docker you can use:

$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0

To learn more about our recommendations for building containers with Docker and then converting them to Singularity format for use on Setonix, please refer to the Containers Documentation.
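
A minimal sketch of how the FROM line above might be used in practice: the script below writes a small Dockerfile that extends the Pawsey TensorFlow image with one additional Python package. The choice of xarray, the scratch build directory and the image tag in the final comment are assumptions made only for illustration, and it is also assumed that pip is usable inside the image. The recommended workflow is the one described in the Containers Documentation.

```shell
#!/bin/bash
# Write a minimal Dockerfile that extends the Pawsey TensorFlow image.
BASE_IMAGE="quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0"   # image named in the text above
BUILD_DIR="$(mktemp -d)"                                        # scratch build context (assumption)

cat > "$BUILD_DIR/Dockerfile" <<EOF
FROM $BASE_IMAGE
# Hypothetical extra package, chosen only for illustration
RUN python3 -m pip install --no-cache-dir xarray
EOF

echo "Dockerfile written to $BUILD_DIR/Dockerfile"
# To build it on a machine with Docker (the tag name is a placeholder):
#   docker build -t my-tensorflow:custom "$BUILD_DIR"
```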

...

Listing 1. runTensorflow.sh: An example batch script to run a TensorFlow distributed training job.
#!/bin/bash --login
#SBATCH --job-name=distributed_tf
#SBATCH --partition=gpu
#SBATCH --nodes=2              #2 nodes in this example 
#SBATCH --exclusive            #All resources of the node are exclusive to this job
#                              #8 GPUs per node (16 "allocation packs" in total for the job)
#SBATCH --time=00:05:00
#SBATCH --account=pawsey12345-gpu #IMPORTANT: use your own project and the -gpu suffix

#----
#Loading needed modules:
export ROCM_PATH=/opt/rocm #Workaround for path errors with new CPE. Will be removed after container fix.
module load tensorflow/<version>
echo -e "\n\n#------------------------#"
module list

#----
#Printing the status of the given allocation
echo -e "\n\n#------------------------#"
echo "Printing from scontrol:"
scontrol show job ${SLURM_JOBID}

#----
#If additional python packages have been installed in user's own virtual environment
VENV_PATH=$MYSOFTWARE/manual/software/pythonEnvironments/tensorflowContainer-environments/myenv

#----
#Clear definition of the python script containing the tensorflow training case
PYTHON_SCRIPT=$MYSCRATCH/matilda-machinelearning/models/01_horovod_mnist.py

#----
#TensorFlow settings if needed:
#  The following two variables control the real number of threads in TensorFlow code:
export TF_NUM_INTEROP_THREADS=1    #Number of threads for independent operations
export TF_NUM_INTRAOP_THREADS=1    #Number of threads within individual operations 

#----
#Execution
#Note: srun needs the full resource parameters to be given explicitly for the job step.
#      These are independent from the allocation parameters (which are not inherited by srun).
#      Each task needs access to all the 8 available GPUs in the node where it's running.
#      So, no optimal binding can be provided by the scheduler.
#      Therefore, "--gpus-per-task" and "--gpu-bind" are not used.
#      Optimal use of resources is now the responsibility of the code.
#      "-c 8" is used to force allocation of 1 task per CPU chiplet. Then, the REAL number of threads
#         for the code SHOULD be defined by the environment variables above.
echo -e "\n\n#------------------------#"
echo "Code execution:"
#When using a virtual environment:
srun -N 2 -n 16 -c 8 --gres=gpu:8 bash -c "source $VENV_PATH/bin/activate &&  python3 $PYTHON_SCRIPT"
#When no virtual environment is needed:
#srun -N 2 -n 16 -c 8 --gres=gpu:8 python3 $PYTHON_SCRIPT

#----
#Printing information of finished job steps:
echo -e "\n\n#------------------------#"
echo "Printing information of finished job steps using sacct:"
sacct -j ${SLURM_JOBID} -o jobid%20,Start%20,elapsed%20

#----
#Done
echo -e "\n\n#------------------------#"
echo "Done"
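
The srun line in the listing hard-codes -N 2 -n 16 for a two-node job. As a sketch of the arithmetic behind those numbers, the snippet below derives the same options from the number of allocated nodes, assuming (as read from the listing) 8 GPUs per node, 8 tasks per node (one per CPU chiplet) and 8 cores per task. The function name srun_geometry is hypothetical. SLURM_JOB_NUM_NODES is set by Slurm inside a job; the fallback value of 2 is only there so the snippet can be tried outside a job.

```shell
#!/bin/bash
# Node layout assumed from the listing above.
GPUS_PER_NODE=8      # all GPUs of the node are requested per node
TASKS_PER_NODE=8     # one task per CPU chiplet
CORES_PER_TASK=8     # "-c 8" in the listing

# Print the srun resource options for a given number of nodes.
# NOTE: srun_geometry is a hypothetical helper name.
srun_geometry() {
    local nodes="$1"
    local ntasks=$(( nodes * TASKS_PER_NODE ))
    echo "-N $nodes -n $ntasks -c $CORES_PER_TASK --gres=gpu:$GPUS_PER_NODE"
}

# Inside a job SLURM_JOB_NUM_NODES is defined; default to 2 otherwise.
NODES="${SLURM_JOB_NUM_NODES:-2}"
echo "srun $(srun_geometry "$NODES") python3 \$PYTHON_SCRIPT"
```

In a batch script, the command substitution would be left unquoted so that the options are split into separate words, for example: srun $(srun_geometry "$SLURM_JOB_NUM_NODES") python3 $PYTHON_SCRIPT. With this approach, changing the --nodes value in the header would not require editing the srun line.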


...