...
Here is another example of running a simple training script on a GPU node during an interactive session:
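A minimal sketch of such an interactive session is shown below; resource amounts, the image location and the training script name are placeholders, and the allocation options follow the GPU-node conventions explained in the note that follows.

```bash
# Sketch only: request an interactive allocation on a GPU node
# (project name, time and resource amounts are placeholders)
salloc --nodes=1 --gpus-per-node=1 --partition=gpu \
       --account=yourProjectName-gpu --time=00:30:00

# Within the allocation, run the training script with the Pawsey TensorFlow
# container (image location and script name are placeholders)
srun -N 1 -n 1 --gpus-per-node=1 singularity exec --rocm \
     $MYSOFTWARE/tensorflow_2.12.1.570-rocm5.6.0.sif python3 train.py
```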
(Note that when requesting the interactive allocation, users should use their correct project name instead of the "yourProjectName" placeholder used in the example. Also notice the "-gpu" postfix appended to the project name, which is needed to access any partition with GPU nodes. Finally, note that the resource request for GPU nodes differs from the usual Slurm allocation requests; please refer to the page Example Slurm Batch Scripts for Setonix on GPU Compute Nodes for a detailed explanation of resource allocation on GPU nodes.)
Installing additional Python packages
...
Note: If you think a Python package that is widely used in the Machine Learning community should be included in the container, you can submit a ticket to the Help Desk and we will evaluate your request.
There are at least two ways in which users can install additional Python packages that they require and the container lacks. The first is to build their own container image FROM the Pawsey TensorFlow container. The second is to use a virtual environment saved on Setonix itself, in which the additional Python packages are installed and from which they are loaded. This second way is our recommended procedure.
The trick is to create the virtual environment using the Python installation within the container. This ensures that your packages are installed taking into account what is already installed in the container, not what is on Setonix. The virtual environment itself, however, is created on the host filesystem, ideally under Setonix's /software. Setonix filesystems are mounted by default in containers and are writable from within them, so pip can install additional packages there. Additionally, the virtual environment is preserved from one container run to the next.
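As a rough sketch, creating and populating such a virtual environment from within the container might look like the following; the environment path and the extra package (scikit-learn) are illustrative placeholders, while the image name is the one used throughout this page.

```bash
# Sketch: create the virtual environment with the container's Python, but
# place it on the host filesystem under /software (paths are placeholders)
export TF_IMAGE=docker://quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
export MYENV=/software/projects/yourProjectName/$USER/tf-env

singularity exec $TF_IMAGE python3 -m venv --system-site-packages $MYENV

# Install any extra packages into that environment, again from within the
# container, so dependencies are resolved against the container's packages
singularity exec $TF_IMAGE bash -c \
    "source $MYENV/bin/activate && pip install scikit-learn"
```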
...
Distributed training
You can run TensorFlow on multiple Setonix nodes. The best way is to submit a job to Slurm using sbatch and a batch script. Let's assume you have prepared a Python script implementing your TensorFlow program and have also created a virtual environment that you want to be active while executing the script. Here is a batch script that implements such a scenario.
...
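A minimal sketch of a batch script for this scenario is given below; the account, partition, resource amounts, image location, environment path and script name are all placeholders and should be adapted to your project.

```bash
#!/bin/bash --login
#SBATCH --job-name=tf-distributed
#SBATCH --account=yourProjectName-gpu
#SBATCH --partition=gpu
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --time=01:00:00

# Locations of the Pawsey TensorFlow image and of the virtual environment
# (both paths are placeholders)
TF_IMAGE=$MYSOFTWARE/tensorflow_2.12.1.570-rocm5.6.0.sif
MYENV=/software/projects/yourProjectName/$USER/tf-env

# One task per node; each task activates the virtual environment and then
# runs the TensorFlow program inside the container with ROCm GPU support
srun -N 2 -n 2 singularity exec --rocm $TF_IMAGE \
     bash -c "source $MYENV/bin/activate && python3 train_distributed.py"
```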
FROM quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
To pull the image to your local desktop with Docker, you can use:
$ docker pull quay.io/pawsey/tensorflow:2.12.1.570-rocm5.6.0
To learn more about our recommendations for building containers with Docker and then translating them into Singularity format for use on Setonix, please refer to the Containers Documentation.
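As a sketch of that workflow, a Dockerfile starting with the FROM line shown above can be built locally with Docker and then converted to a Singularity image that you copy to Setonix; the image and file names below are placeholders.

```bash
# Sketch: build a derived image locally with Docker and convert it to a
# Singularity image for Setonix (image and file names are placeholders)
docker build -t myuser/my-tensorflow:latest .
docker save myuser/my-tensorflow:latest -o my-tensorflow.tar
singularity build my-tensorflow.sif docker-archive://my-tensorflow.tar
```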
...
- How to Interact with Multiple GPUs using TensorFlow
- Example Slurm Batch Scripts for Setonix on GPU Compute Nodes
- Containers
- SHPC (Singularity Registry HPC)