Creating a Nimbus GPU Instance
Creating a GPU instance follows the same steps as a normal instance (see Create a Nimbus Instance for details), with the key difference being the choice of flavour and image.
GPU Flavour
If you wish to run a GPU instance on your Nimbus project, you must first apply for GPU access. If you haven't already done so, refer to Nimbus GPU Access for further information.
If you already have GPU access, then one GPU flavour will be available under the "Flavor" section of instance creation:
Flavour | CPU Cores | RAM | GPU
---|---|---|---
m3.v100 | 7 | 91 GB | NVIDIA Tesla V100
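If you prefer the OpenStack command-line client over the dashboard, you can confirm that the flavour is visible to your project. A minimal check, assuming the client is installed and your project's OpenRC credentials have been sourced:
openstack flavor show m3.v100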
GPU Images
While the vGPU flavour is required for your instance to access the GPU hardware, a vGPU image is required to actually run anything with it. You can't simply use a base Ubuntu image and install the CUDA drivers yourself, because you will also require GRID drivers that are only available under NVIDIA license (these are built into the vGPU images). Similarly, using a vGPU image without the vGPU flavour above will not give you GPU access.
Under the "Source" section of instance creation, select one of the following vGPU images (the names are fairly self explanatory):
Image ID | Image Name | Ubuntu version | CUDA version
---|---|---|---
315ef5ac-e92f-47d0-955f-2ac6dc1bbf82 | Pawsey GPU CUDA 11 - Ubuntu 20.04 - 2024-05 | 20.04 (focal) | 11
40521cc4-5655-4460-b4c7-827892ce2883 | Pawsey GPU CUDA 12 - Ubuntu 22.04 - 2024-05 | 22.04 (jammy) | 12.4
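The flavour and image can also be selected from the command line. A minimal sketch, again assuming a configured OpenStack client; the key pair, network, security group, and instance names below are placeholders for your own:
openstack server create --flavor m3.v100 --image 40521cc4-5655-4460-b4c7-827892ce2883 --key-name my-keypair --network my-network --security-group my-security-group my-gpu-instance
Once the instance is running, you can ssh in and run nvidia-smi directly on it to confirm that the GRID driver can see the GPU.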
NVIDIA Container Toolkit
Please note that both of these images also have the NVIDIA Container Toolkit installed (see https://github.com/NVIDIA/nvidia-container-toolkit for details; this supersedes the older nvidia-docker project). The idea is that if the specific version of CUDA is not ideal for your workflow, you can run a different version as a container on the same instance. You should still pick the image whose major CUDA version matches what you plan to use (Ubuntu 20.04 for CUDA 11.x, or Ubuntu 22.04 for CUDA 12.x), because the GRID driver installed on the host determines the maximum CUDA version a container can run.
For examples of running NVIDIA application containers on your instance, see documentation here.
Testing that the GPU is accessible from docker
As an example, to test your GPU access by running nvidia-smi inside a CUDA 12.4 container, run the following (available tags change over time; see https://hub.docker.com/r/nvidia/cuda/tags for current ones):
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
Using a directory from the instance in docker
You can also use the "-v" flag to mount a directory from the instance as a volume within the container. As another example, to run a deviceQuery binary located in /home/currently_logged_in_user on the instance, mounted as /home/ubuntu inside the container, run the following:
docker run --rm --gpus all -v /home/currently_logged_in_user:/home/ubuntu nvidia/cuda:12.4.0-base-ubuntu22.04 /home/ubuntu/deviceQuery
User permissions
As Docker runs as the root user on your instance, any files you create from within the container (in the above example, files written to /home/ubuntu inside the container appear in /home/currently_logged_in_user on the instance) will be owned by root on the host. You can work around this in two ways:
Change the ownership using chown, for instance:
sudo chown currently_logged_in_user /home/currently_logged_in_user/created_file.csv
(for more details see the chown manpage)
Use the "-u" flag in docker run (which depending on how the container was created may or may not work) for instance:
docker run -u $(id -u):$(id -g) ...
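Putting the two flags together, a quick way to check the resulting ownership (a sketch reusing the CUDA 12.4 tag from above):
docker run --rm -u $(id -u):$(id -g) -v /home/currently_logged_in_user:/home/ubuntu nvidia/cuda:12.4.0-base-ubuntu22.04 touch /home/ubuntu/test_file
ls -l /home/currently_logged_in_user/test_file
The file should be owned by your own user rather than root.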
Example jupyter-lab
For a more complete example, let us assume you want to run jupyter-lab (see https://jupyterlab.readthedocs.io/en/stable/) in a container with TensorFlow 2 on Python 3. From the list of pre-built NVIDIA containers here, you find that the image tagged 20.03-tf2-py3 is suitable. It does not contain jupyter-lab, so we will need to add it.
As a prerequisite, in order to connect to jupyter you will need to set up port forwarding using "ssh -L ..." when connecting to your instance (see the ssh manpage; an example command is given at the end of this section). Then create a file called "Dockerfile" in your git repository (we recommend using version control, so that you can bring the instance back up quickly after a shutdown with no loss of work), with the following contents:
FROM nvcr.io/nvidia/tensorflow:20.03-tf2-py3
RUN pip install jupyterlab
For more details on what you can add to a Dockerfile, read the reference at https://docs.docker.com/engine/reference/builder/
You can then build your new container using:
docker build -t my_playground - < Dockerfile
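Note that when building from stdin like this, the Dockerfile itself is the only build context, so COPY and ADD instructions will not find any files. If your Dockerfile grows to need them, build from the repository directory instead:
docker build -t my_playground .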
and then run it with (the --ip flag makes jupyter-lab listen on all container interfaces, so that Docker's port mapping can reach it):
docker run -p 8888:8888 -ti -v $HOME/my_git_repo:/workspace/my_git_repo --gpus all my_playground jupyter-lab --ip=0.0.0.0
You can then connect to your ssh tunnel port as usual.
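For example, assuming the default ubuntu login user and jupyter-lab's default port 8888, the tunnel would be set up when connecting to the instance with:
ssh -L 8888:localhost:8888 ubuntu@<your-instance-ip>
after which jupyter-lab is reachable at http://localhost:8888 in your local browser (use the access token that jupyter-lab prints at startup).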
For more information on using the NVIDIA Container Toolkit, please refer to https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html