Creating a Nimbus vGPU Instance

Creating a vGPU instance follows the same steps as a normal instance (see Create a Nimbus Instance for details), with the key difference being the choice of flavour and image.

vGPU Flavours


To run a vGPU instance on your Nimbus project, you must first apply for GPU access. If you haven't already done so, refer to Nimbus GPU Access for further information.

If you already have GPU access, then one or both of the following vGPU flavours will be available under the "Flavor" section of instance creation:

Flavour     | CPU Cores | RAM  | vGPU
r1.V100-4C  | 7         | 45GB | 4GB V100
r1.V100-8C  | 7         | 90GB | 8GB V100

You must select one of these two flavours for your instance to be created on one of the Nimbus GPU nodes.
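If you prefer the OpenStack command line to the dashboard, you can check which vGPU flavours are visible to your project with something like the following (a sketch, assuming the openstack CLI is installed and your project's credentials have been sourced):

openstack flavor list | grep V100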


vGPU Images


While the vGPU flavour is required for your instance to access the GPU hardware, the vGPU image is required to actually run anything on it. You cannot simply use a base Ubuntu image and install the CUDA drivers yourself, because vGPU instances require GRID drivers that are only available under an NVIDIA license (and are built into the vGPU images). Similarly, using a vGPU image without one of the above vGPU flavours will not give you GPU access.
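Once an instance built from one of the vGPU images is running, you can confirm that the licensed GRID driver is loaded by running nvidia-smi directly on the instance; it should report the V100 vGPU. If it fails, the instance was most likely not created with one of the vGPU flavours above.

nvidia-smi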

Under the "Source" section of instance creation, select one of the following vGPU images (the names are fairly self explanatory):

Image ID                             | Image Name                                     | Ubuntu version | CUDA version
1233c07e-b280-44dc-a247-238848a96e49 | Pawsey - Ubuntu 20.04 - vGPU CUDA 10 - 2023-06 | 20.04 (focal)  | 10.2
cff70d35-9d6e-425c-a9ae-d319325cd106 | Pawsey - Ubuntu 20.04 - vGPU CUDA 11 - 2023-06 | 20.04 (focal)  | 11.8
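The instance can also be launched from the OpenStack command line rather than the dashboard. A minimal sketch, using the CUDA 11 image ID from the table above and hypothetical key pair and network names (mykey, my-network):

openstack server create \
  --flavor r1.V100-4C \
  --image cff70d35-9d6e-425c-a9ae-d319325cd106 \
  --key-name mykey \
  --network my-network \
  my-vgpu-instance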


NVIDIA Container Toolkit


Please note that both of these images also have the NVIDIA Container Toolkit (formerly nvidia-docker) installed (see https://github.com/NVIDIA/nvidia-docker for details). The idea is that if the specific version of CUDA is not ideal for your workflow, you can run a different version as a container on the same instance. You will still need to choose the vGPU image whose major CUDA version matches that of the containers you plan to run (the CUDA 10 image for CUDA 10.x containers, and the CUDA 11 image for CUDA 11.x containers).


For examples of running NVIDIA application containers on your instance, see documentation here.

Testing that the GPU is accessible from docker

As an example, to run nvidia-smi to test your GPU access under a CUDA 10.0 container, run the following:

docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
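If you launched the CUDA 11 image instead, the matching test with a CUDA 11.x container might look like the following (assuming the nvidia/cuda:11.8.0-base-ubuntu20.04 tag is still published on Docker Hub):

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi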


Using a directory from the instance in docker

You can also use the "-v" flag to bind mount a local directory as a volume within the container. As another example, to run the executable deviceQuery from the instance directory /home/currently_logged_in_user (mounted as /home/ubuntu inside the CUDA 10.0 container), run the following:

docker run --rm --gpus all -v /home/currently_logged_in_user:/home/ubuntu nvidia/cuda:10.0-base /home/ubuntu/deviceQuery
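If you first want to confirm that the bind mount itself is working, a simple check is to list the mounted directory from inside the container:

docker run --rm -v /home/currently_logged_in_user:/home/ubuntu nvidia/cuda:10.0-base ls -la /home/ubuntu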


User permissions

As Docker runs as the root user inside your instance by default, any files you create in the container (for instance, writing to /home/ubuntu in the container above actually creates files in /home/currently_logged_in_user on the instance) will be owned by root on the host. You can work around this in two ways:

  1. Change the ownership using chown, for instance:

    sudo chown currently_logged_in_user /home/currently_logged_in_user/created_file.csv

    (for more details see the chown manpage)

  2. Use the "-u" flag in docker run (which depending on how the container was created may or may not work) for instance:

    docker run -u $(id -u):$(id -g) ...
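Putting the "-u" flag together with the earlier deviceQuery example, a complete invocation might look like the following (a sketch, again assuming the container image tolerates running as an arbitrary non-root user):

docker run --rm --gpus all -u $(id -u):$(id -g) -v /home/currently_logged_in_user:/home/ubuntu nvidia/cuda:10.0-base /home/ubuntu/deviceQuery

Any files the command writes under /home/ubuntu will then be owned by your own user on the instance.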


Example jupyter-lab

For a more complete example, let us assume you want to run jupyter-lab (see https://jupyterlab.readthedocs.io/en/stable/ ) in a container with TensorFlow 2 on Python 3. From the NVIDIA prebaked container list here, you find that there is a suitable one with the tag 20.03-tf2-py3. That container does not include jupyter-lab, so we will need to add it.

As a prerequisite, in order to connect to jupyter, you will need to set up port forwarding using "ssh -L ..." (see the ssh manpage) when connecting to your instance. Then create a file called "Dockerfile" in your git repository (we recommend using version control, so that you can bring the instance back up quickly after a shutdown with no loss of work), with the following contents:

FROM nvcr.io/nvidia/tensorflow:20.03-tf2-py3

RUN pip install jupyterlab

For more details on what you can add to a Dockerfile, read the reference at https://docs.docker.com/engine/reference/builder/

You can then build your new container using:

docker build -t my_playground - < Dockerfile

and then run it with:

docker run -p 8888:8888 -ti -v $HOME/my_git_repo:/workspace/my_git_repo --gpus all my_playground jupyter-lab --ip=0.0.0.0 --allow-root

You can then connect to your ssh tunnel port as usual.
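For reference, the port-forwarding connection mentioned earlier might look like the following (the key path and instance IP are placeholders for your own values):

ssh -i ~/.ssh/mykeypair.pem -L 8888:localhost:8888 ubuntu@<instance-ip>

With the tunnel in place, open http://localhost:8888 in your local browser and paste the token that jupyter-lab prints at startup.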


For more information on using the NVIDIA Container Toolkit, please refer to https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html