Excerpt |
---|
This page describes how to run JupyterHub in a container on Pawsey systems with Slurm. This involves launching JupyterHub and then connecting to the Jupyter server. |
...
For this example, we're going to be using the jupyter/datascience-notebook (external site) Docker image. It provides a Conda environment with a large collection of common Python packages (including NumPy, SciPy, Pandas, Scikit-learn, Bokeh and Matplotlib), an R environment (with the tidyverse (external site) packages), and a Julia environment. All of these are accessible via a Jupyter notebook server.
This Docker image ships with a startup script that allows for a number of runtime options to be specified. Most of these are specific to running a container using Docker; we will focus on how to run this container using Singularity.
The datascience-notebook image has a default user, jovyan, and it assumes that you will be able to write to /home/jovyan. When you run a Docker container via Singularity, you will be running as your Pawsey username inside the container, so we won't be able to write to /home/jovyan. Instead, we can mount a specific directory (on Pawsey's filesystems) into the container at /home/jovyan. This will allow our Jupyter server to do things like save notebooks and write checkpoint files, and those will persist on Pawsey's filesystem after the container has stopped.
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 1. Slurm script for running JupyterHub in a container |
---|
collapse | true |
---|
| #!/bin/bash -l
# Allocate slurm resources, edit as necessary
#SBATCH --account=[your-project-name]
# Here we request the appropriate partition on the system
#SBATCH --partition=work
# Since jupyterhub is not mpi enabled, we just use one task
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --job-name=jupyter_notebook
#SBATCH --export=NONE
# Set our working directory
# This is the directory we'll mount to /home/jovyan in the container
# Should be in a writable path with some space, like /scratch
dir="${MYSCRATCH}/jupyter-dir"
# Set the image and tag we want to use
image="docker://jupyter/datascience-notebook:latest"
# You should not need to edit the lines below
# Prepare the working directory
mkdir -p ${dir}
cd ${dir}
# Get the image filename
imagename=${image##*/}
imagename=${imagename/:/_}.sif
# Get the hostname of the compute node
# We'll set up an SSH tunnel to connect to the Jupyter notebook server
host=$(hostname)
# Set the port for the SSH tunnel
# This part of the script uses a loop to search for available ports on the node;
# this will allow multiple instances of GUI servers to be run from the same host node
port="8888"
pfound="0"
while [ $port -lt 65535 ] ; do
  check=$( netstat -tuna | awk '{print $4}' | grep ":$port *" )
  if [ "$check" == "" ] ; then
    pfound="1"
    break
  fi
  : $((++port))
done
if [ $pfound -eq 0 ] ; then
  echo "No available communication port found to establish the SSH tunnel."
  echo "Try again later. Exiting."
  # Exit with a non-zero status so Slurm records the job as failed
  exit 1
fi
# Load Singularity
module load singularity/3.8.6
# Pull our image in a folder
singularity pull $imagename $image
echo "*****************************************************"
echo "Setup - from your laptop do:"
echo "ssh -N -f -L ${port}:${host}:${port} $USER@$PAWSEY_CLUSTER.pawsey.org.au"
echo "*****"
echo "The launch directory is: $dir"
echo "*****************************************************"
echo ""
echo "*****************************************************"
echo "Terminate - from your laptop do:"
echo "kill \$( ps x | grep 'ssh.*-L *${port}:${host}:${port}' | awk '{print \$1}' )"
echo "*****************************************************"
echo ""
# Launch our container
# and mount our working directory to /home/jovyan in the container
# and bind the run time directory to our home directory
singularity exec -C \
-B ${dir}:/home/jovyan \
-B ${dir}:${HOME} \
${imagename} \
jupyter notebook \
--no-browser \
--port=${port} --ip=0.0.0.0 \
--notebook-dir=${dir} |
|
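The port search in the listing above can also be sketched in Python, for example to check from a login node or from inside a notebook whether a port is in use. This is a minimal sketch and not the script's method: instead of parsing `netstat` output it tries to bind a socket to each candidate port, and the function names `is_port_free` and `find_free_port` are hypothetical.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # The port is in use (or we are not allowed to bind to it)
            return False
        return True


def find_free_port(start=8888, stop=65535):
    """Return the first free port in [start, stop), or None if there is none."""
    for port in range(start, stop):
        if is_port_free(port):
            return port
    return None


if __name__ == "__main__":
    port = find_free_port()
    if port is None:
        print("No available communication port found.")
    else:
        print(f"First free port: {port}")
```

As in the Slurm script, the search starts at 8888 and walks upwards, so several notebook servers can share one node without clashing.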
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | DJango |
---|
title | Terminal 1. Launching JupyterHub and extracting connection information |
---|
collapse | true |
---|
| $ sbatch jupyter-notebook-one-singularity.slm
Submitted batch job 3528476
$ cat slurm-3528476.out
.
.
.
Writing manifest to image destination
Storing signatures
INFO:    Creating SIF file...
INFO:    Build complete: /scratch/pawsey0002/astott/jupyter-dir/singularity-cache/cache/oci-tmp/18ef2702c6a25bd26b81e7b6dc831adb2bc294ae7bc9b011150b8f4573c41d4a/datascience-notebook_latest.sif
*****************************************************
Setup - from your laptop do:
ssh -N -f -L 8888:z123:8888 <user>@setonix.pawsey.org.au
*****
The launch directory is: /scratch/pawsey0002/astott/jupyter-dir
*****************************************************
*****************************************************
Terminate - from your laptop do:
kill $( ps x | grep 'ssh.*-L *8888:z123:8888' | awk '{print $1}' )
*****************************************************
[I 04:38:34.503 NotebookApp] Writing notebook server cookie secret to /home/astott/.local/share/jupyter/runtime/notebook_cookie_secret
[I 04:38:36.677 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 04:38:36.677 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 04:38:37.605 NotebookApp] Serving notebooks from local directory: /group/pawsey0002/astott/jupyter-dir
[I 04:38:37.605 NotebookApp] The Jupyter Notebook is running at:
[I 04:38:37.605 NotebookApp] http://z123:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
[I 04:38:37.605 NotebookApp] or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
[I 04:38:37.605 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 04:38:37.616 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/astott/.local/share/jupyter/runtime/nbserver-17-open.html
Or copy and paste one of these URLs:
http://z123:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242 |
|
...
ssh -N -f -L 8888:z123:8888 <username>@setonix.pawsey.org.au
After this step, you can open up a web browser and use the address displayed in the output file to access your Jupyter notebook. In this example the address is:
...
Alternatively, you can go to http://127.0.0.1:8888 or http://localhost:8888 and, when prompted, enter the token string that comes after "?token=" above. (Note that your port number might differ from "8888".)
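If you would rather not scan the Slurm output file by eye, the tunnel command and the local URL can be pulled out with a short script. The following is a minimal sketch, assuming the output has the same layout as Terminal 1; the helper name `connection_info` is hypothetical, and the sample text is copied from the example output with a placeholder username.

```python
import re


def connection_info(slurm_output):
    """Return (ssh_command, local_url) found in the job output.

    Either element is None if the corresponding line is missing.
    """
    # The script echoes the tunnel command on a line of its own
    ssh_match = re.search(r"^ssh -N -f -L \S+ \S+$", slurm_output, flags=re.MULTILINE)
    # Jupyter prints the tokenised URL for the loopback address
    url_match = re.search(r"http://127\.0\.0\.1:\d+/\?token=[0-9a-f]+", slurm_output)
    return (
        ssh_match.group(0) if ssh_match else None,
        url_match.group(0) if url_match else None,
    )


sample = """\
Setup - from your laptop do:
ssh -N -f -L 8888:z123:8888 <user>@setonix.pawsey.org.au
[I 04:38:37.605 NotebookApp] or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
"""

ssh_command, url = connection_info(sample)
print(ssh_command)
print(url)
```

To use it on a real job, read the contents of your `slurm-<jobid>.out` file into a string and pass that to the function instead of `sample`.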
Column |
---|
Figure 1. Jupyter authentication page
Note |
---|
| The example above shows a notebook launched on zeus.pawsey.org.au. Check your own output to select the correct machine. |
|
...
Running a GPU-enabled container on Pawsey GPU systems (such as Topaz and Phase-2 Setonix) with Slurm is very similar to running a standard Jupyter notebook. The main differences are:
...
Column |
---|
|
Code Block |
---|
language | bash |
---|
theme | Emacs |
---|
title | Listing 2. Slurm script for running JupyterHub in a GPU-enabled container |
---|
collapse | true |
---|
| #!/bin/bash -l
# Allocate slurm resources, edit as necessary
#SBATCH --account=[your-project-name]
# Here we request the appropriate GPU partition on a system
#SBATCH --partition=gpuq
# Be aware that the request for GPU resources may change in later versions of slurm
#SBATCH --gres=gpu:1
# Since jupyterhub is not mpi enabled, we just use one task
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --job-name=jupyter_notebook
#SBATCH --export=NONE
# Set our working directory
# This is the directory we'll mount to /home/jovyan in the container
# Should be in a writable path with some space, like /scratch
dir="${MYSCRATCH}/jupyter-dir"
# Set the image and tag we want to use
image="docker://jupyter/datascience-notebook:latest"
# You should not need to edit the lines below
# Prepare the working directory
mkdir -p ${dir}
cd ${dir}
# Get the image filename
imagename=${image##*/}
imagename=${imagename/:/_}.sif
# Get the hostname of the compute node
# We'll set up an SSH tunnel to connect to the Jupyter notebook server
host=$(hostname)
# Set the port for the SSH tunnel
# This part of the script uses a loop to search for available ports on the node;
# this will allow multiple instances of GUI servers to be run from the same host node
port="8888"
pfound="0"
while [ $port -lt 65535 ] ; do
  check=$( netstat -tuna | awk '{print $4}' | grep ":$port *" )
  if [ "$check" == "" ] ; then
    pfound="1"
    break
  fi
  : $((++port))
done
if [ $pfound -eq 0 ] ; then
  echo "No available communication port found to establish the SSH tunnel."
  echo "Try again later. Exiting."
  # Exit with a non-zero status so Slurm records the job as failed
  exit 1
fi
# Load Singularity
module load singularity/3.8.6
# Load CUDA and set environment variable for Singularity
module load cuda
export SINGULARITYENV_CUDA_HOME=$CUDA_HOME
# Pull our image in a folder
singularity pull $imagename $image
echo "*****************************************************"
echo "Setup - from your laptop do:"
echo "ssh -N -f -L ${port}:${host}:${port} $USER@$PAWSEY_CLUSTER.pawsey.org.au"
echo "*****"
echo "The launch directory is: $dir"
echo "*****************************************************"
echo ""
echo "*****************************************************"
echo "Terminate - from your laptop do:"
echo "kill \$( ps x | grep 'ssh.*-L *${port}:${host}:${port}' | awk '{print \$1}' )"
echo "*****************************************************"
echo ""
# Launch our container
# and mount our working directory to /home/jovyan in the container
# and bind the run time directory to our home directory
singularity exec --nv -C \
-B ${dir}:/home/jovyan \
-B ${dir}:$HOME \
${imagename} \
jupyter notebook \
--no-browser \
--port=${port} --ip=0.0.0.0 \
--notebook-dir=${dir} |
|
...
Column |
---|
|
Code Block |
---|
language | py |
---|
theme | Emacs |
---|
title | Listing 3. Simple GPU-enabled Python code snippet |
---|
collapse | true |
---|
| # key GPU library
from numba import cuda
import numpy as np
# define some kernels
@cuda.jit
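def add_kernel(x, y, out):
    # Global index of this thread in the 1D grid
    idx = cuda.grid(1)
    # Guard against threads that fall beyond the end of the arrays
    if idx < out.size:
        out[idx] = x[idx] + y[idx]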
n = 4096
x = np.arange(n).astype(np.int32) # [0...4095] on the host
y = np.ones_like(x) # [1...1] on the host
out = np.zeros_like(x)
# cuda commands to copy memory to the device
d_x = cuda.to_device(x)
d_y = cuda.to_device(y)
d_out = cuda.to_device(out)
# run kernel
threads_per_block = 128
blocks_per_grid = 32
add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)
cuda.synchronize()
# output result
print(d_out.copy_to_host()) # Should be [1...4096] |
|
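Listing 3 hard-codes 32 blocks of 128 threads, which covers exactly 4096 elements. For other array sizes the number of blocks is usually obtained by rounding up, as in the minimal sketch below. The helper name `launch_config` is hypothetical and not part of Numba; it only does the arithmetic and does not need a GPU.

```python
def launch_config(n, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) covering n elements."""
    # Ceiling division: enough blocks so that blocks * threads >= n
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    return blocks_per_grid, threads_per_block


# The configuration used in Listing 3
print(launch_config(4096))  # (32, 128)

# A size that is not a multiple of the block size
blocks, threads = launch_config(5000)
print(blocks, threads)              # 40 128
print(blocks * threads - 5000)      # 120 surplus threads
```

When the array length is not a multiple of the block size, the last block contains surplus threads, so the kernel needs a bounds check such as `if idx < out.size:` before it writes to the output array.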
External links
- DockerHub
- For information about runtime options supported by the startup script in the Jupyter image, see Common Features in the Jupyter Docker Stacks documentation
- The Rocker Project ("Docker Containers for the R Environment")