How to Run JupyterLab via Container


This page describes how to run JupyterLab in a container on Pawsey systems with Slurm. This involves launching JupyterLab and then connecting to the Jupyter server.


Overview

The first step in using JupyterLab is to make it available on the supercomputer. Because JupyterLab has a long list of dependencies, it is better to run it from a container than to install it in the traditional way. This page first shows how to get a JupyterLab container image, then how to prepare a batch script that runs the Jupyter server on a compute node and how to connect to it, and finally how to clean up the session when you are finished.

Getting the container images

There are a number of good resources for prebuilt Jupyter and RStudio Docker images:

  • Jupyter Docker Stacks (external site) provides prebuilt Jupyter images designed for TensorFlow, Spark, and data science workflows, which are available on DockerHub.
  • Rocker has prebuilt RStudio images available on DockerHub.

You can use these as base images and install additional packages if needed. Once you have chosen or built the image you want, you can submit a batch script that launches the container.

For this example, we're going to be using the jupyter/datascience-notebook (external site) Docker image. It provides a Conda environment with a large collection of common Python packages (including NumPy, SciPy, Pandas, Scikit-learn, Bokeh and Matplotlib), an R environment (with the tidyverse (external site) packages), and a Julia environment. All of these are accessible via a Jupyter notebook server.
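As a minimal sketch of how this image reference maps to a local file, the helper below derives the SIF filename that Singularity would be asked to write, following the same naming rule as the batch script later on this page. The function name `sif_name_for` is hypothetical, and the pull command is only printed, not run.

```shell
# sif_name_for IMAGE: derive a local SIF filename from a docker:// reference
# (hypothetical helper, for illustration only)
sif_name_for() {
  # Drop everything up to the last "/", then turn the first ":" into "_"
  printf '%s.sif\n' "$(printf '%s' "${1##*/}" | sed 's/:/_/')"
}

image="docker://jupyter/datascience-notebook:latest"
sif="$(sif_name_for "${image}")"
echo "${sif}"

# Print (rather than run) the corresponding pull command
echo "singularity pull ${sif} ${image}"
```

Pinning a specific tag instead of `latest` changes only the part of the filename after the underscore.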

This Docker image ships with a startup script that allows for a number of runtime options to be specified. Most of these are specific to running a container using Docker; we will focus on how to run this container using Singularity.

The datascience-notebook image has a default user, jovyan, and it assumes that you will be able to write to /home/jovyan. When you run a Docker container via Singularity, you run as your Pawsey username inside the container, so you cannot write to /home/jovyan. Instead, you can mount a specific directory (on Pawsey's filesystems) into the container at /home/jovyan. This allows the Jupyter server to save notebooks and write checkpoint files, and those files persist on Pawsey's filesystem after the container has stopped.

Setting up the batch script

The following script launches a Jupyter notebook server on a compute node. First, choose a writable location with enough space, such as /scratch. Create a directory there from which the Jupyter notebook container will be started, and put any relevant data or Jupyter notebooks in it. This is also the directory that will be mounted at /home/jovyan.
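The directory preparation described above could look roughly like the sketch below. The directory name `jupyter-dir` matches the one used in the batch script. The fallback to a temporary directory when `MYSCRATCH` is unset and the notebook file name `analysis.ipynb` are assumptions made only so the example runs anywhere.

```shell
# Use $MYSCRATCH when it is defined; otherwise fall back to a temporary
# directory (assumption for illustration, not needed on Pawsey systems)
scratch_root="${MYSCRATCH:-$(mktemp -d)}"
jupyterDir="${scratch_root}/jupyter-dir"

# Create the launch directory (no error if it already exists)
mkdir -p "${jupyterDir}"

# Copy an existing notebook into it, if there is one (hypothetical file name)
example_notebook="${HOME}/analysis.ipynb"
if [ -f "${example_notebook}" ]; then
  cp "${example_notebook}" "${jupyterDir}/"
fi

# Confirm that the directory is writable before submitting the job
if [ -w "${jupyterDir}" ]; then
  echo "Launch directory ready: ${jupyterDir}"
fi
```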

Listing 1. Slurm script for running JupyterLab in a container
#!/bin/bash -l
# Allocate slurm resources, edit as necessary
#SBATCH --account=[your-project-name]
# Here we request the appropriate partition for the system
#SBATCH --partition=work
# Since jupyterlab is not mpi enabled, we just use one task 
#SBATCH --ntasks=1
#SBATCH --mem=20GB
#SBATCH --time=02:00:00
#SBATCH --job-name=jupyter_notebook
#SBATCH --export=NONE

# Set our working directory
# This is the directory we'll mount to /home/jovyan in the container
# Should be in a writable path with some space, like /scratch
jupyterDir="${MYSCRATCH}/jupyter-dir"

# Set the image and tag we want to use
image="docker://jupyter/datascience-notebook:latest"

# You should not need to edit the lines below

# Prepare the working directory
mkdir -p "${jupyterDir}"
cd "${jupyterDir}" || exit 1

# Get the image filename
imagename=${image##*/}
imagename=${imagename/:/_}.sif

# Get the hostname 
# We'll set up an SSH tunnel to connect to the Jupyter notebook server
host=$(hostname)

# Set the port for the SSH tunnel
# This part of the script uses a loop to search for available ports on the node;
# this will allow multiple instances of GUI servers to be run from the same host node
port="8888"
pfound="0"
while [ $port -lt 65535 ] ; do
  # Match the port at the end of any address field, whatever the ss column layout
  check=$( ss -tuna | grep -E ":${port}( |\$)" )
  if [ "$check" == "" ] ; then
    pfound="1"
    break
  fi
  : $((++port))
done
if [ $pfound -eq 0 ] ; then
  echo "No available communication port found to establish the SSH tunnel."
  echo "Try again later. Exiting."
  exit 1
else
  echo "Port to use is port=${port}"
fi

# Load Singularity
module load singularity/4.1.0-nompi
# Pull our image in a folder
singularity pull $imagename $image

echo "*****************************************************"
echo "Setup - from your laptop do:"
echo "ssh -N -f -L ${port}:${host}:${port} $USER@$PAWSEY_CLUSTER.pawsey.org.au"
echo "*****"
echo "The launch directory is: $jupyterDir"
echo "*****************************************************"
echo ""
echo "*****************************************************"
echo "Terminate - from your laptop do:"
echo "kill \$( ps x | grep 'ssh.*-L *${port}:${host}:${port}' | awk '{print \$1}' )"
echo "*****************************************************"
echo ""
 
# Launch our container
# and mount our working directory to /home/jovyan in the container
# and bind the run time directory to our home directory
singularity exec -C \
  -B ${jupyterDir}:/home/jovyan \
  -B ${jupyterDir}:${HOME} \
  ${imagename} \
  jupyter notebook \
  --no-browser \
  --port=${port} --ip=0.0.0.0 \
  --notebook-dir=${jupyterDir}
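
The port search in Listing 1 can be tried in isolation. The sketch below expresses the same idea as a function that takes a starting port and a list of ports already in use, and prints the first free one. The function name `first_free_port` and the list of used ports are hypothetical; in the listing, the used ports come from `ss`.

```shell
# first_free_port START [USED...]: print the first port >= START that does
# not appear in the USED list; return non-zero if none is found below 65535
first_free_port() {
  local port="$1"
  shift
  local used=" $* "   # pad with spaces so whole numbers can be matched
  while [ "${port}" -lt 65535 ]; do
    case "${used}" in
      *" ${port} "*) port=$((port + 1)) ;;   # in use, try the next one
      *) echo "${port}"; return 0 ;;         # free, report it
    esac
  done
  return 1
}

# Example with a hypothetical set of busy ports
first_free_port 8888 8888 8889 22   # prints 8890
```

Starting from 8888 and counting upwards means that a second server on the same node simply ends up on the next free port.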

Run your Jupyter notebook server

To start, submit the Slurm job script. The job may take a few minutes to start, depending on how busy the queue is and how large an image you are downloading. Once the job starts, you will have a Slurm output file in your directory, with instructions on how to connect at the end.

Terminal 1. Launching JupyterLab and extracting connection information
$ sbatch jupyter-notebook-one-singularity.slm
Submitted batch job 3528476
$ cat slurm-3528476.out
.
.
.
Writing manifest to image destination
Storing signatures
INFO:    Creating SIF file...
INFO:    Build complete: /scratch/pawsey12345/matilda/jupyter-dir/singularity-cache/cache/oci-tmp/18ef2702c6a25bd26b81e7b6dc831adb2bc294ae7bc9b011150b8f4573c41d4a/datascience-notebook_latest.sif

*****************************************************
Setup - from your laptop do:
ssh -N -f -L 8888:nid001007:8888 <user>@setonix.pawsey.org.au
*****
The launch directory is: /scratch/pawsey0001/...
*****************************************************

*****************************************************
Terminate - from your laptop do:
kill $( ps x | grep 'ssh.*-L *8888:nid001007:8888' | awk '{print $1}' )
*****************************************************

[I 04:38:34.503 NotebookApp] Writing notebook server cookie secret to /home/matilda/.local/share/jupyter/runtime/notebook_cookie_secret
[I 04:38:36.677 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 04:38:36.677 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 04:38:37.605 NotebookApp] Serving notebooks from local directory: /group/pawsey12345/matilda/jupyter-dir
[I 04:38:37.605 NotebookApp] The Jupyter Notebook is running at:
[I 04:38:37.605 NotebookApp] http://nid001007:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
[I 04:38:37.605 NotebookApp]  or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
[I 04:38:37.605 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 04:38:37.616 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/matilda/.local/share/jupyter/runtime/nbserver-17-open.html
    Or copy and paste one of these URLs:
        http://nid001007:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
     or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
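
The name of the output file in Terminal 1 follows from the job ID that sbatch reports. As a small sketch, assuming the script does not set its own `--output` option (Listing 1 does not), the file name can be derived like this; the variable names are illustrative.

```shell
# Message printed by sbatch (copied from Terminal 1)
sbatch_msg="Submitted batch job 3528476"

# The job ID is the last word of that message
jobid="${sbatch_msg##* }"

# Slurm's default output file name is slurm-<jobid>.out
outfile="slurm-${jobid}.out"
echo "${outfile}"
```

Once the job is running, the file can be followed with `tail -f "${outfile}"` until the connection instructions appear.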

In a separate terminal window on your own computer, run the SSH command listed in the output file. Look for the section that says "Setup - from your laptop do" and type the command shown there. In this example this would be:

ssh -N -f -L 8888:nid001007:8888 <username>@setonix.pawsey.org.au 
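
If the output file is long, the tunnel command can be picked out of it on the cluster instead of scrolling. The sketch below prints the line that follows the "Setup - from your laptop do:" banner. The function name `setup_command` is hypothetical, and a short stand-in file is used here in place of a real Slurm output file.

```shell
# setup_command FILE: print the line after the "Setup - from your laptop do:" banner
setup_command() {
  awk '/^Setup - from your laptop do:/ { getline; print; exit }' "$1"
}

# Demonstration on a small stand-in for the Slurm output file
sample_out="$(mktemp)"
cat > "${sample_out}" <<'EOF'
*****************************************************
Setup - from your laptop do:
ssh -N -f -L 8888:nid001007:8888 <user>@setonix.pawsey.org.au
*****
EOF

tunnel_cmd="$(setup_command "${sample_out}")"
echo "${tunnel_cmd}"
rm -f "${sample_out}"
```

With a real job, the call would be `setup_command slurm-3528476.out`, and the printed command is then run on your own computer, not on the cluster.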

After this step, open a web browser and go to the address displayed in the output file. Look for the section that says "To access the notebook, ..." and copy the address that starts with http://127.0.0.1 into your browser. In this example the address is:

http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242

Alternatively, go to http://127.0.0.1:8888 or http://localhost:8888 and, when prompted, enter the token string that comes after "?token=" above. (Note that your port number might differ from "8888".)


Figure 1. Jupyter authentication page
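
The token requested on the authentication page is the part of the URL after "?token=". As a minimal sketch, it can be separated from the URL with a shell parameter expansion; the function name `token_from_url` is hypothetical and the URL is the one from the example output above.

```shell
# token_from_url URL: print everything after "?token="
# (if the URL contains no token, it is printed unchanged)
token_from_url() {
  printf '%s\n' "${1##*\?token=}"
}

url="http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242"
token="$(token_from_url "${url}")"
echo "${token}"
```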

Note:

The output above is from a notebook server launched on setonix.pawsey.org.au. Check your own output file to make sure that you connect to the correct machine.

Also note that the version of the singularity module available on the system may have changed, in which case you will need to adapt the script accordingly.

From the Jupyter notebook menu, you can create a new notebook and start from there. If you already have a notebook that you want to run or develop, you may need to copy it into jupyter-dir first.

Clean up when you are finished

Once you have finished (and saved and exited from the Jupyter notebook instance):

  • From the Pawsey cluster, cancel your job with scancel.
  • From your own computer, kill the SSH tunnel, based on the command displayed in the output file. Look for the section that says "Terminate - from your laptop do" and execute the command from your laptop. In this example, you should type:

kill $( ps x | grep 'ssh.*-L *8888:nid001007:8888' | awk '{print $1}' )
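
Before killing anything, it can be useful to see which processes the pattern would select. The sketch below applies the same pattern to `ps x`-style lines read from standard input and prints only the matching PIDs. The function name `tunnel_pids`, the PIDs and the sample process list are hypothetical.

```shell
# tunnel_pids PORT HOST: read `ps x`-style lines on stdin and print the
# PIDs of SSH tunnels forwarding PORT to HOST:PORT
tunnel_pids() {
  local port="$1"
  local host="$2"
  grep -E "ssh.*-L *${port}:${host}:${port}" | awk '{print $1}'
}

# Stand-in for `ps x` output (hypothetical PIDs)
sample_ps='  4321 ?      Ss   0:00 ssh -N -f -L 8888:nid001007:8888 matilda@setonix.pawsey.org.au
  4400 pts/0  S+   0:00 vim notes.txt'

printf '%s\n' "${sample_ps}" | tunnel_pids 8888 nid001007   # prints 4321
```

On your own computer, `ps x | tunnel_pids 8888 nid001007` lists the candidate PIDs, which can then be passed to `kill` as in the command above.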

External links

  • DockerHub
  • For information about runtime options supported by the startup script in the Jupyter image, see Common Features in the Jupyter Docker Stacks documentation
  • The Rocker Project ("Docker Containers for the R Environment")