How to Run JupyterLab via Conda

This page describes how to run JupyterLab with Conda on Pawsey systems with Slurm. The workflow is to launch a JupyterLab server on a compute node and then connect to it from your local machine. Although containers are our preferred method of running JupyterLab, they do not facilitate installing additional software on the fly, whereas a Conda environment does.

Overview

The first step is to make JupyterLab available on the supercomputer via Conda. Next, prepare a batch script that runs the JupyterLab server on a compute node, and connect to it from your local machine through an SSH tunnel. Finally, clean up the session when you are finished.

Installing the software with Conda

The following packages are required to run JupyterLab via Conda. You can install them into a new or existing environment, along with any other packages that you need for your analysis. You can also run pip install from within your notebook later if needed.

conda install -c conda-forge jupyterlab -y
conda install -c conda-forge nodejs -y
conda install -c conda-forge jupyter-server-proxy -y
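
For example, one possible end-to-end sequence creates a fresh environment and installs the three packages in a single step. This is only a sketch: the environment name my_jupyter_env matches the batch script below, and it assumes Miniconda is installed under $MYSOFTWARE/miniconda3 (adjust the path if yours differs).

# Activate the base Miniconda installation (path is an assumption, adjust as needed)
source ${MYSOFTWARE}/miniconda3/bin/activate
# Create and activate the environment used by the batch script below
conda create -n my_jupyter_env -y
conda activate my_jupyter_env
# Install JupyterLab and its helpers from conda-forge
conda install -c conda-forge jupyterlab nodejs jupyter-server-proxy -y

If you need an extra package after the server is running, you can also install it from a notebook cell with %pip install <package>.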

Setting up the batch script

The following script launches a Jupyter notebook server on a compute node. The first step is to enter a writable directory with some space, such as /scratch. Create a directory where you will start your Jupyter notebook and put any relevant data or notebooks in it, as in the example below.
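
For example, assuming a notebook called my_analysis.ipynb (a placeholder name) in your current directory:

# Create the working directory on /scratch and copy your notebook into it
mkdir -p ${MYSCRATCH}/jupyter
cp my_analysis.ipynb ${MYSCRATCH}/jupyter/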

Listing 1. Slurm script for launching JupyterLab from a Conda environment
#!/bin/bash -l
# Allocate slurm resources, edit as necessary
#SBATCH --account=your_pawsey_account
#SBATCH --partition=work
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8GB
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --job-name=jupyter_notebook
#SBATCH --export=NONE
  
# Set our working directory
# Should be in a writable path with some space, like /scratch
dir="${MYSCRATCH}/jupyter"

# Set the name of the Conda environment we will use
conda_env="my_jupyter_env"

# Load dependencies
# Note that the specific version will change over time
module load python/3.10.10

# Activate Conda environment
# You may need to update this path if you did not install Conda in /software/projects/<your_project>/<user_name>/miniconda3
source ${MYSOFTWARE}/miniconda3/bin/activate ${conda_env}
 
# You should not need to edit the lines below
  
# Prepare the working directory
mkdir -p ${dir}
cd ${dir}
 
# Get the hostname
# We'll set up an SSH tunnel to connect to the Jupyter notebook server
host=$(hostname)
  
# Set the port for the SSH tunnel
# This part of the script uses a loop to search for available ports on the node;
# this will allow multiple instances of GUI servers to be run from the same host node
port="8888"
pfound="0"
while [ $port -lt 65535 ] ; do
  check=$( ss -tuna | awk '{print $4}' | grep ":$port *" )
  if [ "$check" == "" ] ; then
    pfound="1"
    break
  fi
  : $((++port))
done
if [ $pfound -eq 0 ] ; then
  echo "No available communication port found to establish the SSH tunnel."
  echo "Try again later. Exiting."
  exit 1
fi
 
  
echo "*****************************************************"
echo "Setup - from your laptop do:"
echo "ssh -N -f -L ${port}:${host}:${port} $USER@$PAWSEY_CLUSTER.pawsey.org.au"
echo "*****"
echo "The launch directory is: $dir"
echo "*****************************************************"
echo ""
echo "*****************************************************"
echo "Terminate - from your laptop do:"
echo "kill \$( ps x | grep 'ssh.*-L *${port}:${host}:${port}' | awk '{print \$1}' )"
echo "*****************************************************"
echo ""
   
# Launch the notebook
srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK \
  jupyter lab \
  --no-browser \
  --port=${port} --ip=0.0.0.0 \
  --notebook-dir=${dir}

Run your Jupyter notebook server

To start, submit the Slurm job script. It may take some time for the job to start, depending on how busy the queue is. Once the job starts, a Slurm output file will appear in your directory, with instructions on how to connect at the end.
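
While you wait for the job to start, you can check its state in the queue with a standard Slurm query, for example:

squeue -u $USER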

Terminal 1. Launching JupyterLab and extracting connection information
$ sbatch jupyter-notebook-conda.slm
Submitted batch job 3528476

$ cat slurm-3528476.out #Check for instructions in the slurm output file
.
.
.
*****************************************************
Setup - from your laptop do:
ssh -N -f -L 8888:nid001011:8888 <username>@setonix.pawsey.org.au
*****
The launch directory is: /scratch/pawsey0001/...
*****************************************************

*****************************************************
Terminate - from your laptop do:
kill $( ps x | grep 'ssh.*-L *8888:nid001011:8888' | awk '{print $1}' )
*****************************************************
.
.
.   
    To access the notebook, open this file in a browser:
        file:///home/matilda/.local/share/jupyter/runtime/nbserver-17-open.html
    Or copy and paste one of these URLs:
        http://nid001011:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242
     or http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242

On your local computer, open a separate terminal window and run the SSH command to create a tunnel between a port on your local machine and the port on the compute node. In this example, the command provided by the output file is:

ssh -N -f -L 8888:nid001011:8888 <username>@setonix.pawsey.org.au

After this step, you can open up a web browser and use the address displayed in the output file to access your Jupyter notebook. In this example the address is:

http://127.0.0.1:8888/?token=3291a7b1e6ce7791f020df84a7ce3c4d2f3759b5aaaa4242

Alternatively, you can go to http://127.0.0.1:8888 or http://localhost:8888 and, when prompted, enter the token string that comes after "?token=" above. (Note that your port number might differ from 8888.)
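
If port 8888 is already in use on your local machine, you can map a different local port in the SSH tunnel (9999 below is just an example) and browse to that port instead:

# Forward local port 9999 to port 8888 on the compute node (9999 is an arbitrary example)
ssh -N -f -L 9999:nid001011:8888 <username>@setonix.pawsey.org.au

You would then open http://localhost:9999 (or http://127.0.0.1:9999) and enter the same token.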


Figure 1. Jupyter authentication page

Note:

The example above shows a notebook launched on setonix.pawsey.org.au. Check your own output to make sure you connect to the correct machine.

Clean up when you are finished

Once you have finished:

  • On the Pawsey cluster, cancel your job with scancel, as shown in the example at the end of this section.
  • On your local computer, kill the SSH tunnel, using the command displayed in the output file:

kill $( ps x | grep 'ssh.*-L *8888:nid001011:8888' | awk '{print $1}' )
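
For example, to cancel the job submitted in Terminal 1 above (replace the job ID with your own):

# Cancel the Jupyter job on the cluster; the job ID comes from the sbatch output
scancel 3528476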