How to Run RStudio

This page describes how to run RStudio in a container on Pawsey systems with Slurm.

This involves:

  • Downloading or building an RStudio image
  • Submitting a job to the queue that will launch an RStudio server
  • Creating an SSH tunnel and connecting to the RStudio server via a web browser

Getting the container images

There are a number of good resources for prebuilt Jupyter and RStudio Docker images:

  • Jupyter Docker Stacks provides prebuilt Jupyter images designed for TensorFlow, Spark, and data science workflows, which are available on DockerHub.
  • Rocker has prebuilt RStudio images available on DockerHub.

You can use these as base images and install additional packages on top of them if needed. Once you have built your desired image, you can submit a batch script that launches the container.

For this example we're going to be using the rocker/tidyverse (external site) Docker image. At the time of writing the latest available R version is 4.3.1. It provides an R installation, the RStudio server, the R devtools and the Tidyverse collection for data science. This image ships with a startup script that allows for a number of runtime options to be specified: see the USE menu in the Rocker homepage (external site).

Setting up the job script

To start, we need to set up the directory we want to work in. We recommend using a writable location with plenty of space, such as /scratch. The job script creates this directory and starts the RStudio container from it. You should put any relevant data in this directory.

Set the working directory within RStudio

To read data from and write data to this directory, first manually change directory to this location from inside the RStudio session (see figure 2).

The following script launches an RStudio server on the compute node. The first step in the script creates a working directory before launching RStudio.

Listing 1. Slurm script for running RStudio in a container
#!/bin/bash -l
# Allocate slurm resources, edit as necessary 
# Replace [your-project-name] with your project; #SBATCH lines do not expand shell variables
#SBATCH --account=[your-project-name]
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=58G
#SBATCH --time=01:00:00
#SBATCH --job-name=rstudio_server
#SBATCH --partition=work 
#SBATCH --export=NONE

# Set our working directory
# Should be in a writable path with some space, like /scratch
# You'll need to manually change dir to this one through the RStudio interface
# Session -> Set Working Directory -> Choose Directory -> ...
dir="${MYSCRATCH}/rstudio-dir"

# Set the image and tag we want to use
image="docker://rocker/tidyverse:4.3.1"

# Load Singularity. The version may change over time. 
module load singularity/4.1.0-slurm

# You should not need to edit the lines below

# Prepare the working directory
mkdir -p "$dir"
cd "${dir}"

# Get the image filename
imagename=${image##*/}
imagename=${imagename/:/_}.sif

# Create a user-specific tmp directory to avoid clashes between users
tmp_dir="/tmp/tmp_$USER"
mkdir -p $tmp_dir

# Get the hostname of the Setonix node
# We'll set up an SSH tunnel to connect to the RStudio server
host=$(hostname)

# Set the port for the SSH tunnel
# This part of the script uses a loop to search for available ports on the node;
# this will allow multiple instances of GUI servers to be run from the same host node
port="8787"
pfound="0"
while [ $port -lt 65535 ] ; do
  # Column 5 of "ss -tuna" is the local address:port (column 1 is Netid)
  check=$( ss -tuna | awk '{print $5}' | grep ":${port}\$" )
  if [ "$check" == "" ] ; then
    pfound="1"
    break
  fi
  : $((++port))
done
if [ $pfound -eq 0 ] ; then
  echo "No available communication port found to establish the SSH tunnel."
  echo "Try again later. Exiting."
  exit 1
fi

# Generate a random password for the session
export PASSWORD=$(openssl rand -base64 15)

# Pull our Docker image in a folder
singularity pull $imagename $image

echo "*****************************************************"
echo "Setup - from your laptop do:"
echo "ssh -N -f -L ${port}:${host}:${port} $USER@setonix.pawsey.org.au"
echo "*****"
echo "The launch directory is: $dir"
echo "*****"
echo "Secret for this session is: $PASSWORD"
echo "*****************************************************"
echo ""
echo "*****************************************************"
echo "Terminate - from your laptop do:"
echo "kill \$( ps x | grep 'ssh.*-L *${port}:${host}:${port}' | awk '{print \$1}' )"
echo "*****************************************************"
echo ""

# Launch our container
# Note that content of /home will be lost after runtime
# You'll need to manually change dir to the working dir through the RStudio interface
# Session -> Set Working Directory -> Choose Directory -> ... 
srun -N $SLURM_JOB_NUM_NODES -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK \
  singularity exec -c \
  -B ${tmp_dir}:/tmp \
  -B ${dir}:$HOME \
  -B ${tmp_dir}:/var \
  ${imagename} \
  rserver --www-port ${port} --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper --server-user=$(whoami) 

We'll submit this script (saved as rstudio-on-singularity.slm) to the queue. It will build the RStudio image we wish to use, launch a writable container using Singularity, and start the RStudio server with the options we need.
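
As a small illustration of how Listing 1 names the image file it pulls, the two parameter expansions can be run on their own in any Bash shell. This is only a sketch of that one step; nothing here contacts DockerHub or Singularity.

```shell
#!/bin/bash
# Same image reference as in Listing 1
image="docker://rocker/tidyverse:4.3.1"

# Strip everything up to and including the last "/" -> tidyverse:4.3.1
imagename=${image##*/}

# Replace the first ":" with "_" and append the .sif extension
imagename=${imagename/:/_}.sif

echo "$imagename"   # prints tidyverse_4.3.1.sif
```

If you change the image or tag in the script, the file that appears in the working directory is renamed accordingly.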

You may need to modify the script (e.g. to change which image you use). It assumes the following:

  • [your-project-name] is replaced with your Pawsey project name, e.g. pawsey####
  • You're using 1 core in the work partition for 1 hour
  • Your work directory is $MYSCRATCH/rstudio-dir
  • It uses the rocker/tidyverse:4.3.1 Docker image
  • The Singularity module version is singularity/4.1.0-slurm
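
The port search in Listing 1 can be tried away from the cluster with a simplified sketch. The function name find_free_port and the list of busy ports are hypothetical: in the job script the busy ports come from the ss command on the compute node, whereas here a fixed list stands in for that output.

```shell
#!/bin/bash
# Hypothetical helper: print the first port >= $1 that does not appear in
# the whitespace-separated list of busy ports given as $2.
find_free_port() {
  local port=$1
  local busy=" $2 "
  while [ "$port" -lt 65535 ]; do
    case "$busy" in
      *" $port "*) port=$((port + 1)) ;;   # port is taken, try the next one
      *) echo "$port"; return 0 ;;          # port is free
    esac
  done
  return 1                                  # no free port found
}

# Example with a made-up list of busy ports
find_free_port 8787 "22 8787 8788"   # prints 8789
```

This is why a second RStudio job landing on the same node would report a port other than 8787 in its output file.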

Run your RStudio server

To start, submit the SLURM job script. It will take a few minutes to start, depending on how busy the queue is and how large an image you're downloading. Once the job starts you will have a SLURM output file in the directory you submitted from, with instructions on how to connect at the end. This file also provides the password needed to connect and the directory where data will be saved.

Terminal 1. Launching RStudio and extracting connection information
$ sbatch rstudio-on-singularity.slm
Submitted batch job 3171917

$ cat slurm-3171917.out #Check the content of the output file for instructions
.
.
.
*****************************************************
Setup - from your laptop do: 
ssh -N -f -L 8787:nid001008:8787 matilda@setonix.pawsey.org.au
*****
The launch directory is: /scratch/pawsey0001/matilda/rstudio-dir
*****
Secret for this session is: r8KR5cjGCllsJn0bHn52
*****************************************************

*****************************************************
Terminate - from your laptop do:
kill $( ps x | grep 'ssh.*-L *8787:nid001008:8787' | awk '{print $1}' )
*****************************************************
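
The name of the output file follows from the job ID that sbatch reports. The sketch below derives it from the message shown in Terminal 1, copied in as a fixed string so that it runs without Slurm; the variable names are illustrative only.

```shell
#!/bin/bash
# Message printed by sbatch in Terminal 1 (copied here as a fixed string)
msg="Submitted batch job 3171917"

# The job ID is the last word of the message
jobid=${msg##* }

# Slurm's default output file name is slurm-<jobid>.out
outfile="slurm-${jobid}.out"

echo "$outfile"   # prints slurm-3171917.out
```

On the cluster, submitting with sbatch --parsable should print the job ID on its own, which avoids parsing the message. The same ID is what you later pass to scancel.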

After reading the instructions provided in the slurm-xxxx.out file, open a separate terminal window on your own computer. In that terminal, execute the command that sets up the SSH tunnel between a port on your local computer and the port on the compute node where the RStudio container is running. In this case:

ssh -N -f -L 8787:nid001008:8787 matilda@setonix.pawsey.org.au

You can then open a web browser and use the address http://localhost:8787 to access your RStudio server. Note that the port number might differ from "8787", so always double-check the port number in the SSH command.
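
One way to avoid mistyping the port is to read it from the output file. The following is a minimal sketch, assuming the file contains the SSH line exactly as printed by Listing 1; a temporary file with two sample lines stands in for the real slurm-xxxx.out.

```shell
#!/bin/bash
# Stand-in for the Slurm output file: only the lines relevant here
outfile=$(mktemp)
cat > "$outfile" <<'EOF'
Setup - from your laptop do:
ssh -N -f -L 8787:nid001008:8787 matilda@setonix.pawsey.org.au
EOF

# First line that starts with the tunnel command
tunnel_cmd=$(grep -m1 '^ssh -N -f -L ' "$outfile")

# Local port: the field after -L, up to the first ":"
port=$(echo "$tunnel_cmd" | awk '{print $5}' | cut -d: -f1)

echo "Run: ${tunnel_cmd}"
echo "Then open: http://localhost:${port}"
rm -f "$outfile"
```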

Figure 1. RStudio sign-in page


The username here is your Pawsey username. The password is the "secret" generated by the SLURM script, which is displayed in the SLURM output file so that you can copy-paste it from there.

Finally, manually change the working directory in your RStudio session through the menu Session -> Set Working Directory -> Choose Directory:

In the pop-up window, click on the More ( … ) button on the right of the home item:

Figure 2. Set working directory in RStudio


The working directory you chose for your session is displayed in the SLURM output file so that you can copy-paste it from there:

Press OK, then the Choose button, and you'll have a running RStudio server on Setonix.

The information above is for RStudio launched on setonix.pawsey.org.au. Ensure that you look at your output to select the correct machine.

Installing R packages during an RStudio session

If you need to install additional R packages from inside the RStudio session, first run this command once from the R console:

> .libPaths("~")

If you regularly need additional packages in your RStudio session, consider building a customised RStudio container rather than installing them from within the session.
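
As a rough sketch of what such a customised image could look like, the commands below write a small Dockerfile that extends the image used in Listing 1. The package names and the repository name in the final comment are placeholders, and the build step assumes a machine where Docker is installed.

```shell
#!/bin/bash
# Write a minimal Dockerfile that extends the image used in Listing 1.
# The package names are placeholders; replace them with what you need.
workdir=$(mktemp -d)
cat > "${workdir}/Dockerfile" <<'EOF'
FROM rocker/tidyverse:4.3.1
# install2.r is a helper script shipped with the Rocker images
RUN install2.r --error data.table janitor
EOF

cat "${workdir}/Dockerfile"

# On a machine with Docker, build and push under a repository name
# of your own, for example:
#   docker build -t <your-dockerhub-user>/my-rstudio:4.3.1 "${workdir}"
```

Once the image is pushed, point the image variable in the job script at the new repository and tag.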

Clean up when you are finished

Once you have finished:

  • On the Pawsey cluster, cancel your job with scancel.
  • On your local computer, kill the SSH tunnel using the command displayed in the output file. In this case:

kill $( ps x | grep 'ssh.*-L *8787:nid001008:8787' | awk '{print $1}' )

External links