Summary

This page covers how to use the new 'Pawsey Bio - Ubuntu 22.04 - 2023-03' image for Nimbus. Instructions on how to choose this image when creating your instance can be found here. This Bio-image caters to bioinformatics users who prefer their instances to come pre-installed with software and tools commonly used in the bioinformatics domain, including over 8000 Biocontainer tools.

Note: If you previously used the now-deprecated 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image, most of the instructions on this page will still apply. Some of the software is part of Pawsey's ongoing effort to improve the experience of bioinformatics users at Pawsey.

Before you begin

You may be required to input your SSH public key, or the path to your SSH public key on your local machine, while using some of this software. Please ensure you have it ready to go. Instructions for how to generate one can be found at /wiki/spaces/SUP/pages/55410887.
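As a minimal sketch of having the key ready, the snippet below looks for an existing public key and prints its path and contents. The helper name `find_public_key` and the default `~/.ssh` location are assumptions for illustration only; follow the instructions linked above if your key is stored elsewhere or you still need to generate one.

```shell
# Hypothetical helper: print the path of the first public key found in a directory.
# Assumes keys follow the usual OpenSSH naming (id_ed25519.pub, id_rsa.pub, ...).
find_public_key() {
    key_dir="${1:-$HOME/.ssh}"
    for candidate in "$key_dir"/id_*.pub; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if key_path="$(find_public_key)"; then
    echo "Public key found at: $key_path"
    cat "$key_path"
else
    echo "No public key found in $HOME/.ssh; generate one first (for example with ssh-keygen)."
fi
```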

Pre-installed software

The list of pre-installed software on this image is as follows:

Software	Information
Ansible	An automation platform that Pawsey uses to automate a number of software deployments
CernVM-FS	A read-only file system for accessing files, such as reference datasets, on shared repositories
Docker	A popular container engine
Google Chrome	
Jupyter Notebook (container)	
Lmod	A modules environment that we use at Pawsey for loading software
Nextflow	A popular workflow manager
Pip	A Python package installer
Python3	
RStudio (container)	To use RStudio interactively, see Run RStudio Interactively
Singularity	A popular container engine that can be used on HPC
Singularity-HPC	A container modules installer
Spack	A package management tool
X2go	A virtual desktop application; see Setting up a virtual desktop for your instance

Instructions

On this page, we will only cover instructions for how to use CernVM-FS, Jupyter Notebook, RStudio and Singularity-HPC. For instructions for the other software listed above, please see the software's original documentation page.

CernVM-FS

CernVM-FS is a read-only file system that was developed at CERN. It allows files that are commonly used by many researchers, such as container tools, reference datasets and other shared resources, to be accessed, added to, and updated in one location. At Pawsey, we currently mirror the Biocontainer tools and reference genome datasets that are on the Galaxy Project's repositories. Please note that the datasets may not be comprehensive, and this service is not meant to replace your current methods for accessing public datasets.

The Biocontainer tools are in the format of Singularity containers. To use them, you can skip this step and proceed to Singularity-HPC.

To access and view the list of Biocontainer tools, first set up the CernVM-FS client as follows.

Note

Due to a change in the CernVM-FS proxy at Pawsey on 27th July 2022, please ensure that you do the following before proceeding:

Code Block
sudo apt-get autoremove cvmfs
sudo apt-get purge cvmfs
sudo rm -rf /etc/cvmfs/
git clone https://github.com/PawseySC/Pawsey-CernVM-FS.git

Then run the following to set up the repositories from Galaxy and AARNet, respectively:

Code Block

cd Pawsey-CernVM-FS

# for Galaxy repos
sudo ./cvmfs-client-setup.sh \
    --stratum-1 cvmfs1-mel0.gvl.org.au \
    --stratum-1 cvmfs1-ufr0.galaxyproject.eu \
    --stratum-1 cvmfs1-tacc0.galaxyproject.org \
    --stratum-1 cvmfs1-iu0.galaxyproject.org \
    --stratum-1 cvmfs1-psu0.galaxyproject.org \
    --proxy cvmfs-cachingproxy.pawsey.org.au \
    pubkeys/cvmfs-config.galaxyproject.org.pub \
    pubkeys/data.galaxyproject.org.pub \
    pubkeys/main.galaxyproject.org.pub \
    pubkeys/sandbox.galaxyproject.org.pub \
    pubkeys/singularity.galaxyproject.org.pub \
    pubkeys/test.galaxyproject.org.pub \
    pubkeys/usegalaxy.galaxyproject.org.pub

# for AARNet repos
sudo ./cvmfs-client-setup.sh \
    --stratum-1 bcws.test.aarnet.edu.au \
    --proxy cvmfs-cachingproxy.pawsey.org.au \
    pubkeys/containers.biocommons.aarnet.edu.au.pub \
    pubkeys/data.biocommons.aarnet.edu.au.pub \
    pubkeys/tools.biocommons.aarnet.edu.au.pub

You can then refer to and use the path to the datasets as follows:

Code Block

ls /cvmfs/data.galaxyproject.org
ls /cvmfs/singularity.galaxyproject.org
ls /cvmfs/main.galaxyproject.org
ls /cvmfs/cvmfs-config.galaxyproject.org

ls /cvmfs/containers.biocommons.aarnet.edu.au
ls /cvmfs/data.biocommons.aarnet.edu.au
ls /cvmfs/tools.biocommons.aarnet.edu.au

Note: It may take a minute or two to load the folders. When you have done it once, it will not take as long to show again.
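To confirm that the repositories are reachable after setup, a loop such as the following can report which paths can be listed. This is a sketch only: the function name `check_repos` is hypothetical, and the repository names are the ones listed above.

```shell
# Hypothetical helper: report whether each repository under a base directory can be listed.
# Usage: check_repos BASE_DIR REPO [REPO ...]
check_repos() {
    base_dir="$1"
    shift
    missing=0
    for repo in "$@"; do
        # Accessing the path is typically what prompts the file system to load the repository.
        if ls "$base_dir/$repo" >/dev/null 2>&1; then
            echo "OK       $repo"
        else
            echo "MISSING  $repo"
            missing=$((missing + 1))
        fi
    done
    # Non-zero exit status when at least one repository could not be listed.
    [ "$missing" -eq 0 ]
}

check_repos /cvmfs \
    data.galaxyproject.org \
    singularity.galaxyproject.org \
    main.galaxyproject.org \
    containers.biocommons.aarnet.edu.au \
    data.biocommons.aarnet.edu.au \
    tools.biocommons.aarnet.edu.au \
    || echo "Some repositories could not be listed."
```

The first run may be slow for the reason given in the note above; later runs should return quickly.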

To view the list of Biocontainer tools:

Code Block
ls /cvmfs/singularity.galaxyproject.org

To access the data files:

Code Block
ls /cvmfs/data.galaxyproject.org

Note

If you run into any errors with accessing the file system, run the following to remove it, then repeat the setup steps above to re-install it:

Code Block
sudo apt-get autoremove cvmfs
sudo apt-get purge cvmfs
sudo rm -rf /etc/cvmfs/

Jupyter Notebook

Jupyter Notebooks are a very popular way of running bioinformatics analyses due to their interactive nature. We have enabled an automated way of creating such notebooks from a container. As containers do not store files, all notebooks created during the interactive session are stored on your Nimbus instance under /data.

Note: Open port 8888 on the Nimbus dashboard

From the Nimbus dashboard:

1.	Navigate to Network > Security Groups.
2.	Click on + Create Security Group, name it 'port 8888' and then select the Create Security Group button.
3.	Select + Add Rule.
4.	Enter the port number 8888 under 'Port', and click on the Add button.
5.	Navigate back to Compute > Instances, click on the down-arrow button for your instance, and select Edit Security Groups. Ensure that you select the port 8888 security group that you have just created, i.e. it should appear in the right-hand list of Instance Security Groups.

Then, to start a Jupyter Notebook, simply run the following:

Code Block
ansible-playbook /jupyter-on-nimbus/ansible-jupyternotebook.yaml
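Because notebooks are saved under /data, it can help to confirm that the directory exists and is writable before starting the playbook. The following is a minimal sketch under that assumption; the helper name `storage_ready` is hypothetical and is not part of the jupyter-on-nimbus repo.

```shell
# Hypothetical helper: succeed only if DIR exists and is writable by the current user.
storage_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

if ! storage_ready /data; then
    echo "/data is missing or not writable; attach and mount your storage volume first."
elif ! command -v ansible-playbook >/dev/null 2>&1; then
    echo "ansible-playbook was not found on this instance."
else
    ansible-playbook /jupyter-on-nimbus/ansible-jupyternotebook.yaml
fi
```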

Notes:

The playbook will prompt you to choose a version of the Jupyter Datascience Notebook (https://hub.docker.com/r/jupyter/datascience-notebook/tags/).
Pulling the container will take at least 3-5 minutes. Once pulled, it will start instantly each time you want to use it.
From time to time, you may want to re-clone the jupyter-on-nimbus repo to pick up any updates. Nimbus users will only be notified of essential updates.
Code Block
git clone https://github.com/PawseySC/jupyter-on-nimbus
sudo rm -rf /jupyter-on-nimbus
sudo mv jupyter-on-nimbus /

RStudio

RStudio is another popular interactive tool for bioinformatics analysis. Here we have also automated the process of starting an RStudio server session. As containers do not store files, all R sessions created during the interactive session are stored on your Nimbus instance under /data.

Note: Open port 8787 on the Nimbus dashboard

From the Nimbus dashboard:

1.	Navigate to Network > Security Groups.
2.	Click on + Create Security Group, name it 'port 8787' and then select the Create Security Group button.
3.	Select + Add Rule.
4.	Enter the port number 8787 under 'Port', and click on the Add button.
5.	Navigate back to Compute > Instances, click on the down-arrow button for your instance, and select Edit Security Groups. Ensure that you select the port 8787 security group that you have just created, i.e. it should appear in the right-hand list of Instance Security Groups.

Then, to start an RStudio server session, simply run the following:

Code Block
ansible-playbook /rstudio-on-nimbus/ansible-rstudio.yaml -i /rstudio-on-nimbus/vars_list

Notes:

The playbook will prompt you to choose a version of R (https://hub.docker.com/r/rocker/tidyverse/tags - note that only version 4.1.0 is supported at present).
You can also enter any R libraries or BiocManager tools you require. Make sure you follow the prompts accurately.
Pulling the container will take at least 3-5 minutes. Once pulled, it will start instantly each time you want to use it.
From time to time, you may want to re-clone the rstudio-on-nimbus repo to pick up any updates. Nimbus users will only be notified of essential updates.
Code Block
git clone https://github.com/PawseySC/rstudio-on-nimbus
sudo rm -rf /rstudio-on-nimbus
sudo mv rstudio-on-nimbus /

Singularity-HPC

Singularity-HPC (SHPC) is software for installing containers as modules. We have integrated SHPC with CernVM-FS, which means that you can access and use over 8000 Biocontainers without needing to understand container syntax. If you are familiar with containers, this is an added bonus. If you are not, this is a great way to start using them: container syntax can be messy and confusing, and running containers as modules removes the need for it. Singularity-HPC was created by one of the original developers of Singularity, and its registry includes many bioinformatics containers that can be readily pulled and used.

Before you begin, ensure that you move your containers folder to your storage volume (i.e. /data), then update the container base path:

Code Block
mv /home/ubuntu/containers /data/containers
shpc config set container_base:/data/containers
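If /data/containers already exists, a plain `mv` would place the folder inside it rather than replace it. A guarded version of the step above could look like the following sketch; the helper name `relocate_containers` is hypothetical, and the `shpc` command is the one shown above.

```shell
# Hypothetical helper: move SRC to DEST only when SRC exists and DEST does not.
relocate_containers() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "Nothing to move: $src does not exist"
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "Refusing to overwrite existing $dest"
        return 1
    fi
    mv "$src" "$dest"
}

# Update the container base path only if the move succeeded.
if relocate_containers /home/ubuntu/containers /data/containers; then
    shpc config set container_base:/data/containers
fi
```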

To see the entire list of containers available on the registry, run the following command:

Code Block
shpc list

At Pawsey, we recommend using SHPC's registry of quay.io/biocontainers containers, as Biocontainers are a reliable source of well-built containers, with versions that are matched to BioConda's tools. To narrow down the list to a particular tool, e.g. fastqc, run:

Code Block
shpc show -f quay.io/biocontainers/fastqc

To install any of these containers, run:

Code Block
shpc install quay.io/biocontainers/TOOL_NAME

To use installed containers, run the following and use the tool as you normally would (no container syntax required):

Code Block
module avail fastqc
module load quay.io/biocontainers/fastqc
fastqc --version

To check the list of modules loaded:

Code Block
module list
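When several tools are needed, the install step can be repeated in a loop. The sketch below assumes the tools exist in the quay.io/biocontainers part of the registry; the helper name `biocontainer_name` and the example tool list are illustrative only, so check each tool with shpc show first.

```shell
# Hypothetical helper: build the registry/module name for a Biocontainers tool.
biocontainer_name() {
    printf 'quay.io/biocontainers/%s\n' "$1"
}

# Example tool names (assumptions); replace with the tools you need.
tools="fastqc samtools"

for tool in $tools; do
    name="$(biocontainer_name "$tool")"
    if command -v shpc >/dev/null 2>&1; then
        shpc install "$name"
    else
        echo "shpc not found; would run: shpc install $name"
    fi
done
```

Each installed tool can then be loaded with module load followed by the same quay.io/biocontainers/TOOL_NAME name.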

Notes:

The list in the registry is not exhaustive; more packages are added each day by the community.
Pawsey is working to progressively add to the quay.io/biocontainers list.

You may want to re-clone the repo so that your list stays up to date, as follows:

Code Block
git clone https://github.com/singularityhub/singularity-hpc
sudo mv /singularity-hpc /shpc

Version	Old Version 21	New Version 22
Changes made by	Audrey Stott	Audrey Stott
Saved on	Mar 03, 2023	Mar 03, 2023
