Summary
This page describes how to use the new 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image for Nimbus. Instructions on how to choose this image when creating your instance can be found here. The Bio image caters to bioinformatics users who prefer their instances to come pre-installed with software commonly used in the bioinformatics domain. Some of this software is part of Pawsey's ongoing effort to improve the experience of bioinformatics users at Pawsey.
Before you begin
Some of this software may ask for your SSH public key, or for the path to your SSH public key on your local machine. Please have it ready. Instructions for generating one can be found at /wiki/spaces/SUP/pages/55410887.
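If you are not sure whether you already have a public key, a small check along these lines may help. This is only a sketch: `find_pubkey` is a hypothetical helper name, and it assumes your key uses one of the default OpenSSH file names in `~/.ssh`. A key stored elsewhere will not be found.

```shell
# find_pubkey [DIR]: print the first default-named OpenSSH public key in DIR.
# DIR defaults to ~/.ssh; the helper name and the search order are illustrative.
find_pubkey() {
    dir=${1:-"$HOME/.ssh"}
    for key in "$dir"/id_ed25519.pub "$dir"/id_ecdsa.pub "$dir"/id_rsa.pub; do
        if [ -f "$key" ]; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    echo "no public key found in $dir" >&2
    return 1
}
```

Running `find_pubkey` prints the path to paste when a tool asks for it, and `cat "$(find_pubkey)"` prints the key itself.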
Pre-installed software
The list of pre-installed software on this image is as follows:
- Ansible - an automation platform that Pawsey uses to automate a number of software deployments
- CernVM-FS - a read-only file system for accessing reference datasets
- Docker - a popular container engine
- Jupyter Notebook (container)
- Lmod - a modules environment that we use at Pawsey for loading software
- Nextflow - a popular workflow manager
- Pip - a Python package installer
- Python3
- RStudio (container)
- Singularity - a popular container engine that can be used on HPC
- Singularity-HPC - a container modules installer
- Spack - a package management tool
Instructions
On this page, we only cover instructions for CernVM-FS, Jupyter Notebook, RStudio and Singularity-HPC. For the other software listed above, please see each tool's original documentation.
CernVM-FS
CernVM-FS is a read-only file system that was developed at another research centre (CERN). It allows files that are commonly used by many researchers, such as reference datasets, to be accessed, added to, and updated in one location. At Pawsey, we currently mirror the datasets that are on Galaxy Project's repository. Please note that the datasets may not be comprehensive, and this service is not meant to replace your current methods for accessing public datasets.
Note | ||||
---|---|---|---|---|
Due to a recent change (27 July 2022) in the CernVM-FS proxy at Pawsey, please do the following before proceeding:
Then run the following to set up the repositories from Galaxy and AARNet, respectively:
|
You can then refer to and use the path to the datasets as follows:
Code Block |
---|
ls /cvmfs/data.galaxyproject.org |
ls /cvmfs/singularity.galaxyproject.org |
ls /cvmfs/main.galaxyproject.org |
ls /cvmfs/cvmfs-config.galaxyproject.org |
ls /cvmfs/containers.biocommons.aarnet.edu.au |
ls /cvmfs/data.biocommons.aarnet.edu.au |
ls /cvmfs/tools.biocommons.aarnet.edu.au |
Note: It may take a minute or two to load the folders. When you have done it once, it will not take as long to show again.
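To check all of the repositories in one go, a loop such as the following could be used. It is a sketch rather than an official Pawsey script: `check_cvmfs` is a hypothetical helper name, and the last repository name is written as `tools.biocommons.aarnet.edu.au` on the assumption that it follows the spelling of the other two AARNet repositories.

```shell
# check_cvmfs [BASE]: report which repositories can be listed under BASE.
# BASE defaults to /cvmfs. The helper name is illustrative only.
check_cvmfs() {
    base=${1:-/cvmfs}
    missing=0
    for repo in data.galaxyproject.org singularity.galaxyproject.org \
                main.galaxyproject.org cvmfs-config.galaxyproject.org \
                containers.biocommons.aarnet.edu.au \
                data.biocommons.aarnet.edu.au \
                tools.biocommons.aarnet.edu.au; do  # last name: assumed spelling
        # Listing the folder is the slow first access mentioned in the note above
        if ls "$base/$repo" >/dev/null 2>&1; then
            echo "ok       $repo"
        else
            echo "missing  $repo"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every repository could be listed
    [ "$missing" -eq 0 ]
}
```

Running `check_cvmfs` with no arguments checks `/cvmfs`; a repository reported as missing may simply not be set up yet.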
Jupyter Notebook
Jupyter Notebooks are a very popular way of running bioinformatics analyses due to their interactive nature. We have enabled an automated way of creating such notebooks from a container. As containers do not store files, all notebooks created in the interactive session are stored on your Nimbus instance under /data. To start a Jupyter Notebook, simply run the following:
Code Block |
---|
ansible-playbook /jupyter-on-nimbus/ansible-jupyternotebook.yaml |
Notes:
- The playbook will prompt you to choose a version of the Jupyter Datascience Notebook (https://hub.docker.com/r/jupyter/datascience-notebook/tags/)
- Pulling the container will take at least 3-5 minutes; once pulled, it will start instantly each time you want to use it
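Because notebooks are saved under /data, it can be worth confirming that the storage volume is writable before starting the playbook. The sketch below does that and also checks that the required commands are on your PATH. `preflight` is a hypothetical helper name, and checking for `docker` is an assumption, based on the container being pulled from Docker Hub.

```shell
# preflight DATA_DIR TOOL...: check that DATA_DIR is a writable directory
# and that each TOOL is on PATH. The helper name is illustrative only.
preflight() {
    if [ "$#" -lt 1 ]; then
        echo "usage: preflight DATA_DIR [TOOL...]" >&2
        return 2
    fi
    data_dir=$1
    shift
    failed=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool"
            failed=1
        fi
    done
    if [ ! -d "$data_dir" ] || [ ! -w "$data_dir" ]; then
        echo "not a writable directory: $data_dir"
        failed=1
    fi
    return "$failed"
}
```

For example, `preflight /data ansible-playbook docker` prints nothing and succeeds when everything is in place.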
From time to time, you may want to re-clone the jupyter-on-nimbus repo to pick up updates. Nimbus users will only be notified of essential updates.
Code Block |
---|
git clone https://github.com/PawseySC/jupyter-on-nimbus |
sudo rm -rf /jupyter-on-nimbus |
sudo mv jupyter-on-nimbus / |
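If the clone fails (for example, because of a network problem), the commands above still delete the existing copy. A more cautious variant only replaces the old folder once the clone has succeeded. This is a sketch, not a Pawsey-provided script: `refresh_repo` is a hypothetical helper name, and `SUDO` is a hypothetical override that defaults to `sudo`.

```shell
# refresh_repo URL DEST: clone URL into a temporary folder, then replace
# DEST only if the clone succeeded. The helper name is illustrative only.
# SUDO (hypothetical) defaults to "sudo" because DEST sits directly under /.
refresh_repo() {
    url=$1
    dest=$2
    tmp=$(mktemp -d) || return 1
    name=$(basename "$dest")
    if git clone --quiet "$url" "$tmp/$name"; then
        ${SUDO-sudo} rm -rf "$dest" && ${SUDO-sudo} mv "$tmp/$name" "$dest"
        rc=$?
    else
        echo "clone failed; keeping existing $dest" >&2
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

It could be called as `refresh_repo https://github.com/PawseySC/jupyter-on-nimbus /jupyter-on-nimbus`.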
RStudio
RStudio is another popular interactive tool for bioinformatics analysis. We have also automated the process of starting an RStudio server session. As containers do not store files, all R sessions created in the interactive session are stored on your Nimbus instance under /data. To start an RStudio server session, simply run the following:
Code Block |
---|
ansible-playbook /rstudio-on-nimbus/ansible-rstudio.yaml -i /rstudio-on-nimbus/vars_list |
Notes:
- The playbook will prompt you to choose a version of R (https://hub.docker.com/r/rocker/tidyverse/tags - note that only 3.6.3 and >= 4.01.0 are supported here at present)
- You can also enter any R libraries or BiocManager tools you require - be sure to follow the prompts accurately
- Pulling the container will take at least 3-5 minutes; once pulled, it will start instantly each time you want to use it
From time to time, you may want to re-clone the rstudio-on-nimbus repo to pick up updates. Nimbus users will only be notified of essential updates.
Code Block |
---|
git clone https://github.com/PawseySC/rstudio-on-nimbus |
sudo rm -rf /rstudio-on-nimbus |
sudo mv rstudio-on-nimbus / |
Singularity-HPC
Singularity-HPC is a tool for installing containers as modules. If you are already familiar with containers, it is a convenient addition to your workflow; if you are not, it is a great way to start using them. As container syntax can be messy and confusing, using containers as modules removes the need to write it. Singularity-HPC was created by one of the original developers of Singularity, and its registry includes many bioinformatics containers that can be readily pulled and used.
Before you begin, move your containers folder to your storage volume (i.e. /data), then update the container base path:
Code Block |
---|
mv /home/ubuntu/containers /data/containers |
shpc config set container_base:/data/containers |
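One thing to watch for: if /data/containers already exists, `mv` places the folder inside it (as /data/containers/containers), and the container base path would then point at the wrong level. A guarded move avoids this. The sketch below uses a hypothetical helper name, `move_containers`.

```shell
# move_containers SRC DEST: move SRC to DEST, refusing if DEST already exists.
# The helper name is illustrative only.
move_containers() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src does not exist" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}
```

It can be combined with the configuration step as `move_containers /home/ubuntu/containers /data/containers && shpc config set container_base:/data/containers`.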
To see the entire list of containers available on the registry, run the following command:
Code Block |
---|
shpc list |
At Pawsey, we recommend using S-HPC's registry of quay.io/biocontainers containers, as Biocontainers are a reliable source of well-built containers, with versions that are seamlessly matched to BioConda's tools. To narrow down the list to biocontainers, run:
Code Block |
---|
shpc show -f quay.io/biocontainers |
To install any of these containers, run:
Code Block |
---|
shpc install quay.io/biocontainers/TOOL_NAME |
To use installed containers, run the following and use the tool as you normally would (no container syntax required):
Code Block |
---|
module load quay.io/biocontainers/TOOL_NAME |
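As a worked example, the install and load steps can be wrapped together as sketched below. `install_and_load` is a hypothetical helper name, and `samtools` in the usage line is only an illustration; it assumes the registry has an entry for that tool.

```shell
# install_and_load TOOL: install a biocontainers module with shpc, then load it.
# The helper name is illustrative only.
install_and_load() {
    tool=$1
    if ! command -v shpc >/dev/null 2>&1; then
        echo "shpc not found on PATH" >&2
        return 127
    fi
    # Stop here if the install fails, e.g. the tool is not in the registry
    shpc install "quay.io/biocontainers/$tool" || return 1
    module load "quay.io/biocontainers/$tool"
}
```

After `install_and_load samtools`, a command such as `samtools --version` should run as if the tool were installed directly.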
Notes:
- The list in the registry is not exhaustive; more packages are being added each day by the community
- Pawsey is working to progressively add to the quay.io/biocontainers list
You may want to re-clone the repo from time to time so that your list stays up to date, as follows:
Code Block |
---|
git clone https://github.com/singularityhub/singularity-hpc |
sudo mv /singularity-hpc /shpc |