- Created by Audrey Stott, last modified on Mar 13, 2023
Summary
This page covers how to use the new 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image for Nimbus. Instructions on how to choose this image when creating your instance can be found here. This Bio image caters to bioinformatics users who prefer their instances to come pre-installed with software commonly used in the bioinformatics domain. Some of the software is part of Pawsey's ongoing effort to improve the experience of bioinformatics users at Pawsey.
Before you begin
You may be required to input your SSH public key, or the path to your SSH public key on your local machine, while using some of this software. Please ensure you have it ready to go. Instructions for how to generate one can be found at /wiki/spaces/SUP/pages/55410887.
Commonly used software
Software | Information |
---|---|
Ansible | An automation platform that Pawsey uses to automate a number of software deployments |
CernVM-FS | A read-only file system for accessing files on shared repositories |
Docker | A popular container engine |
Jupyter Notebook | For using an interactive Jupyter Notebook - see Run Jupyter Notebook Interactively |
Lmod | A modules environment that we use at Pawsey for loading software |
Nextflow | A popular workflow manager |
Pip | A Python package installer |
Python3 | A popular programming language used by many bioinformatics tools |
RStudio | For using R interactively - see Run RStudio Interactively |
Singularity | A popular container engine that can be used on HPC |
Singularity-HPC | A container modules installer |
Spack | A package management tool |
X2go | A virtual desktop application - see Setting up a virtual desktop for your instance |
Instructions
On this page, we cover how to use CernVM-FS and Singularity-HPC. For the other software listed above, please see the links in the table or the software's original documentation page.
CernVM-FS
CernVM-FS is a read-only file system that was developed at CERN, the European Organization for Nuclear Research. It allows files such as container tools, reference datasets and other shared resources that are commonly used by many researchers to be accessed, added to, and updated in one location. At Pawsey, we currently cache the Biocontainer tools and reference genome datasets that are on Galaxy Project's repositories. Please note that the datasets may not be comprehensive, and this service is not meant to replace your current methods for accessing public datasets.
If you are using the 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image, run the following to re-install CVMFS:
sudo apt-get autoremove cvmfs
sudo apt-get purge cvmfs
sudo rm -rf /etc/cvmfs/
git clone https://github.com/PawseySC/Pawsey-CernVM-FS.git
cd Pawsey-CernVM-FS
sudo ./install-cvmfs.sh install
The /cvmfs/singularity.galaxyproject.org/all subdirectory is where the entire list of 8000+ Biocontainers can be found, with the alphabetical subdirectories providing symlinks to them.
Note: It may take a minute or two to load the folders the first time. Once you have done it, later listings will not take as long.
Then, to access and view the entire list of Biocontainer tools from the repository:
$ ls -la /cvmfs/singularity.galaxyproject.org
total 140
drwxr-xr-x 33 cvmfs cvmfs 4096 Mar 25 2020 .
drwxr-xr-x 3 cvmfs cvmfs 4096 May 12 2022 1
drwxr-xr-x 3 cvmfs cvmfs 4096 Jun 29 2020 2
drwxr-xr-x 4 cvmfs cvmfs 4096 Feb 10 07:09 3
drwxr-xr-x 24 cvmfs cvmfs 4096 Feb 23 2021 a
drwxr-xr-x 4 cvmfs cvmfs 4096 Mar 7 19:06 all
drwxr-xr-x 23 cvmfs cvmfs 4096 Sep 2 2022 b
drwxr-xr-x 26 cvmfs cvmfs 4096 Feb 10 15:05 c
drwxr-xr-x 22 cvmfs cvmfs 4096 Feb 10 15:05 d
drwxr-xr-x 23 cvmfs cvmfs 4096 May 1 2020 e
drwxr-xr-x 21 cvmfs cvmfs 4096 Feb 10 19:06 f
drwxr-xr-x 25 cvmfs cvmfs 4096 Feb 5 2022 g
drwxr-xr-x 19 cvmfs cvmfs 4096 Feb 10 21:21 h
drwxr-xr-x 18 cvmfs cvmfs 4096 Feb 10 21:21 i
drwxr-xr-x 14 cvmfs cvmfs 4096 Jul 10 2020 j
drwxr-xr-x 21 cvmfs cvmfs 4096 Feb 25 2021 k
drwxr-xr-x 16 cvmfs cvmfs 4096 Feb 10 22:54 l
drwxr-xr-x 26 cvmfs cvmfs 4096 Feb 10 22:54 m
drwxr-xr-x 20 cvmfs cvmfs 4096 Feb 11 05:13 n
drwxr-xr-x 14 cvmfs cvmfs 4096 May 7 2020 o
drwxr-xr-x 24 cvmfs cvmfs 4096 Aug 24 2021 p
drwxr-xr-x 12 cvmfs cvmfs 4096 Feb 27 2021 q
drwxr-xr-x 27 cvmfs cvmfs 4096 Jun 28 2022 r
drwxr-xr-x 26 cvmfs cvmfs 4096 Apr 7 2020 s
drwxr-xr-x 24 cvmfs cvmfs 4096 Feb 11 23:17 t
drwxr-xr-x 13 cvmfs cvmfs 4096 Feb 11 23:17 u
drwxr-xr-x 18 cvmfs cvmfs 4096 Feb 11 23:17 v
drwxr-xr-x 17 cvmfs cvmfs 4096 Jul 28 2021 w
drwxr-xr-x 16 cvmfs cvmfs 4096 Apr 7 2020 x
drwxr-xr-x 4 cvmfs cvmfs 4096 Feb 28 2021 y
drwxr-xr-x 10 cvmfs cvmfs 4096 Feb 11 23:17 z
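With 8000+ entries, browsing the all subdirectory by eye is slow. The following is a minimal sketch of a helper that prints only the entries whose names begin with a given tool name. The function name find_biocontainer and the variable CVMFS_CONTAINER_DIR are hypothetical names introduced for this example, and it assumes that each image name in the directory starts with the name of its tool.

```shell
# Directory holding the full Biocontainers list (path taken from this page).
# CVMFS_CONTAINER_DIR is a hypothetical variable name used only in this sketch.
CVMFS_CONTAINER_DIR="${CVMFS_CONTAINER_DIR:-/cvmfs/singularity.galaxyproject.org/all}"

# find_biocontainer TOOL: print the entries whose names start with TOOL.
find_biocontainer() {
    fb_tool="$1"
    if [ -z "$fb_tool" ]; then
        echo "usage: find_biocontainer TOOL" >&2
        return 2
    fi
    if [ ! -d "$CVMFS_CONTAINER_DIR" ]; then
        echo "error: $CVMFS_CONTAINER_DIR is not available; is CVMFS mounted?" >&2
        return 1
    fi
    # The glob expands in sorted order; an unmatched pattern is skipped below.
    for fb_entry in "$CVMFS_CONTAINER_DIR"/"$fb_tool"*; do
        if [ -e "$fb_entry" ] || [ -L "$fb_entry" ]; then
            basename "$fb_entry"
        fi
    done
    return 0
}

# Example (the tool name is illustrative):
# find_biocontainer samtools
```

Because the match is a simple prefix, a short name may also return tools that merely share the same leading letters, so check the output before picking an image.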
To access the data files:
$ ls -la /cvmfs/data.galaxyproject.org/
total 14
drwxr-xr-x 4 cvmfs cvmfs 4096 Mar 31 2016 .
-rw-r--r-- 1 cvmfs cvmfs 21 Oct 24 2018 .cvmfsdirtab
drwxr-xr-x 210 cvmfs cvmfs 4096 Apr 21 2022 byhand
drwxr-xr-x 18 cvmfs cvmfs 4096 Nov 24 2020 managed
To use these data files for your analyses, use the absolute file path in your workflow or pipeline. For example, the reference genome Hg38 can be found in the following location, specifically under the seq subdirectory:
$ ls -la /cvmfs/data.galaxyproject.org/byhand/hg38
total 46
drwxrwxr-x 10 cvmfs cvmfs 4096 Apr 22 2016 .
drwxr-xr-x 210 cvmfs cvmfs 4096 Apr 21 2022 ..
-rw-r--r-- 1 cvmfs cvmfs 0 Apr 22 2016 .cvmfscatalog
drwxrwxr-x 3 cvmfs cvmfs 4096 Jan 21 2015 download
drwxrwxr-x 6 cvmfs cvmfs 4096 Jan 20 2015 hg38canon
drwxrwxr-x 6 cvmfs cvmfs 4096 Jan 20 2015 hg38female
drwxrwxr-x 6 cvmfs cvmfs 4096 Jan 20 2015 hg38full
drwxrwxr-x 2 cvmfs cvmfs 4096 Mar 18 2014 liftOver
drwxrwxr-x 2 cvmfs cvmfs 4096 Mar 18 2014 picard_index
drwxrwxr-x 2 cvmfs cvmfs 4096 Mar 18 2014 sam_index
drwxrwxr-x 2 cvmfs cvmfs 4096 Apr 1 2016 seq
$ ls -la /cvmfs/data.galaxyproject.org/byhand/hg38/seq
total 10108046
drwxrwxr-x 2 cvmfs cvmfs 4096 Apr 1 2016 .
drwxrwxr-x 10 cvmfs cvmfs 4096 Apr 22 2016 ..
-rw-rw-r-- 1 cvmfs cvmfs 136 Mar 18 2014 README
-rw-rw-r-- 1 cvmfs cvmfs 835393456 Mar 18 2014 hg38.2bit
lrwxrwxrwx 1 cvmfs cvmfs 11 May 17 2014 hg38.fa -> hg38full.fa
-rw-r--r-- 1 cvmfs cvmfs 19327 Aug 24 2015 hg38.fa.fai
-rw-rw-r-- 1 cvmfs cvmfs 3150052305 Mar 17 2014 hg38canon.fa
-rw-rw-r-- 1 cvmfs cvmfs 3091680335 Mar 17 2014 hg38female.fa
-rw-r--r-- 1 cvmfs cvmfs 757 Apr 1 2016 hg38female.fa.fai
-rw-rw-r-- 1 cvmfs cvmfs 3273481150 Mar 18 2014 hg38full.fa
So the full absolute path for the Hg38 sequence file would be:
/cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38full.fa
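Since the file system is mounted on demand, a pipeline may fail partway through if the reference path cannot be read. As a hedged illustration, the sketch below checks the path up front and stops early with a clear message. The function name require_readable_file is hypothetical and introduced only for this example.

```shell
# require_readable_file PATH: print PATH if it is a readable, non-empty file,
# otherwise report an error and return non-zero.
require_readable_file() {
    if [ ! -r "$1" ] || [ ! -s "$1" ]; then
        echo "error: cannot read reference file: $1" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Example use at the top of a pipeline script (path taken from this page):
# REF=$(require_readable_file /cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38full.fa) || exit 1
# echo "Using reference: $REF"
```

Storing the path in a single variable such as REF keeps the long absolute path in one place if you later switch to another build, for example hg38canon.fa.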
If you run into any errors with accessing the file system, run the following to re-install it:
sudo apt-get autoremove cvmfs
sudo apt-get purge cvmfs
sudo rm -rf /etc/cvmfs/
git clone https://github.com/PawseySC/Pawsey-CernVM-FS.git
cd Pawsey-CernVM-FS
sudo ./install-cvmfs.sh install
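Before re-installing, it can help to confirm which repository is actually failing. The following is a minimal sketch, not part of the Pawsey installer, that tries to list each repository and reports the result. The function name check_cvmfs_repos and the variable CVMFS_ROOT are hypothetical names introduced for this example.

```shell
# CVMFS_ROOT is a hypothetical variable for this sketch; /cvmfs is the mount
# root used throughout this page.
CVMFS_ROOT="${CVMFS_ROOT:-/cvmfs}"

# check_cvmfs_repos REPO...: print "ok" or "FAILED" for each repository name.
# Returns non-zero if any repository could not be listed.
check_cvmfs_repos() {
    cc_status=0
    for cc_repo in "$@"; do
        # Listing the directory triggers the mount if it is not yet active.
        if ls "$CVMFS_ROOT/$cc_repo" >/dev/null 2>&1; then
            echo "ok $cc_repo"
        else
            echo "FAILED $cc_repo"
            cc_status=1
        fi
    done
    return "$cc_status"
}

# Example with the two repositories used on this page:
# check_cvmfs_repos singularity.galaxyproject.org data.galaxyproject.org
```

The CernVM-FS client also ships its own check, cvmfs_config probe, which may give more detail when a repository is reported as failed.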
Singularity-HPC
Our upcoming March 2023 update of the Pawsey Bio image will include integration of Singularity-HPC with our CVMFS repositories, so that you are able to make use of all 8000+ Biocontainers seamlessly.
Singularity-HPC is a tool that installs containers as modules. If you are familiar with containers, it adds convenience to how you already use them. If you are not, it is a good way to start. Container syntax can be messy and confusing, and loading containers as modules removes the need to write it. Singularity-HPC was created by one of the original developers of Singularity, and the registry includes many bioinformatics containers that can be readily pulled and used.
Before you begin, make sure to move your containers folder to your storage volume (i.e. /data), then update the container base path:
mv /home/ubuntu/containers /data/containers
shpc config set container_base:/data/containers
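If the storage volume is not mounted, or a containers folder already exists there, a plain mv can put the folder in the wrong place. The sketch below wraps the move in a few safety checks. The function name move_containers is hypothetical and introduced only for this example; the paths and the shpc command in the usage comment are the ones given above.

```shell
# move_containers SRC DEST: move SRC to DEST only when it looks safe to do so.
move_containers() {
    mc_src="$1"
    mc_dest="$2"
    mc_parent=$(dirname "$mc_dest")
    if [ ! -d "$mc_src" ]; then
        echo "error: source folder $mc_src not found" >&2
        return 1
    fi
    if [ -e "$mc_dest" ]; then
        echo "error: $mc_dest already exists; not overwriting" >&2
        return 1
    fi
    if [ ! -d "$mc_parent" ] || [ ! -w "$mc_parent" ]; then
        echo "error: cannot write to $mc_parent; is the volume mounted?" >&2
        return 1
    fi
    mv "$mc_src" "$mc_dest"
}

# Example: update the shpc base path only if the move succeeded.
# move_containers /home/ubuntu/containers /data/containers &&
#     shpc config set container_base:/data/containers
```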
To see the entire list of containers available on the registry, run the following command (shpc list, by contrast, shows only the modules you have already installed):
shpc show
At Pawsey, we recommend using S-HPC's registry of quay.io/biocontainers containers, as Biocontainers are a reliable source of well-built containers, with versions that are seamlessly matched to BioConda's tools. To narrow down the list to biocontainers, run:
shpc show -f quay.io/biocontainers
To install any of these containers, run:
shpc install quay.io/biocontainers/TOOL_NAME
To use installed containers, run the following and use the tool as you normally would (no container syntax required):
module load quay.io/biocontainers/TOOL_NAME
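When a workflow needs several tools, the install command can be repeated in a loop. The following is a minimal sketch under stated assumptions: the function name install_biocontainers and the DRY_RUN switch are hypothetical names introduced for this example, and the tool names in the usage comment are illustrative, so confirm that each one appears in the registry first.

```shell
# Registry path used on this page.
SHPC_REGISTRY_PREFIX="quay.io/biocontainers"

# install_biocontainers TOOL...: install each tool as an shpc module.
# With DRY_RUN=1 (a hypothetical switch for this sketch) the commands are
# only printed, which is useful for reviewing them first.
install_biocontainers() {
    for ib_tool in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "shpc install $SHPC_REGISTRY_PREFIX/$ib_tool"
        else
            # Stop at the first failure so the error is easy to spot.
            shpc install "$SHPC_REGISTRY_PREFIX/$ib_tool" || return 1
        fi
    done
    return 0
}

# Example: preview, then install, two tools.
# DRY_RUN=1 install_biocontainers samtools bwa
# install_biocontainers samtools bwa
```

After installation, each tool is loaded with module load quay.io/biocontainers/TOOL_NAME as described above.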
Notes:
- The list in the registry is not exhaustive; the community adds more packages each day
- Pawsey is working to add progressively to the quay.io/biocontainers list
To keep your list up to date, you may want to re-clone the repository as follows:
git clone https://github.com/singularityhub/singularity-hpc
sudo mv /singularity-hpc /shpc