Summary


This page describes how to use the 'Pawsey Bio - Ubuntu 22.04 - 2023-XX' image for Nimbus. Instructions on how to choose this image when creating your instance can be found here. The Bio image caters to bioinformatics users who prefer their instances to come pre-installed with software, tools and datasets commonly used in the bioinformatics domain, including over 8000 Biocontainer tools.

Commonly used software


Software | Information | Usage/Notes
Ansible | An automation platform that Pawsey uses to automate a number of software deployments |
CernVM-FS | A read-only file system for accessing files on shared repositories | See 1. Biocontainers and Reference Genome data
Docker | A popular container engine |
Google Chrome | A web browser | Make sure to SSH in to your instance with X11 forwarding, i.e. ssh -X (or -Y) ubuntu@146.XX.XXX, and have XQuartz installed if you are using macOS
Jupyter Notebook | For using an interactive Jupyter Notebook | See Run Jupyter Notebook Interactively
Lmod | A modules environment that we use at Pawsey for loading software |
Nextflow | A popular workflow manager - "Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages." | Nextflow can leverage the containers from /cvmfs/singularity.galaxyproject.org or container modules - See 4. Using Nextflow
Pip | A Python package installer |
Python3 | A popular programming language used in many Bioinformatics software |
RStudio | For using R interactively | See Run RStudio Interactively
Singularity | A popular container engine that can be used on HPC |
Singularity-HPC | A container modules installer | See 2. Using the Biocontainer tools
Spack | A package management tool |
X2go | A virtual desktop application | X2go server has been pre-installed on the image. To use it, you will need to install X2go client on your local machine - see Installing X2go Client

Instructions


On this page, we only cover how to access and use Biocontainers and reference genome data sets. For instructions on the software listed above, please see the software's original documentation page.

1. Biocontainers and Reference Genome data


CernVM-FS is a read-only file system that was developed at CERN. It allows files such as container tools, reference datasets and other shared resources that are commonly used by many researchers to be accessed, added to, and updated in one location. At Pawsey, we currently cache the Biocontainer tools and reference genome datasets that are on Galaxy Project's repositories. The list of available Biocontainer tools can be searched at https://biocontainers.pro/registry

To use the Biocontainer tools, you can skip this step and proceed to the next section, 2. Using the Biocontainer tools.

To view these repositories, you can do the following:

  1. List the Biocontainer tools repository:

    Note
    It may take a minute or two to load the folders. After the first time, they will load more quickly.


    Tip
    The /cvmfs/singularity.galaxyproject.org/all subdirectory is where the entire list of 8000+ Biocontainers can be found, with the alphabetical subdirectories being symlinks to them.


    Code Block
    $ ls -la /cvmfs/singularity.galaxyproject.org
    
    total 140
    drwxr-xr-x 33 cvmfs cvmfs 4096 Mar 25  2020 .
    drwxr-xr-x  3 cvmfs cvmfs 4096 May 12  2022 1
    drwxr-xr-x  3 cvmfs cvmfs 4096 Jun 29  2020 2
    drwxr-xr-x  4 cvmfs cvmfs 4096 Feb 10 07:09 3
    drwxr-xr-x 24 cvmfs cvmfs 4096 Feb 23  2021 a
    drwxr-xr-x  4 cvmfs cvmfs 4096 Mar  7 19:06 all
    drwxr-xr-x 23 cvmfs cvmfs 4096 Sep  2  2022 b
    drwxr-xr-x 26 cvmfs cvmfs 4096 Feb 10 15:05 c
    drwxr-xr-x 22 cvmfs cvmfs 4096 Feb 10 15:05 d
    drwxr-xr-x 23 cvmfs cvmfs 4096 May  1  2020 e
    drwxr-xr-x 21 cvmfs cvmfs 4096 Feb 10 19:06 f
    drwxr-xr-x 25 cvmfs cvmfs 4096 Feb  5  2022 g
    drwxr-xr-x 19 cvmfs cvmfs 4096 Feb 10 21:21 h
    drwxr-xr-x 18 cvmfs cvmfs 4096 Feb 10 21:21 i
    drwxr-xr-x 14 cvmfs cvmfs 4096 Jul 10  2020 j
    drwxr-xr-x 21 cvmfs cvmfs 4096 Feb 25  2021 k
    drwxr-xr-x 16 cvmfs cvmfs 4096 Feb 10 22:54 l
    drwxr-xr-x 26 cvmfs cvmfs 4096 Feb 10 22:54 m
    drwxr-xr-x 20 cvmfs cvmfs 4096 Feb 11 05:13 n
    drwxr-xr-x 14 cvmfs cvmfs 4096 May  7  2020 o
    drwxr-xr-x 24 cvmfs cvmfs 4096 Aug 24  2021 p
    drwxr-xr-x 12 cvmfs cvmfs 4096 Feb 27  2021 q
    drwxr-xr-x 27 cvmfs cvmfs 4096 Jun 28  2022 r
    drwxr-xr-x 26 cvmfs cvmfs 4096 Apr  7  2020 s
    drwxr-xr-x 24 cvmfs cvmfs 4096 Feb 11 23:17 t
    drwxr-xr-x 13 cvmfs cvmfs 4096 Feb 11 23:17 u
    drwxr-xr-x 18 cvmfs cvmfs 4096 Feb 11 23:17 v
    drwxr-xr-x 17 cvmfs cvmfs 4096 Jul 28  2021 w
    drwxr-xr-x 16 cvmfs cvmfs 4096 Apr  7  2020 x
    drwxr-xr-x  4 cvmfs cvmfs 4096 Feb 28  2021 y
    drwxr-xr-x 10 cvmfs cvmfs 4096 Feb 11 23:17 z


  2. List the reference genome sets and other data files:

    Code Block
    $ ls -la /cvmfs/data.galaxyproject.org/
    
    total 14
    drwxr-xr-x   4 cvmfs cvmfs 4096 Mar 31  2016 .
    -rw-r--r--   1 cvmfs cvmfs   21 Oct 24  2018 .cvmfsdirtab
    drwxr-xr-x 210 cvmfs cvmfs 4096 Apr 21  2022 byhand
    drwxr-xr-x  18 cvmfs cvmfs 4096 Nov 24  2020 managed


      Please note that the data sets may not be comprehensive, and this service is not meant to replace your current methods for accessing public datasets.

    2. To use these data files for your analyses, copy the absolute file path into your workflow/pipeline. For example, for the reference genome hg38, the files can be found in the following location, specifically under the seq subdirectory:

      Code Block
      $ ls -la /cvmfs/data.galaxyproject.org/byhand/hg38
      
      total 46
      drwxrwxr-x  10 cvmfs cvmfs 4096 Apr 22  2016 .
      drwxr-xr-x 210 cvmfs cvmfs 4096 Apr 21  2022 ..
      -rw-r--r--   1 cvmfs cvmfs    0 Apr 22  2016 .cvmfscatalog
      drwxrwxr-x   3 cvmfs cvmfs 4096 Jan 21  2015 download
      drwxrwxr-x   6 cvmfs cvmfs 4096 Jan 20  2015 hg38canon
      drwxrwxr-x   6 cvmfs cvmfs 4096 Jan 20  2015 hg38female
      drwxrwxr-x   6 cvmfs cvmfs 4096 Jan 20  2015 hg38full
      drwxrwxr-x   2 cvmfs cvmfs 4096 Mar 18  2014 liftOver
      drwxrwxr-x   2 cvmfs cvmfs 4096 Mar 18  2014 picard_index
      drwxrwxr-x   2 cvmfs cvmfs 4096 Mar 18  2014 sam_index
      drwxrwxr-x   2 cvmfs cvmfs 4096 Apr  1  2016 seq
      
      
      $ ls -la /cvmfs/data.galaxyproject.org/byhand/hg38/seq
      
      total 10108046
      drwxrwxr-x  2 cvmfs cvmfs       4096 Apr  1  2016 .
      drwxrwxr-x 10 cvmfs cvmfs       4096 Apr 22  2016 ..
      -rw-rw-r--  1 cvmfs cvmfs        136 Mar 18  2014 README
      -rw-rw-r--  1 cvmfs cvmfs  835393456 Mar 18  2014 hg38.2bit
      lrwxrwxrwx  1 cvmfs cvmfs         11 May 17  2014 hg38.fa -> hg38full.fa
      -rw-r--r--  1 cvmfs cvmfs      19327 Aug 24  2015 hg38.fa.fai
      -rw-rw-r--  1 cvmfs cvmfs 3150052305 Mar 17  2014 hg38canon.fa
      -rw-rw-r--  1 cvmfs cvmfs 3091680335 Mar 17  2014 hg38female.fa
      -rw-r--r--  1 cvmfs cvmfs        757 Apr  1  2016 hg38female.fa.fai
      -rw-rw-r--  1 cvmfs cvmfs 3273481150 Mar 18  2014 hg38full.fa


    3. So the full absolute path for the Hg38 sequence file would be:

      Code Block
      /cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38full.fa


    Note

    If you run into any errors with accessing the file system, run the following to re-install it:

    Code Block
    sudo apt-get autoremove cvmfs
    sudo apt-get purge cvmfs
    sudo rm -rf /etc/cvmfs/ 
    cd /home/ubuntu
    git clone https://github.com/PawseySC/Pawsey-CernVM-FS.git
    cd Pawsey-CernVM-FS
    sudo ./install-cvmfs.sh install

    If it is still causing errors, you may need to reboot your instance.
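Before pointing a pipeline at a reference file, it can help to confirm that the path is readable and to see which file a symlink such as hg38.fa resolves to. The following is a minimal sketch only: resolve_reference is a hypothetical helper name, not something shipped with the image, and it assumes the GNU readlink that comes with Ubuntu.

```shell
# resolve_reference PATH
# Hypothetical helper: print the fully resolved path of a reference file,
# or return non-zero if the file cannot be read (e.g. CVMFS is not mounted).
resolve_reference() {
    local ref
    ref="$1"
    if [ ! -r "$ref" ]; then
        echo "Reference not readable: $ref" >&2
        return 1
    fi
    # readlink -f follows symlinks and prints the canonical absolute path
    readlink -f "$ref"
}
```

For example, resolve_reference /cvmfs/data.galaxyproject.org/byhand/hg38/seq/hg38.fa should print a path ending in hg38full.fa, given the symlink shown in the seq listing.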


    2. Using the Biocontainer tools 


    Singularity-HPC (SHPC) is software that installs containers as modules. In this Pawsey Bio - Ubuntu 22.04 - 2023-XX image, we have integrated SHPC with CernVM-FS. This means that you can easily access and use over 8000 Biocontainers (including the latest versions) without needing to understand container syntax.

    Note

    If you are using the now deprecated 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image, you will not be able to seamlessly use Biocontainer tools without first installing them using SHPC in the next section, 3. Adding a local SHPC registry. To avoid that, we recommend that you recreate a new instance with the 'Pawsey Bio - Ubuntu 22.04 - 2023-XX' image.


    Tip
    titleData directories

    When using Biocontainer tools, you need to export the path(s) of your data directories to Singularity so that the container can read them. For example, if your data directory is /data, you would run the following to add it to the Singularity bind path:

    Code Block
    export SINGULARITY_BINDPATH=/data
    echo 'export SINGULARITY_BINDPATH=/data' >> ~/.bashrc
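If you work with more than one data directory, SINGULARITY_BINDPATH accepts a comma-separated list of paths. As a minimal sketch (add_bind_path is a hypothetical helper name, and /scratch in the usage note is only an example directory), the function below appends a directory only if it is not already listed:

```shell
# add_bind_path DIR
# Hypothetical helper: append DIR to the comma-separated SINGULARITY_BINDPATH
# unless it is already present, then export the variable.
add_bind_path() {
    local dir
    dir="$1"
    case ",${SINGULARITY_BINDPATH:-}," in
        *",${dir},"*)
            ;;  # already listed, nothing to do
        *)
            # Prefix with the existing value and a comma only when it is non-empty
            SINGULARITY_BINDPATH="${SINGULARITY_BINDPATH:+${SINGULARITY_BINDPATH},}${dir}"
            ;;
    esac
    export SINGULARITY_BINDPATH
}
```

Calling add_bind_path /data and then add_bind_path /scratch in a shell where the variable is unset leaves SINGULARITY_BINDPATH set to /data,/scratch; repeating either call changes nothing.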



    1.  To search for versions and information on a particular tool, e.g. for cuttlefish, use shpc show:

      Code Block
      $ shpc show quay.io/biocontainers/cuttlefish
      
      url: https://biocontainers.pro/tools/cuttlefish
      maintainer: '@vsoch'
      description: shpc-registry automated BioContainers addition for cuttlefish
      latest:
        2.2.0--hf1761c0_0: sha256:63cdd7778b144a37684ae53b8e760ed00852f3010aa79292b3f1a6a6470f0992
      tags:
        2.1.0--hf1761c0_0: sha256:aa009abd48c372125e060d39f49f1690be74b6dac276d451bf1cc4c847a914d6
        2.1.1--hf1761c0_0: sha256:8bccced83dd6bbf87843cf08851563c812ef7c36afda2efbcf0d54f9102b913f
        2.2.0--hf1761c0_0: sha256:63cdd7778b144a37684ae53b8e760ed00852f3010aa79292b3f1a6a6470f0992
      docker: quay.io/biocontainers/cuttlefish
      aliases:
        cuttlefish: /usr/local/bin/cuttlefish


    2. To check for availability and to load the tool, use the module avail and module load commands:

      Note
      Note that on first use, the tool might take 30 seconds or so to run the command, as the container is being accessed from the file system for the first time.


      Code Block
      $ module avail cuttlefish
      
      ------------------------------- /home/ubuntu/singularity-hpc/modules -------------------------------
         quay.io/biocontainers/cuttlefish/2.2.0--hf1761c0_0/module


      Code Block
      $ module load quay.io/biocontainers/cuttlefish/2.2.0--hf1761c0_0/module


      Code Block
      $ cuttlefish --version
      cuttlefish 2.2.0
      Supported commands: `build`, `help`, `version`.
      Usage:
      	cuttlefish build [options]


    3. To check for the list of modules loaded:

      Note
      This list will be cleared whenever you log out of your instance. After logging back in, you will need to reload the module for it to be on the list for use.


      Code Block
      $ module list 
      
      Currently Loaded Modules:
        1) quay.io/biocontainers/cuttlefish/2.2.0--hf1761c0_0/module 




    4. To install another version not available as a module:

      1. To use another version of the tool listed in the shpc show output above, use shpc to install the module, making sure to pass the cvmfs path of the container and to keep it with --keep-path:

        Code Block
        $ sudo shpc install quay.io/biocontainers/cuttlefish:2.1.1--hf1761c0_0 /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.1.1--hf1761c0_0 --keep-path 
        Module quay.io/biocontainers/cuttlefish:2.1.1--hf1761c0_0 was created.


      2. Update the modules cache, then, run the same module commands to load the tool in this version:

        Code Block
        $ /usr/local/lmod/lmod/libexec/update_lmod_system_cache_files -d /opt/mData/cacheDir -t /opt/mData/cacheTS.txt /home/ubuntu/singularity-hpc/modules



        Code Block
         $ module avail cuttlefish
        
        ------------------------------- /home/ubuntu/singularity-hpc/modules -------------------------------
           quay.io/biocontainers/cuttlefish/2.1.1--hf1761c0_0/module
           quay.io/biocontainers/cuttlefish/2.2.0--hf1761c0_0/module (L,D)
        
          Where:
           L:  Module is loaded
           D:  Default Module  


        Code Block
         $ module load quay.io/biocontainers/cuttlefish/2.1.1--hf1761c0_0/module
        
        The following have been reloaded with a version change:
          1) quay.io/biocontainers/cuttlefish/2.2.0--hf1761c0_0/module => quay.io/biocontainers/cuttlefish/2.1.1--hf1761c0_0/module 


         You will notice that the previous version of the tool is now swapped out for the version you just loaded

        Code Block
        $ module list   
        
        Currently Loaded Modules:
          1) quay.io/biocontainers/cuttlefish/2.1.1--hf1761c0_0/module



    If you prefer to use the Biocontainers without SHPC, you can do so by using the absolute path of each container. Note that this requires some familiarity with Singularity. The version of Singularity installed on the Nimbus Bio image is 3.8.7 and instructions can be found here: Singularity exec.

    For example, to use cuttlefish version 2.2.0--hf1761c0_0:

    Code Block
    $ ls /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0*
    /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--h6a68c12_1
    /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--h6a68c12_2
    /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--hf1761c0_0
    /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--hf1761c0_1


    Code Block
    $ singularity exec /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--hf1761c0_0 cuttlefish
    
    cuttlefish 2.2.0
    Supported commands: `build`, `help`, `version`.
    Usage:
    	cuttlefish build [options]


    Note

    When using Singularity, if you run into an issue with no loop devices found, please use the solution provided under Common issues on the Using Containers page.
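To avoid retyping the long CVMFS path for each call, the singularity exec pattern above can be wrapped in a small function. This is a sketch under stated assumptions: CVMFS_CONTAINERS, biocontainer_path and run_biocontainer are hypothetical names introduced here, and the only Singularity feature used is the singularity exec command already shown.

```shell
# Directory holding the Biocontainer images (same path as in the ls example).
CVMFS_CONTAINERS="${CVMFS_CONTAINERS:-/cvmfs/singularity.galaxyproject.org/all}"

# biocontainer_path TOOL TAG
# Hypothetical helper: print the image path for a tool and version tag.
biocontainer_path() {
    printf '%s/%s:%s\n' "$CVMFS_CONTAINERS" "$1" "$2"
}

# run_biocontainer TOOL TAG COMMAND [ARGS...]
# Hypothetical helper: run COMMAND inside the matching image, failing early
# with a clear message if the image is not present.
run_biocontainer() {
    local tool tag image
    tool="$1"
    tag="$2"
    shift 2
    image="$(biocontainer_path "$tool" "$tag")"
    if [ ! -e "$image" ]; then
        echo "Container not found: $image" >&2
        return 1
    fi
    singularity exec "$image" "$@"
}
```

With these defined, run_biocontainer cuttlefish 2.2.0--hf1761c0_0 cuttlefish would be equivalent to the singularity exec command in the example above.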

    3. Adding a local SHPC registry


    If there are versions (usually older ones) of a Biocontainer tool that are present in the cvmfs repository but not in the shpc show list (i.e. the default recipe), you can create a local SHPC registry and add or update a recipe file for the Biocontainer tool.

    1. Clone the remote SHPC-registry and add it as a local registry:

      Code Block
      $ cd /home/ubuntu


      Code Block
      $ git clone https://github.com/singularityhub/shpc-registry.git


      Code Block
      $ sudo shpc config add registry /home/ubuntu/shpc-registry/
      Warning: Check with shpc config edit - ordering of list can change.
      Added registry to /home/ubuntu/shpc-registry/


    2. Look up the available versions of the tool, e.g. cuttlefish:

      Code Block
      $ ls /cvmfs/singularity.galaxyproject.org/all/cuttlefish*
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:1.0.0--h2e03b76_0
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:1.0.0--h2e03b76_1
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.0.0--h95f258a_0
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.0.0--hf1761c0_1
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.1.0--hf1761c0_0
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.1.1--hf1761c0_0
      /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.2.0--hf1761c0_0


    3. To add a different version to the recipe file for the tool, e.g. 2.0.0--hf1761c0_1:

      Code Block
      $ shpc add /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.0.0--hf1761c0_1 quay.io/biocontainers/cuttlefish:2.0.0--hf1761c0_1 --registry /home/ubuntu/shpc-registry/
      
      quay.io/biocontainers/cuttlefish:2.0.0--hf1761c0_1 already exists and will be updated!
      
      Registry entry quay.io/biocontainers/cuttlefish was added! Before shpc install, edit:
      /home/ubuntu/shpc-registry/quay.io/biocontainers/cuttlefish/container.yaml


    4. To install the tool:

      Warning

      If you are using the now deprecated 'Pawsey Bio - Ubuntu 20.04 - 2021-11' image, run the following steps first:

      Code Block
      cd /home/ubuntu
      
      git clone https://github.com/singularityhub/singularity-hpc.git
      
      mkdir /home/ubuntu/singularity-hpc/modules
      
      sudo shpc config set module_base /home/ubuntu/singularity-hpc/modules
      
      cat >> ~/.bashrc <<'EOF' 
      module use /home/ubuntu/singularity-hpc/modules
      EOF
      
      source ~/.bashrc

      By setting your module_base to this new location, all new container modules will be installed to this path.


      Code Block
      $ sudo shpc install quay.io/biocontainers/cuttlefish:2.0.0--hf1761c0_1 /cvmfs/singularity.galaxyproject.org/all/cuttlefish:2.0.0--hf1761c0_1 --keep-path
      Module biocontainers/cuttlefish:2.0.0--hf1761c0_1 was created.


    5. Now when you do a module avail, the newly installed 2.0.0--hf1761c0_1 version will be available:

      Code Block
      $ module avail cuttlefish  
      
      ------------------------------- /home/ubuntu/singularity-hpc/modules -------------------------------
         quay.io/biocontainers/cuttlefish/2.0.0--hf1761c0_1/module


    6. Since you have created your own local registry, shpc will default to it whenever you look up a tool with shpc show. To look up the full list of Biocontainer tools with the latest versions, you will need to add a flag that points to the remote (GitHub) shpc-registry in your search:

      Tip
      The shpc-registry is kept up-to-date with the latest versions of all Biocontainers on a nightly update.


      Code Block
      $ shpc show quay.io/biocontainers/cuttlefish --registry https://github.com/singularityhub/shpc-registry
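When deciding which version to add to a local registry, it can be handy to print only the version tags that exist on CVMFS for a tool, as in step 2 above. A minimal sketch follows; list_cvmfs_tags is a hypothetical helper name, and the default directory is the /all path used throughout this page.

```shell
# list_cvmfs_tags TOOL [ROOT]
# Hypothetical helper: print the version tags (the part after "TOOL:")
# of every image for TOOL under ROOT, one per line.
list_cvmfs_tags() {
    local tool root image
    tool="$1"
    root="${2:-/cvmfs/singularity.galaxyproject.org/all}"
    for image in "$root/$tool":*; do
        # If the glob matched nothing it stays literal; skip it
        [ -e "$image" ] || continue
        # Strip everything up to the last colon, leaving only the tag
        printf '%s\n' "${image##*:}"
    done
}
```

Running list_cvmfs_tags cuttlefish would then print tags such as 2.0.0--hf1761c0_1, which can be passed to shpc add and shpc install as shown in steps 3 and 4.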


    4. Using Nextflow


    Nextflow makes use of containers to run your workflows sequentially. Each step of your workflow is called a process. For each process, Nextflow pulls the container required for that step and runs it. To save time and space, you can prevent Nextflow from pulling the container and have it use what is already present on your instance.

    To do so, you would create an additional config file to point Nextflow to either 1) the paths of the containers on /cvmfs/singularity.galaxyproject.org, or 2) the module paths on your instance. Nextflow prioritises this custom config file above the default nextflow.config file(s), if present, in other directories for your workflow.

    Nextflow pipelines for Bioinformatics

    Nextflow has a repository of pipelines that are available through https://nf-co.re. These are becoming increasingly popular, as more peer-reviewed pipelines are added by the community. A couple of popular ones include:

    Configuring Nextflow to use Biocontainers

    Please note this is only an example of how you can configure your Nextflow workflow to use the containers available from your instance.

    1. Suppose you are using the nf-core/rnaseq pipeline. Note that the main.nf for the FASTQC process defines a few settings for the tool, including its container. These will be overridden by the config file that you create in the next step.

      Code Block
      $ cat rnaseq/modules/nf-core/fastqc/main.nf
      
      process FASTQC {
          tag "$meta.id"
          label 'process_medium'
      
          conda "bioconda::fastqc=0.11.9"
          container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
              'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
              'quay.io/biocontainers/fastqc:0.11.9--0' }"
      
          input:
          tuple val(meta), path(reads)
      
          output:
          tuple val(meta), path("*.html"), emit: html
          tuple val(meta), path("*.zip") , emit: zip
          path  "versions.yml"           , emit: versions
      
          when:
          task.ext.when == null || task.ext.when
      
          script:
          def args = task.ext.args ?: ''
          def prefix = task.ext.prefix ?: "${meta.id}"
          // Make list of old name and new name pairs to use for renaming in the bash while loop
          def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] }
          def rename_to = old_new_pairs*.join(' ').join(' ')
          def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ')
          """
          printf "%s %s\\n" $rename_to | while read old_name new_name; do
              [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
          done
          fastqc $args --threads $task.cpus $renamed_files
      
          cat <<-END_VERSIONS > versions.yml
          "${task.process}":
              fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
          END_VERSIONS
          """
      
          stub:
          def prefix = task.ext.prefix ?: "${meta.id}"
          """
          touch ${prefix}.html
          touch ${prefix}.zip
      
          cat <<-END_VERSIONS > versions.yml
          "${task.process}":
              fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
          END_VERSIONS
          """
      }


    2. Nextflow prioritises a custom config file over any other config files or values defined in the workflow files. To ensure that Nextflow uses the existing container for fastqc, you would create and use a custom config file, choosing either of the two ways:

      CVMFS path

      Create a cvmfs_path.config file with the following:

      Then run the workflow with this config:

      Modules path

      Create a modules_path.config file with the following:

      Then run the workflow with this config:


      For the other processes in your workflow, you will also need to add an entry to the corresponding *_path.config file, so that every process has the path of an existing container to run from. More info on config files for processes can be found here: https://www.nextflow.io/docs/latest/config.html#scope-process
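As a rough illustration of the CVMFS-path option, the snippet below writes a small config file that overrides the container for the FASTQC process. It is a sketch, not the configuration originally published on this page: the file contents are an assumption based on Nextflow's process selector syntax, and it assumes that a fastqc:0.11.9--0 image exists under /cvmfs/singularity.galaxyproject.org/all (check with ls before using it).

```shell
# Write a hypothetical cvmfs_path.config into the current directory.
# Assumption: the fastqc:0.11.9--0 image is present on CVMFS at the path below.
cat > cvmfs_path.config <<'EOF'
singularity {
    enabled = true
}

process {
    // Use the image already on CVMFS instead of pulling one for FASTQC
    withName: 'FASTQC' {
        container = '/cvmfs/singularity.galaxyproject.org/all/fastqc:0.11.9--0'
    }
}
EOF
```

The workflow could then be started with the -c option, for example nextflow run nf-core/rnaseq -c cvmfs_path.config followed by your usual parameters. Further withName blocks would be needed for the other processes.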
