software containers in the form of container modules.
Prerequisites
Familiarity with:
SHPC has been configured by Pawsey staff to work out-of-the-box; the following aspects have been set up:
- Directory trees for installed containers and modules
- Default registry for installation recipes
- Use of Singularity as the container runtime
- Naming convention and features of generated modulefiles
- Configuration for MPI and GPU containers
SHPC is used by Pawsey staff to deploy some of the available scientific software, in particular bioinformatics applications.
Using SHPC
Installing a container for a software package included in the SHPC registry requires no knowledge of containers or Singularity: all you need is the software name and version.
The key commands of SHPC are show and install; let's see them in action with an example. Suppose we want to install the bioinformatics package BWA. We can use the shpc show command to browse the SHPC registry of available containers:
$ module load shpc/0.0.53 # load SHPC module
$ shpc show -f bwa # search for a package in SHPC registry (string search)
biocontainers/bwa
ghcr.io/autamus/bwa
$ shpc show biocontainers/bwa # inspect specific container recipe
docker: biocontainers/bwa
url: https://hub.docker.com/r/biocontainers/bwa
maintainer: '@vsoch'
description: BWA is a software package for mapping low-divergent sequences against
  a large reference genome, such as the human genome.
latest:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
tags:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
  v0.7.17_cv1: sha256:9479b73e108ded3c12cb88bb4e918a5bf720d7861d6d8cdbb46d78a972b6ff1b
aliases:
  bwa: /opt/conda/bin/bwa
The information of interest in this output is the list of available versions (or tags), in this case 0.7.15 and v0.7.17_cv1. Let's install the former:
$ shpc install biocontainers/bwa:0.7.15
singularity pull --name /software/projects/projectcode/rsrchr/shpc/containers/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif docker://biocontainers/bwa@sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
[..]
INFO: Creating SIF file...
/software/projects/projectcode/rsrchr/shpc/containers/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif
Module biocontainers/bwa:0.7.15 was created.
That's it! Once modulefile paths are configured appropriately on the system (a shell logout/login may be required the first time), you can use module avail, module load, and module unload. As these are system modules, note the slash "/" before the version, instead of the colon ":" used above for the tags:
$ module avail bwa # search module
-------------------------------------------------------- /software/projects/projectcode/rsrchr/shpc/modules ---------------------------------------------------------
biocontainers/bwa/0.7.15/module
$ module load biocontainers/bwa/0.7.15 # load module
$ bwa # test command
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: bwa <command> [options]
Command: index index sequences in the FASTA format
mem BWA-MEM algorithm
fastmap identify super-maximal exact matches
pemerge merge overlapping paired ends (EXPERIMENTAL)
aln gapped/ungapped alignment
samse generate alignment (single ended)
sampe generate alignment (paired ended)
bwasw BWA-SW for long queries
shm manage indices in shared memory
fa2pac convert FASTA to PAC format
pac2bwt generate BWT from PAC
pac2bwtgen alternative algorithm for generating BWT
bwtupdate update .bwt to the new format
bwt2sa generate SA from BWT and Occ
Note: To use BWA, you need to first index the genome with `bwa index'.
There are three alignment algorithms in BWA: `mem', `bwasw', and
`aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
first. Please `man ./bwa.1' for the manual.
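The colon-versus-slash difference noted above is easy to trip over when scripting installs. Here is a minimal sketch, assuming install targets always take the name:tag form used on this page; shpc_target_to_module is a hypothetical helper, not part of SHPC:

```shell
# Hypothetical helper (not part of SHPC): convert an SHPC install target,
# written as name:tag, into the module name, written as name/tag.
shpc_target_to_module() {
    case "$1" in
        *:*)
            # ${1%%:*} keeps the text before the first colon (container name);
            # ${1#*:} keeps the text after it (the tag).
            printf '%s/%s\n' "${1%%:*}" "${1#*:}"
            ;;
        *)
            echo "expected name:tag, got: $1" >&2
            return 1
            ;;
    esac
}

shpc_target_to_module biocontainers/bwa:0.7.15
```

The printed name can then be passed to module load, for example module load "$(shpc_target_to_module biocontainers/bwa:0.7.15)".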
The full list of SHPC commands can be shown by using one of the help commands:
$ shpc -h
$ shpc <subcommand> -h
Writing an SHPC container recipe
What if a software container is not in the SHPC registry? In this case, you can either write your own container recipe, as described below, or email the Pawsey helpdesk for help.
Suppose you want to install the bioinformatics tool Velvet. For the sake of this example, we already know that a container for Velvet is available at quay.io/biocontainers/velvet.
$ module load shpc/0.0.53 # load SHPC module
$ shpc show -f velvet
$
As you can see from the empty output, there's no pre-existing entry in the SHPC Container Registry.
Let's see how to create one. In practice, we need to create a YAML container recipe inside the SHPC registry tree. First, let's get the location of the registry, then create an appropriate directory structure based on the container repository quay.io/biocontainers/velvet introduced above.
$ shpc config get registry # get registry location
registry /software/projects/projectcode/rsrchr/shpc/registry
# create directory tree for desired Velvet container recipe
$ mkdir -p /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet
# create a new YAML container recipe in the new path (using vi as text editor here)
$ vi /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet/container.yaml
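The same path can be assembled in a script rather than typed by hand. The sketch below assumes the output of shpc config get registry has the two-field form shown above (the word registry followed by a single path); other SHPC versions may print it differently. The variable names are illustrative only:

```shell
# Sample line copied from the `shpc config get registry` output shown above.
# On a real system, capture it with: registry_line="$(shpc config get registry)"
registry_line="registry /software/projects/projectcode/rsrchr/shpc/registry"

# Container repository that the new recipe describes.
repo="quay.io/biocontainers/velvet"

# The second whitespace-separated field is the registry directory.
registry_dir="$(printf '%s\n' "$registry_line" | awk '{print $2}')"

# The recipe lives in a directory tree that mirrors the repository path.
recipe_dir="${registry_dir}/${repo}"
recipe_file="${recipe_dir}/container.yaml"

echo "$recipe_file"
# On a real system, continue with:
#   mkdir -p "$recipe_dir" && "${EDITOR:-vi}" "$recipe_file"
```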
Let's see how a possible recipe for Velvet might look:
docker: quay.io/biocontainers/velvet
latest:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
tags:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
  "1.2.10--hed695b0_3": "sha256:b17fd98d802c1e78dde5fd2c5431efc1969db35a279f3a5ca7afcb46efc66e4a"
maintainer: "@marcodelapierre"
# these are optional
description: "Velvet is a sequence assembler for short reads."
url: https://quay.io/repository/biocontainers/velvet
aliases:
  velvetg: /usr/local/bin/velvetg
  velveth: /usr/local/bin/velveth
Let's comment on the key components of this YAML file:
- docker is the repository path for the container, without version tags
- tags is a list of container tags (versions) with the corresponding SHA message digest (shasum); these need to be manually collected from the repository website, in this case https://quay.io/repository/biocontainers/velvet?tab=tags
- latest is a copy of one of the entries under tags, to be used as the "latest" version
- maintainer is the GitHub username of the creator (required to contribute the recipe back to the GitHub repository of SHPC; put any name if you don't have one)
- aliases is a list of command names that will be made available by the SHPC module, with the corresponding commands from inside the container; these need to be manually provided, either by reading through the documentation of the package, or by downloading and inspecting the container
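A quick sanity check before installing can catch a missing key. The sketch below assumes that docker, latest, tags, and maintainer are the keys a recipe needs (the recipe's own comment marks description and url as optional). check_recipe_keys is a hypothetical helper, not an SHPC command, and it only looks for top-level key names rather than parsing YAML:

```shell
# Hypothetical helper (not an SHPC command): list assumed-required top-level
# keys that are absent from the recipe file given as the first argument.
check_recipe_keys() {
    missing=""
    for key in docker latest tags maintainer; do
        # A top-level YAML key starts at column 0 and is followed by a colon.
        grep -q "^${key}:" "$1" || missing="${missing}${key} "
    done
    if [ -n "$missing" ]; then
        echo "missing keys: ${missing% }"
        return 1
    fi
    echo "recipe looks complete"
}

# Write a trimmed copy of the Velvet recipe to a temporary file and check it.
recipe="$(mktemp)"
cat > "$recipe" <<'EOF'
docker: quay.io/biocontainers/velvet
latest:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
tags:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
maintainer: "@marcodelapierre"
aliases:
  velvetg: /usr/local/bin/velvetg
  velveth: /usr/local/bin/velveth
EOF
check_recipe_keys "$recipe"
rm -f "$recipe"
```

On a real system the function would be pointed at the container.yaml file inside the registry tree instead of a temporary file.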
Does this recipe work? Let's give it a go!
$ shpc show -f velvet # can SHPC locate the new recipe? yes!
quay.io/biocontainers/velvet
$ shpc install quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4 # installing Velvet
singularity pull --name /software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif docker://quay.io/biocontainers/velvet@sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
[..]
INFO: Creating SIF file...
/software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif
Module quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4 was created.
$ module load quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4 # loading module
$ velvetg --help # testing a command
Usage:
./velvetg directory [options]
directory : working directory name
Standard options:
-cov_cutoff <floating-point|auto> : removal of low coverage nodes AFTER tour bus or allow the system to infer it
(default: no removal)
-ins_length <integer> : expected distance between two paired end reads (default: no read pairing)
-read_trkg <yes|no> : tracking of short read positions in assembly (default: no tracking)
-min_contig_lgth <integer> : minimum contig length exported to contigs.fa file (default: hash length * 2)
-amos_file <yes|no> : export assembly to AMOS file (default: no export)
-exp_cov <floating point|auto> : expected coverage of unique regions or allow the system to infer it
(default: no long or paired-end read resolution)
-long_cov_cutoff <floating-point>: removal of nodes with low long-read coverage AFTER tour bus
(default: no removal)
Advanced options:
-ins_length* <integer> : expected distance between two paired-end reads in the respective short-read dataset (default: no read pairing)
-ins_length_long <integer> : expected distance between two long paired-end reads (default: no read pairing)
-ins_length*_sd <integer> : est. standard deviation of respective dataset (default: 10% of corresponding length)
[replace '*' by nothing, '2' or '_long' as necessary]
-scaffolding <yes|no> : scaffolding of contigs used paired end information (default: on)
-max_branch_length <integer> : maximum length in base pair of bubble (default: 100)
-max_divergence <floating-point>: maximum divergence rate between two branches in a bubble (default: 0.2)
-max_gap_count <integer> : maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3)
-min_pair_count <integer> : minimum number of paired end connections to justify the scaffolding of two long contigs (default: 5)
-max_coverage <floating point> : removal of high coverage nodes AFTER tour bus (default: no removal)
-coverage_mask <int> : minimum coverage required for confident regions of contigs (default: 1)
-long_mult_cutoff <int> : minimum number of long reads required to merge contigs (default: 2)
-unused_reads <yes|no> : export unused reads in UnusedReads.fa file (default: no)
-alignments <yes|no> : export a summary of contig alignment to the reference sequences (default: no)
-exportFiltered <yes|no> : export the long nodes which were eliminated by the coverage filters (default: no)
-clean <yes|no> : remove all the intermediary files which are useless for recalculation (default : no)
-very_clean <yes|no> : remove all the intermediary files (no recalculation possible) (default: no)
-paired_exp_fraction <double> : remove all the paired end connections which less than the specified fraction of the expected count (default: 0.1)
-shortMatePaired* <yes|no> : for mate-pair libraries, indicate that the library might be contaminated with paired-end reads (default no)
-conserveLong <yes|no> : preserve sequences with long reads in them (default no)
Output:
directory/contigs.fa : fasta file of contigs longer than twice hash length
directory/stats.txt : stats file (tab-spaced) useful for determining appropriate coverage cutoff
directory/LastGraph : special formatted file with all the information on the final graph
directory/velvet_asm.afg : (if requested) AMOS compatible assembly file