SHPC (Singularity Registry HPC)

SHPC is a utility that allows the installation of software containers in the form of container modules.

Prerequisites

Familiarity with:

What is SHPC?

SHPC installs software containers as so-called container modules, so that containerised applications can be used transparently. An automated process generates a system module for each application, hiding the details of the Singularity syntax behind shell functions that have the same names as the corresponding executables.
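To illustrate the idea, the sketch below shows the kind of shell function a container module provides. It is a simplified illustration, not the modulefile SHPC actually generates. The image path is hypothetical, the in-container path /opt/conda/bin/bwa is taken from the aliases entry of the BWA recipe shown in Terminal 1, and the function prints the Singularity command instead of running it so that the sketch works on a machine without Singularity.

```shell
#!/usr/bin/env bash
# Simplified illustration of a container-module wrapper.
# NOT the modulefile SHPC generates: the image path below is hypothetical,
# and the command is printed (echo) rather than executed.
container_image="/path/to/biocontainers-bwa-0.7.15.sif"  # hypothetical path

# A function named after the executable hides the Singularity syntax.
bwa() {
  # "$@" forwards all user arguments unchanged to the in-container binary.
  echo singularity exec "$container_image" /opt/conda/bin/bwa "$@"
}

bwa mem ref.fa reads.fq
```

Because the function forwards "$@" unchanged, users type bwa mem ... exactly as they would with a natively installed bwa.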

For containerised applications that are already available in the SHPC registry, installing and using them via SHPC is much simpler than using Singularity itself. For applications that are not yet in the registry, writing a custom container recipe may still be faster than learning how to use Singularity.

SHPC at Pawsey

SHPC has been configured by Pawsey staff to work out-of-the-box; the following aspects have been set up:

  • Directory trees for installed containers and modules
  • Default registry for installation recipes
  • Use of Singularity as the container runtime
  • Naming convention and features of generated modulefiles
  • Configuration for MPI and GPU containers

SHPC is used by Pawsey staff to deploy some of the available scientific software, in particular bioinformatics applications.

Using SHPC

Installing a container for a software package included in the SHPC registry requires no knowledge of containers or Singularity: all you need are the package name and version.

The key commands of SHPC are show and install; let's see them in action with an example. Suppose we want to install the bioinformatics package BWA. We can use the shpc show command to browse the SHPC registry of available containers:

Terminal 1. Example SHPC Show command
$ module load shpc/0.0.53  # load SHPC module

$ shpc show -f bwa  # search for a package in SHPC registry (string search)
biocontainers/bwa
ghcr.io/autamus/bwa

$ shpc show biocontainers/bwa  # inspect specific container recipe
docker: biocontainers/bwa
url: https://hub.docker.com/r/biocontainers/bwa
maintainer: '@vsoch'
description: BWA is a software package for mapping low-divergent sequences against
  a large reference genome, such as the human genome.
latest:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
tags:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
  v0.7.17_cv1: sha256:9479b73e108ded3c12cb88bb4e918a5bf720d7861d6d8cdbb46d78a972b6ff1b
aliases:
  bwa: /opt/conda/bin/bwa
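When scripting installations, the tag names can be extracted from this output with standard tools. The list_tags function below is a hypothetical helper, not part of SHPC; it assumes the two-space-indented layout shown in Terminal 1, and its sample input is copied from that terminal.

```shell
#!/usr/bin/env bash
# Print the tag names listed under "tags:" in `shpc show <recipe>` output.
# Hypothetical helper (not part of SHPC); assumes the two-space-indented
# layout shown in Terminal 1.
list_tags() {
  awk '
    /^tags:/      { in_tags = 1; next }  # start of the tags block
    /^[^ ]/       { in_tags = 0 }        # any unindented key ends it
    in_tags && NF {
      sub(/^ +/, ""); gsub(/"/, "")      # drop indentation and quotes
      sub(/:.*/, ""); print              # keep only the key, i.e. the tag
    }
  '
}

# Sample input copied from Terminal 1.
list_tags <<'EOF'
docker: biocontainers/bwa
latest:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
tags:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
  v0.7.17_cv1: sha256:9479b73e108ded3c12cb88bb4e918a5bf720d7861d6d8cdbb46d78a972b6ff1b
aliases:
  bwa: /opt/conda/bin/bwa
EOF
```

On a system with the SHPC module loaded, the same function would be used as shpc show biocontainers/bwa | list_tags.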

The information of interest in the shpc show output is the list of available versions (tags), in this case 0.7.15 and v0.7.17_cv1. Let's install the former:

Terminal 2. Example SHPC Install command
$ shpc install biocontainers/bwa:0.7.15
singularity pull --name /software/projects/projectcode/rsrchr/shpc/containers/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif docker://biocontainers/bwa@sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures

[..]

INFO:    Creating SIF file...
/software/projects/projectcode/rsrchr/shpc/containers/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif
Module biocontainers/bwa:0.7.15 was created.

That's it! Once the modulefile paths are configured in your environment (you may need to log out and back in the first time), you can use module avail, module load and module unload. As these are system modules, note the slash "/" before the version, instead of the colon ":" used for the tags above:

Terminal 3. Example SHPC module load
$ module avail bwa  # search module

-------------------------------------------------------- /software/projects/projectcode/rsrchr/shpc/modules ---------------------------------------------------------
   biocontainers/bwa/0.7.15/module

$ module load biocontainers/bwa/0.7.15  # load module

$ bwa  # test command

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
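The mapping between the two naming conventions is mechanical: the last colon in the install name becomes a slash in the module name. A minimal sketch, using a hypothetical helper shpc_to_module that is not part of SHPC and assumes the name carries a :tag suffix:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SHPC): turn the name used with
# `shpc install` (recipe:tag) into the name used with `module load` (recipe/tag).
shpc_to_module() {
  local name="$1"
  if [[ "$name" != *:* ]]; then
    echo "expected <recipe>:<tag>, got '$name'" >&2
    return 1
  fi
  # ${name%:*}  removes the last ":tag"            -> recipe
  # ${name##*:} removes everything up to last ":"  -> tag
  printf '%s/%s\n' "${name%:*}" "${name##*:}"
}

shpc_to_module biocontainers/bwa:0.7.15                         # biocontainers/bwa/0.7.15
shpc_to_module quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4  # quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4
```

For example, module load "$(shpc_to_module biocontainers/bwa:0.7.15)" would load the module installed in Terminal 2.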

The full list of SHPC commands can be shown by using one of the help commands:

$ shpc -h
$ shpc <subcommand> -h

Writing an SHPC container recipe

What if a software container is not in the SHPC registry? In this case, you can either write your own container recipe (see Terminal 5), or email the Pawsey helpdesk for assistance.

Suppose you want to install the bioinformatics tool Velvet. For the sake of this example, assume we already know that a Velvet container is available at quay.io/biocontainers/velvet.

Terminal 4. Velvet not on SHPC registry
$ module load shpc/0.0.53  # load SHPC module

$ shpc show -f velvet
$

As you can see from the empty output, there's no pre-existing entry in the SHPC Container Registry.

Let's see how to create one. In practice, we need to create a YAML container recipe inside the SHPC registry tree. First, we get the location of the registry; then we create a directory structure that mirrors the container repository identified above, quay.io/biocontainers/velvet.

Terminal 5. Create SHPC container recipe for Velvet
$ shpc config get registry  # get registry location
registry                       /software/projects/projectcode/rsrchr/shpc/registry

# create directory tree for desired Velvet container recipe
$ mkdir -p /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet

# create a new YAML container recipe in the new path (using vi as text editor here)
$ vi /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet/container.yaml
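Rather than copy-pasting the registry path, the recipe location can be derived in a script. The recipe_path function below is a hypothetical helper, not part of SHPC; it assumes that shpc config get registry prints a single "registry <path>" line, as in Terminal 5, and uses that sample line as input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SHPC): build the container.yaml path for a
# new recipe. Reads the output of `shpc config get registry` on stdin,
# assuming it is a single "registry <path>" line as in Terminal 5.
recipe_path() {
  local repo="$1" registry
  registry=$(awk '$1 == "registry" { print $2; exit }')  # reads stdin
  if [[ -z "$registry" ]]; then
    echo "could not find the registry path" >&2
    return 1
  fi
  printf '%s/%s/container.yaml\n' "$registry" "$repo"
}

# Sample input copied from Terminal 5.
echo "registry                       /software/projects/projectcode/rsrchr/shpc/registry" \
  | recipe_path quay.io/biocontainers/velvet
```

With SHPC loaded, recipe=$(shpc config get registry | recipe_path quay.io/biocontainers/velvet) followed by mkdir -p "$(dirname "$recipe")" would reproduce the first steps of Terminal 5.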

Let's see how a possible recipe for Velvet might look:

Listing 1. Velvet container recipe YAML
docker: quay.io/biocontainers/velvet

latest:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
tags:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
  "1.2.10--hed695b0_3": "sha256:b17fd98d802c1e78dde5fd2c5431efc1969db35a279f3a5ca7afcb46efc66e4a"

maintainer: "@marcodelapierre"

# these are optional
description: "Velvet is a sequence assembler for short reads."
url: https://quay.io/repository/biocontainers/velvet

aliases:
  velvetg: /usr/local/bin/velvetg
  velveth: /usr/local/bin/velveth

Let's comment on the key components of this YAML file:

  • docker is the repository path of the container, without version tags
  • tags is a list of container tags (versions), each with the corresponding SHA256 message digest; these need to be collected manually from the repository website, in this case https://quay.io/repository/biocontainers/velvet?tab=tags
  • latest repeats one of the entries from tags (tag and digest), to be used as the "latest" version
  • maintainer is the GitHub username of the recipe author (required to contribute the recipe back to the SHPC GitHub repository; use any name if you don't have one)
  • aliases is a list of command names that the SHPC module will make available, each mapped to the corresponding executable inside the container; these need to be provided manually, either by reading the package documentation or by downloading and inspecting the container
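Because tags, digests and latest are copied by hand, typos are easy to make. The check_recipe function below is a rough pre-flight check, offered as a sketch: it is a hypothetical helper, not SHPC's own validation, it assumes the flat layout of Listing 1, and it only checks that the keys not marked optional in Listing 1 exist, that each digest has the form sha256:<64 hex characters>, and that the latest tag also appears under tags.

```shell
#!/usr/bin/env bash
# Rough pre-flight check for a hand-written SHPC container.yaml.
# Hypothetical helper, NOT SHPC's own validation: it assumes the flat,
# two-space-indented layout of Listing 1 and only greps for common mistakes.
check_recipe() {
  local file="$1" status=0 key latest

  # 1. Keys that appear before "# these are optional" in Listing 1.
  for key in docker latest tags maintainer; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing key: $key"
      status=1
    fi
  done

  # 2. Every digest should be "sha256:" followed by 64 lowercase hex characters.
  if grep -Eo 'sha256:[0-9a-f]*' "$file" | grep -Evq '^sha256:[0-9a-f]{64}$'; then
    echo "malformed sha256 digest"
    status=1
  fi

  # 3. The tag under "latest" should also be listed under "tags".
  latest=$(awk '/^latest:/ { getline; gsub(/[ "]/, ""); sub(/:.*/, ""); print; exit }' "$file")
  if ! awk -v t="$latest" '
      /^tags:/      { in_tags = 1; next }
      /^[^ ]/       { in_tags = 0 }
      in_tags && NF {
        gsub(/[ "]/, ""); sub(/:.*/, "")
        if (($0 "") == (t "")) found = 1   # force string comparison
      }
      END           { exit (found ? 0 : 1) }' "$file"; then
    echo "latest tag '$latest' is not listed under tags"
    status=1
  fi

  return "$status"
}

# Demo on a copy of Listing 1 (comments and optional keys omitted).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
docker: quay.io/biocontainers/velvet
latest:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
tags:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
  "1.2.10--hed695b0_3": "sha256:b17fd98d802c1e78dde5fd2c5431efc1969db35a279f3a5ca7afcb46efc66e4a"
maintainer: "@marcodelapierre"
aliases:
  velvetg: /usr/local/bin/velvetg
  velveth: /usr/local/bin/velveth
EOF
check_recipe "$tmp" && echo "recipe looks OK"
```

A recipe that passes this check can still fail at install time, for example if a digest is well-formed but does not match the one published in the repository; actually installing the recipe remains the real test.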

Does this recipe work? Let's give it a go!

Terminal 6. Test new SHPC container recipe for Velvet
$ shpc show -f velvet  # can SHPC locate the new recipe? yes!
quay.io/biocontainers/velvet

$ shpc install quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4  # installing Velvet
singularity pull --name /software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif docker://quay.io/biocontainers/velvet@sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures

[..]

INFO:    Creating SIF file...
/software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif
Module quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4 was created.

$ module load quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4  # loading module

$ velvetg --help  # testing a command
Usage:
./velvetg directory [options]

	directory			: working directory name

Standard options:
	-cov_cutoff <floating-point|auto>	: removal of low coverage nodes AFTER tour bus or allow the system to infer it
		(default: no removal)
	-ins_length <integer>		: expected distance between two paired end reads (default: no read pairing)
	-read_trkg <yes|no>		: tracking of short read positions in assembly (default: no tracking)
	-min_contig_lgth <integer>	: minimum contig length exported to contigs.fa file (default: hash length * 2)
	-amos_file <yes|no>		: export assembly to AMOS file (default: no export)
	-exp_cov <floating point|auto>	: expected coverage of unique regions or allow the system to infer it
		(default: no long or paired-end read resolution)
	-long_cov_cutoff <floating-point>: removal of nodes with low long-read coverage AFTER tour bus
		(default: no removal)

Advanced options:
	-ins_length* <integer>		: expected distance between two paired-end reads in the respective short-read dataset (default: no read pairing)
	-ins_length_long <integer>	: expected distance between two long paired-end reads (default: no read pairing)
	-ins_length*_sd <integer>	: est. standard deviation of respective dataset (default: 10% of corresponding length)
		[replace '*' by nothing, '2' or '_long' as necessary]
	-scaffolding <yes|no>		: scaffolding of contigs used paired end information (default: on)
	-max_branch_length <integer>	: maximum length in base pair of bubble (default: 100)
	-max_divergence <floating-point>: maximum divergence rate between two branches in a bubble (default: 0.2)
	-max_gap_count <integer>	: maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3)
	-min_pair_count <integer>	: minimum number of paired end connections to justify the scaffolding of two long contigs (default: 5)
	-max_coverage <floating point>	: removal of high coverage nodes AFTER tour bus (default: no removal)
	-coverage_mask <int>	: minimum coverage required for confident regions of contigs (default: 1)
	-long_mult_cutoff <int>		: minimum number of long reads required to merge contigs (default: 2)
	-unused_reads <yes|no>		: export unused reads in UnusedReads.fa file (default: no)
	-alignments <yes|no>		: export a summary of contig alignment to the reference sequences (default: no)
	-exportFiltered <yes|no>	: export the long nodes which were eliminated by the coverage filters (default: no)
	-clean <yes|no>			: remove all the intermediary files which are useless for recalculation (default : no)
	-very_clean <yes|no>		: remove all the intermediary files (no recalculation possible) (default: no)
	-paired_exp_fraction <double>	: remove all the paired end connections which less than the specified fraction of the expected count (default: 0.1)
	-shortMatePaired* <yes|no>	: for mate-pair libraries, indicate that the library might be contaminated with paired-end reads (default no)
	-conserveLong <yes|no>		: preserve sequences with long reads in them (default no)

Output:
	directory/contigs.fa		: fasta file of contigs longer than twice hash length
	directory/stats.txt		: stats file (tab-spaced) useful for determining appropriate coverage cutoff
	directory/LastGraph		: special formatted file with all the information on the final graph
	directory/velvet_asm.afg	: (if requested) AMOS compatible assembly file

Related pages

External links

