SHPC (Singularity Registry HPC)

SHPC is a utility that allows the installation of software containers in the form of container modules.

Prerequisites

Familiarity with:

What is SHPC?

SHPC allows the installation of software containers in the form of so-called container modules, for transparent usage of containerised applications. An automated process generates a system module for an application, hiding the specificities of the Singularity syntax behind shell functions that take the same name as the corresponding executables.
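As a rough illustration of the idea (not the actual code SHPC generates), a container module can be thought of as defining a shell function like the one below. The variable names SHPC_SKETCH_RUNTIME and SHPC_SKETCH_IMAGE and the image path are hypothetical; the in-container path /opt/conda/bin/bwa is the alias from the BWA recipe used as the example later on this page.

```shell
# Hypothetical settings: a real SHPC module defines its own image path.
SHPC_SKETCH_RUNTIME="${SHPC_SKETCH_RUNTIME:-singularity}"
SHPC_SKETCH_IMAGE="${SHPC_SKETCH_IMAGE:-/path/to/bwa.sif}"

# Shell function named after the executable. It forwards every argument
# to the same program inside the container, so the user just types "bwa".
bwa() {
    "$SHPC_SKETCH_RUNTIME" exec "$SHPC_SKETCH_IMAGE" /opt/conda/bin/bwa "$@"
}
```

With a function like this in the environment, running bwa mem ... behaves as if BWA were installed natively, while the container runtime does the work behind the scenes.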

For containerised applications that are already available in the SHPC registry, installing and using them via SHPC is much simpler than using Singularity itself. For applications that are not yet in the registry, writing a custom container recipe may still be faster than learning how to use Singularity.

SHPC at Pawsey

SHPC has been configured by Pawsey staff to work out-of-the-box; the following aspects have been set up:

  • Directory trees for installed containers and modules
  • Default registry for installation recipes
  • Use of Singularity as the container runtime
  • Naming convention and features of generated modulefiles
  • Configuration for MPI and GPU containers

SHPC is used by Pawsey staff to deploy some of the available scientific software, in particular bioinformatics applications.

Using SHPC

Installing a container for software included in the SHPC registry requires no knowledge of containers or Singularity: all you need are the software name and version.

The key commands of SHPC are show and install; let's see them in action with an example. Suppose we want to install the bioinformatics package BWA. We can use the shpc show command to browse the SHPC registry of available containers:

Terminal 1. Example SHPC Show command
$ module load shpc/0.0.53  # load SHPC module

$ shpc show -f bwa  # search for a package in SHPC registry (string search)
biocontainers/bwa
ghcr.io/autamus/bwa

$ shpc show biocontainers/bwa  # inspect specific container recipe
docker: biocontainers/bwa
url: https://hub.docker.com/r/biocontainers/bwa
maintainer: '@vsoch'
description: BWA is a software package for mapping low-divergent sequences against
  a large reference genome, such as the human genome.
latest:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
tags:
  0.7.15: sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
  v0.7.17_cv1: sha256:9479b73e108ded3c12cb88bb4e918a5bf720d7861d6d8cdbb46d78a972b6ff1b
aliases:
  bwa: /opt/conda/bin/bwa

The information of interest in this output is the list of available versions (or tags), in this case: 0.7.15 and v0.7.17_cv1. Let's install the former:

Terminal 2. Example SHPC Install command
$ shpc install biocontainers/bwa:0.7.15
singularity pull --name /software/projects/projectcode/rsrchr/setonix/containers/sif/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif docker://biocontainers/bwa@sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures

[..]

INFO:    Creating SIF file...
/software/projects/projectcode/rsrchr/setonix/containers/sif/biocontainers/bwa/0.7.15/biocontainers-bwa-0.7.15-sha256:6f76c11a816b10440fd9d2c64c7183a31cc104a729f31a373c9b2b068138305e.sif
Module biocontainers/bwa:0.7.15 was created.

That's it!
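If you only need the list of available versions before installing, the tags can be pulled out of the shpc show output with standard text tools. The helper below is a small sketch: list_tags is a hypothetical name, and it assumes the output keeps the layout shown in Terminal 1 (a top-level tags: key followed by indented entries).

```shell
# Print the tag names found under the top-level "tags:" key of a
# recipe read from standard input. Quotes around tag names are removed.
list_tags() {
    awk '
        /^tags:/         { in_tags = 1; next }   # start of the tags block
        /^[^[:space:]]/  { in_tags = 0 }         # any other top-level key ends it
        in_tags && NF    { sub(/:$/, "", $1); print $1 }
    ' | tr -d "\"'"
}

# Possible usage (requires the shpc module to be loaded):
#   shpc show biocontainers/bwa | list_tags
```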

By default SHPC downloads containers under:

/software/projects/<project-id>/<user-name>/setonix/containers/sif/

and creates modulefiles under:

/software/projects/<project-id>/<user-name>/setonix/containers/modules/
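The two default locations can be assembled from a project ID and a user name, which is handy in scripts. This is only a sketch: shpc_default_dirs is a hypothetical helper, and the use of the PAWSEY_PROJECT and USER environment variables in the usage comment is an assumption about your login environment.

```shell
# Print the default SHPC container and module directories, one per line,
# for the given project ID and user name.
shpc_default_dirs() {
    local project="$1" user="$2"
    local base="/software/projects/$project/$user/setonix/containers"
    printf '%s\n' "$base/sif" "$base/modules"
}

# Possible usage (assumes these variables are set in your session):
#   shpc_default_dirs "$PAWSEY_PROJECT" "$USER"
```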

You can then use module avail, module load, and module unload as usual. Because these are system modules, the version is separated by a slash "/" rather than the colon ":" used for the tags above:

Terminal 3. Example SHPC module load
$ module avail bwa  # search module

-------------------------------------------------------- /software/projects/projectcode/rsrchr/setonix/containers/modules ---------------------------------------------------------
   biocontainers/bwa/0.7.15/module

$ module load biocontainers/bwa/0.7.15/module  # load module

$ bwa  # test command

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.

Loading a module created by SHPC

As of version 0.0.53 of SHPC, modules created using this tool require the suffix /module to be loaded correctly.

For instance:

module load biocontainers/bwa/0.7.15/module 

Failing to add the /module will result in an error, and no module will be loaded.

This behaviour is going to be improved in future SHPC versions.
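
The conversion from the name:tag form used by shpc install to the module name used by module load can be sketched as a small shell function. The name shpc_module_name is hypothetical, and the sketch assumes the naming convention described above for SHPC 0.0.53.

```shell
# Convert an SHPC install spec (name:tag) into the matching module name
# (name/tag/module), following the convention described above.
shpc_module_name() {
    local spec="$1"
    case "$spec" in
        *:*) ;;
        *) echo "expected <name>:<tag>, got '$spec'" >&2; return 1 ;;
    esac
    local name="${spec%%:*}"   # everything before the first colon
    local tag="${spec#*:}"     # everything after the first colon
    printf '%s/%s/module\n' "$name" "$tag"
}

# Possible usage:
#   module load "$(shpc_module_name biocontainers/bwa:0.7.15)"
```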


The full list of SHPC commands can be shown by using one of the help commands:

$ shpc -h
$ shpc <subcommand> -h

Writing an SHPC container recipe

What if the software container you need is not in the SHPC registry? In this case, you can either write your own container recipe (see Terminal 5), or email the Pawsey helpdesk for help.

Suppose you want to install the bioinformatics tool Velvet. For the sake of this example, we know already that there's a container available for Velvet at quay.io/biocontainers/velvet.

Terminal 4. Velvet not on SHPC registry
$ module load shpc/0.0.53  # load SHPC module

$ shpc show -f velvet
$

As you can see from the empty output, there's no pre-existing entry in the SHPC Container Registry.

Let's see how to create one. In practice, we need to create a YAML container recipe inside the registry tree of SHPC. First, let's get the location of the registry, and then create an appropriate directory structure that mirrors the container repository mentioned above, quay.io/biocontainers/velvet.

Terminal 5. Create SHPC container recipe for Velvet
$ shpc config get registry  # get registry location
registry                       /software/projects/projectcode/rsrchr/shpc/registry

# create directory tree for desired Velvet container recipe
$ mkdir -p /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet

# create a new YAML container recipe in the new path (using vi as text editor here)
$ vi /software/projects/projectcode/rsrchr/shpc/registry/quay.io/biocontainers/velvet/container.yaml
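
The two manual steps in Terminal 5 can be combined in a small helper, sketched below. The function name shpc_recipe_path is hypothetical; the usage comment assumes that shpc config get registry prints the key followed by the path, as in the output above.

```shell
# Create the registry subdirectory for a container repository and print
# the path of the container.yaml file that should be written there.
shpc_recipe_path() {
    local registry="$1"   # registry location, as reported by SHPC
    local repo="$2"       # e.g. quay.io/biocontainers/velvet
    if [ -z "$registry" ] || [ -z "$repo" ]; then
        echo "usage: shpc_recipe_path <registry-dir> <repo>" >&2
        return 1
    fi
    mkdir -p "$registry/$repo" || return 1
    printf '%s\n' "$registry/$repo/container.yaml"
}

# Possible usage (requires the shpc module to be loaded):
#   registry="$(shpc config get registry | awk '{print $2}')"
#   vi "$(shpc_recipe_path "$registry" quay.io/biocontainers/velvet)"
```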

Let's see how a possible recipe for Velvet might look:

Listing 1. Velvet container recipe YAML
docker: quay.io/biocontainers/velvet

latest:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
tags:
  "1.2.10--h5bf99c6_4": "sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27"
  "1.2.10--hed695b0_3": "sha256:b17fd98d802c1e78dde5fd2c5431efc1969db35a279f3a5ca7afcb46efc66e4a"

maintainer: "@marcodelapierre"

# these are optional
description: "Velvet is a sequence assembler for short reads."
url: https://quay.io/repository/biocontainers/velvet

aliases:
  velvetg: /usr/local/bin/velvetg
  velveth: /usr/local/bin/velveth

Let's comment on the key components of this YAML file:

  • docker is the repository path for the container, without version tags
  • tags is a list of container tags (versions) with the corresponding SHA message digest (shasum); these need to be manually collected from the repository website, in this case https://quay.io/repository/biocontainers/velvet?tab=tags
  • latest repeats one of the entries from tags, to be used as the "latest" version
  • maintainer is the GitHub username of the creator (required to contribute the recipe back to the GitHub repository of SHPC; put any name if you don't have one)
  • aliases is a list of command names that will be made available by the SHPC module, with the corresponding commands from inside the container; these need to be manually provided, either by reading through the documentation of the package, or by downloading and inspecting the container
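
Once the tag, digest and aliases have been collected, a minimal recipe with the keys described above can be generated rather than typed by hand. The function below is only a sketch: write_recipe is a hypothetical helper (it is not part of SHPC), it writes a single tag that doubles as latest, and it leaves out the optional description and url keys.

```shell
# Print a minimal SHPC recipe to standard output.
# Arguments: <repo> <tag> <digest> <maintainer> [name=path ...]
write_recipe() {
    if [ "$#" -lt 4 ]; then
        echo "usage: write_recipe <repo> <tag> <digest> <maintainer> [name=path ...]" >&2
        return 1
    fi
    local repo="$1" tag="$2" digest="$3" maintainer="$4"
    shift 4
    # The digest must look like sha256:<64 hexadecimal characters>.
    if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
        echo "invalid digest: $digest" >&2
        return 1
    fi
    printf 'docker: %s\n' "$repo"
    printf 'latest:\n  "%s": "%s"\n' "$tag" "$digest"
    printf 'tags:\n  "%s": "%s"\n' "$tag" "$digest"
    printf 'maintainer: "%s"\n' "$maintainer"
    printf 'aliases:\n'
    local pair
    for pair in "$@"; do
        # Each remaining argument has the form name=path-inside-container.
        printf '  %s: %s\n' "${pair%%=*}" "${pair#*=}"
    done
}
```

For the Velvet example, the output could be redirected into the container.yaml file created in Terminal 5, passing velvetg=/usr/local/bin/velvetg and velveth=/usr/local/bin/velveth as the alias arguments.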

Does this recipe work? Let's give it a go!

Terminal 6. Test new SHPC container recipe for Velvet
$ shpc show -f velvet  # can SHPC locate the new recipe? yes!
quay.io/biocontainers/velvet

$ shpc install quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4  # installing Velvet
singularity pull --name /software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif docker://quay.io/biocontainers/velvet@sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures

[..]

INFO:    Creating SIF file...
/software/projects/projectcode/rsrchr/shpc/containers/quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/quay.io-biocontainers-velvet-1.2.10--h5bf99c6_4-sha256:7fc2606a1431883dcd0acf830abcfeddb975677733d110a085da0f07782f5a27.sif
Module quay.io/biocontainers/velvet:1.2.10--h5bf99c6_4 was created.

$ module load quay.io/biocontainers/velvet/1.2.10--h5bf99c6_4/module  # loading module

$ velvetg --help  # testing a command
Usage:
./velvetg directory [options]

	directory			: working directory name

Standard options:
	-cov_cutoff <floating-point|auto>	: removal of low coverage nodes AFTER tour bus or allow the system to infer it
		(default: no removal)
	-ins_length <integer>		: expected distance between two paired end reads (default: no read pairing)
	-read_trkg <yes|no>		: tracking of short read positions in assembly (default: no tracking)
	-min_contig_lgth <integer>	: minimum contig length exported to contigs.fa file (default: hash length * 2)
	-amos_file <yes|no>		: export assembly to AMOS file (default: no export)
	-exp_cov <floating point|auto>	: expected coverage of unique regions or allow the system to infer it
		(default: no long or paired-end read resolution)
	-long_cov_cutoff <floating-point>: removal of nodes with low long-read coverage AFTER tour bus
		(default: no removal)

Advanced options:
	-ins_length* <integer>		: expected distance between two paired-end reads in the respective short-read dataset (default: no read pairing)
	-ins_length_long <integer>	: expected distance between two long paired-end reads (default: no read pairing)
	-ins_length*_sd <integer>	: est. standard deviation of respective dataset (default: 10% of corresponding length)
		[replace '*' by nothing, '2' or '_long' as necessary]
	-scaffolding <yes|no>		: scaffolding of contigs used paired end information (default: on)
	-max_branch_length <integer>	: maximum length in base pair of bubble (default: 100)
	-max_divergence <floating-point>: maximum divergence rate between two branches in a bubble (default: 0.2)
	-max_gap_count <integer>	: maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3)
	-min_pair_count <integer>	: minimum number of paired end connections to justify the scaffolding of two long contigs (default: 5)
	-max_coverage <floating point>	: removal of high coverage nodes AFTER tour bus (default: no removal)
	-coverage_mask <int>	: minimum coverage required for confident regions of contigs (default: 1)
	-long_mult_cutoff <int>		: minimum number of long reads required to merge contigs (default: 2)
	-unused_reads <yes|no>		: export unused reads in UnusedReads.fa file (default: no)
	-alignments <yes|no>		: export a summary of contig alignment to the reference sequences (default: no)
	-exportFiltered <yes|no>	: export the long nodes which were eliminated by the coverage filters (default: no)
	-clean <yes|no>			: remove all the intermediary files which are useless for recalculation (default : no)
	-very_clean <yes|no>		: remove all the intermediary files (no recalculation possible) (default: no)
	-paired_exp_fraction <double>	: remove all the paired end connections which less than the specified fraction of the expected count (default: 0.1)
	-shortMatePaired* <yes|no>	: for mate-pair libraries, indicate that the library might be contaminated with paired-end reads (default no)
	-conserveLong <yes|no>		: preserve sequences with long reads in them (default no)

Output:
	directory/contigs.fa		: fasta file of contigs longer than twice hash length
	directory/stats.txt		: stats file (tab-spaced) useful for determining appropriate coverage cutoff
	directory/LastGraph		: special formatted file with all the information on the final graph
	directory/velvet_asm.afg	: (if requested) AMOS compatible assembly file

Advanced: adding features to your recipe

You can add advanced features to your recipe using Features. The SHPC authors outline this functionality in their documentation.

Currently*, the following features are supported:

| Name | Description | Default | Options |
|------|-------------|---------|---------|
| gpu | If the container technology supports it, add flags to indicate using gpu. | null | nvidia, amd, null |
| x11 | Bind mount ~/.Xauthority or a custom path | null | true (uses default path ~/.Xauthority), false/null (do not enable) or a custom path to an x11 file |
| home | Specify and bind mount a custom homepath | null | custom path for the home, or false/null |

*Correct as of 11/8/2023

Note that you can use the home feature to allow your container module to bind mount a home of your choosing. This might be needed for containers that expect access to a certain directory; for example, the aws-cli container expects to have read/write access to your ~/.aws directory. The following SHPC recipe is an example that bind mounts the user's home directory:

Listing 2. Example SHPC yaml demonstrating the home feature
docker: quay.io/pawsey/hpc-python
url: https://quay.io/repository/pawsey/hpc-python
maintainer: '@marcodelapierre'
description: Base Python images with popular packages for HPC workflows.
latest:
  '2022.03': sha256:962e7c24302b2dc3946bb22326d0cb4385373113a212231488070aa3e43bd1a1
tags:
  '2021.09': sha256:c2f3f585a0be711046583c5861199107c94e047545325834d68d81d2582b7a04
  2021.09-hdf5mpi: sha256:9d34b5908630e028a6a084891af8b6e65f2626c30e57c06e883f8909850c782b
  '2022.03': sha256:962e7c24302b2dc3946bb22326d0cb4385373113a212231488070aa3e43bd1a1
  2022.03-hdf5mpi: sha256:e9a0db88e98c2388d8731a983ed845b46ce0e2d99d4566802b84142ce21e1c23
aliases:
  python: /usr/local/bin/python
  python3: /usr/local/bin/python3
env:
  PYTHONSTARTUP: ''
  PYTHONUSERBASE: ''
features:
  home: true
