How to Install Software
This page describes how to install software that you need but that is not already available on Pawsey systems. Installing software on a supercomputer can be very different from installing the same software on a workstation.
Before you begin
Make sure the software you are searching for is not already installed. Pawsey installs and maintains a predefined list of software packages on its supercomputing infrastructure.
To search for a specific software package, use these commands:
$ module avail <string> # shows all modules that can be loaded currently that contain this string
$ module -r spider <string> # shows all modules in the current module paths that contain this string
To search for keywords also inside the module description, use this command:
$ module key <keyword>
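The search can also be scripted, for example to check for a package before starting an installation. The following is a minimal sketch, not a Pawsey-provided tool: the function name software_is_installed is hypothetical, and it assumes an Lmod-style module command whose terse listing (module -t avail) is written to standard error with one module per line.

```shell
# Hypothetical helper (not part of the module system itself).
# Returns 0 if any available module matches the search string,
# 1 if none matches, and 2 if the module command is unavailable.
software_is_installed() {
    name="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found in this shell" >&2
        return 2
    fi
    # Assumption: terse output goes to stderr, so merge it into stdout
    # before searching. -F treats the name as a fixed string, -i ignores case.
    module -t avail "$name" 2>&1 | grep -qiF -- "$name"
}

# Example use (commented out so that sourcing this file has no side effects):
# if software_is_installed namd; then
#     echo "namd is already available as a module"
# fi
```

A match only means that a module name contains the string, so it is still worth inspecting the output of module avail by hand before deciding that an installation is unnecessary.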
If the software is not available, you can email the help desk to ask for it to be considered for official support. If Pawsey agrees to support the software, it will be installed system-wide.
For further information on the organisation of the software stack at Pawsey and levels of support for software installation, see Software Stack Policies.
Using the correct Linux group on /software
Some programs, such as make and cmake, may generate files associated with the user's personal group instead of the project group. For this reason, any software build program that generates files on the /software filesystem must be executed through the sg Linux utility, which ensures that the process runs with the correct group ID. All files under /software must belong to the Linux group of the project that created them, so that they count towards the correct quota limit. Otherwise, the files are counted against your personal quota, which is meant to limit /home usage and is much smaller. For instance, to install the namd package using Spack, Pawsey staff may run something like:
$ sg pawsey0001 -c 'spack install namd'
If you want to run a CMake build, you would type a command similar to
$ sg $PAWSEY_PROJECT -c 'cmake .. && make && make install'
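After a build, it can be useful to confirm that no files ended up in the wrong group. The following is a minimal sketch that relies only on the standard find utility. The function name list_wrong_group and the INSTALL_DIR variable are hypothetical and stand in for your own installation directory.

```shell
# Hypothetical helper: print every file or directory under "$1" whose
# group is NOT "$2". No output means everything has the expected group.
list_wrong_group() {
    dir="$1"
    group="$2"
    find "$dir" ! -group "$group" -print
}

# Example use (INSTALL_DIR is a placeholder for your installation path):
# list_wrong_group "$INSTALL_DIR" "$PAWSEY_PROJECT"
```

Any path printed by this check was created with a different group, which suggests that the corresponding build step was not run through sg.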
Decide on an installation method
The recommended installation method depends on the package type.
For bioinformatics packages:
- Our recommendation is to use containers for installation.
- Conda is another popular way of installing bioinformatics software so as to achieve portable and reproducible installations.
For Python packages:
- Our recommendation is to use the pip or setuptools utilities, as directed by the package documentation. Pip is the simplest way to install additional Python packages at Pawsey, while leveraging the pre-installed performance libraries.
- For complex, production-ready Python workflows, where reproducibility and portability are more important, we advise using containers. Design and development of the workflow should rely on other installation methods for efficiency, until a finalised container recipe can be defined, ready for image building.
- Conda is an alternative to containers for achieving portable and reproducible installations.
For R packages:
- Our recommendation is to use the built-in mechanisms at Pawsey for installing R packages.
- For complex, production-ready R workflows, or where the installation requires administrative privileges, we advise using containers. Design and development of the workflow should rely on other installation methods for efficiency, until a finalised container recipe can be defined, ready for image building.
- Conda is an alternative to containers for achieving portable and reproducible installations.
For other package types, use the Spack package manager.
If the software is not supported by any of the above methods, you will need to manually build the software yourself, following the build instructions provided by the developers. Once the build is complete, you may optionally want to create a modulefile for the software. For more information, check the Modules page.
Related pages
The following pages discuss how to install and manage software.
- Conda and Reproducible Installations — Conda is a popular package manager that performs binary installations of a large variety of packages. Although these installations do not provide optimal performance on HPC, they can be an acceptable solution when workflow reproducibility and portability are crucial (similar to containers). Mamba is a faster alternative that we recommend you use either in combination with Conda or as a drop-in replacement for it. If you know how
- How to Configure Conda to Avoid Quota Issues — Conda is a popular package manager. However, Conda creates many small files when installing packages, which can quickly fill your quota on /software. This article will tell you how to configure Conda to avoid this issue. Please note that these instructions also work for Mamba, which is a much faster drop-in replacement for Conda.
- How to Manually Build Software — When the installation of software is not supported by the Spack package manager, you will have to go through every step of the associated build process yourself.
- Installing Python Packages — Pip and Setuptools (setup.py) are the most popular tools for installing Python packages, and also the easiest ways to benefit from the Python performance libraries that come preinstalled on Pawsey systems.
- Installing R Packages — R provides built-in mechanisms to install additional packages.
- Spack — Pawsey provides and maintains a number of prebuilt libraries and applications on Setonix, most of which are installed and managed through Spack. This page outlines how you can use Spack to install additional software, or different builds of existing software, that are not provided by Pawsey-supported modules.
- Modules — The module system is used to provide users with access to software on HPC systems. This page explains the basic concepts of modules and how to use them.
- Containers — Containers allow users to package an application and all of the software dependencies needed to run it into a single, sandboxed piece of software. Containers enable an application to run quickly and reliably from one computing environment to another.