How to Manually Build Software
When the installation of software is not supported by the Spack software manager, you will have to go through every step of the associated build process yourself. This page provides guidance on how to build software to generate an executable from its source code when the build process is supported by tools such as Make, CMake or a configure script.
Building with Spack is faster and chances are the build will be optimised for the supercomputer hardware.
You should only attempt to build a software application manually if Spack does not support it, or if you have a good reason to do so, for example compiling with an uncommon option enabled. Check also the Compiling and Compiler Optimisation Levels pages for common options used for supercomputing applications.
Prerequisites
To begin with, retrieve the source code and save it on the supercomputer.
Remember that the build process must happen on the type of compute node that the code will execute on, in order to take advantage of all the optimisations for that particular architecture.
Context
The build of scientific software is the process of generating a functioning executable program from its source code. A build follows a standard sequence of steps:
- Environment configuration. The software you want to build and the build process itself almost always depend on libraries, tools and supporting files being present on the system. You must ensure that all required dependencies are available and discoverable through appropriate mechanisms such as environment variables.
- Compiling. This is the process of transforming the source code of the software into machine code, which is stored in object files. This task is performed by compilers such as gcc.
- Linking. Library dependencies and the object files produced by the previous steps are linked together to form an executable.
- Installation. Executables and other necessary artefacts, like shared or static libraries, are moved to the desired installation location on the /software filesystem, where they can be found for later use.
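The compile, link and install steps listed above can be sketched by hand for a one-file C program. This is an illustration only: the program name hello, the scratch directory and the use of cc as a fallback compiler are assumptions, not part of any Pawsey procedure.

```shell
# Sketch of compiling, linking and installing a made-up one-file program.
# Assumptions: a C compiler reachable as $CC (default: cc); a scratch
# directory stands in for a real location under /software.
CC=${CC:-cc}
work=$(mktemp -d)
prefix="$work/install"

cat > "$work/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    "$CC" -O2 -c "$work/hello.c" -o "$work/hello.o"   # compiling: source code -> object file
    "$CC" "$work/hello.o" -o "$work/hello"            # linking: object file -> executable
    mkdir -p "$prefix/bin"
    cp "$work/hello" "$prefix/bin/hello"              # installation: copy into the prefix
    "$prefix/bin/hello"                               # run the installed executable
else
    echo "no C compiler found as '$CC'; skipping" >&2
fi
```

Build tools such as GNU Make and CMake automate exactly this sequence for projects with many source files and dependencies.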
Major tools supporting the process are few and well established. Here is a list of them.
- GNU Make is the de facto standard build tool for software projects developed on and for Linux environments. It relies on a makefile, by default named Makefile, which contains the rules (including the sequence of commands, environment variables and options) that tell Make how to generate an executable from source code.
- The configure script is often used to automate the retrieval of system and user information before the compilation and linking steps are performed. It generates a tailored Makefile starting from a general template and the collected information.
- CMake, not to be confused with GNU Make, is a meta-build tool that uses system-independent and compiler-independent configuration files to generate specific build scripts for a range of system-specific build tools, including GNU Make.
Group Ownership
We have observed that the make and cmake commands do not assign the correct "group ownership" to the software they generate.
In order to assign the correct group ownership to the generated files, users are instructed to execute all make and cmake commands using the sg command ("set group") like:
sg <projectCode> -c 'make <commandAndOptions>'
OR
sg <projectCode> -c 'cmake <commandAndOptions>'
If the sg command is not used, the generated files may end up with the wrong group ownership. For example, if user "matilda" from project "pawsey9999" does not use this command, the generated files may be listed with the incorrect group ownership, like:
-rw-r--r-- 1 matilda matilda 1848 Mar 20 12:14 theirFile
instead of the correct group ownership, which would be:
-rw-r--r-- 1 matilda pawsey9999 1848 Mar 20 12:16 theirFile
Incorrect ownership of files may affect your work in many ways. One of them is consuming the user's limited quota for the maximum number of files in the system.
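One way to check a build or installation directory after the fact is to list everything whose group is not the project group. The sketch below assumes the standard find command and its -group test; the helper name list_wrong_group is hypothetical and not part of any Pawsey tooling.

```shell
# List every file or directory under a path whose group differs from the expected one.
# list_wrong_group is a hypothetical helper name.
list_wrong_group() {
    dir=$1      # directory to inspect, e.g. the build or installation directory
    group=$2    # expected group: the project code (group name or numeric id)
    find "$dir" ! -group "$group" -print
}

# Example call, using the project from the listing above:
#   list_wrong_group /path/to/installation/directory pawsey9999
```

An empty listing means every file already carries the expected group. Anything that is listed can be corrected by its owner with the chgrp command.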
Steps
Recommended location for manual software builds
To keep your software organised, we recommend the following locations for your manual software installations:
- For user-only installations:
  - Software: /software/projects/<project-id>/<user-name>/manual/software/
  - Modulefiles: /software/projects/<project-id>/<user-name>/manual/modules/
  - (Note that these installations are inside the individual user's directory.)
- For project installations (accessible to all members of the project):
  - Software: /software/projects/<project-id>/manual/software/
  - Modulefiles: /software/projects/<project-id>/manual/modules/
  - (Note that these installations are at the "project level" and not inside any user's own directory.)
  - (All members of the project belong to the same Unix group and therefore have enough privileges to access the installed software via the group ownership.)
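The layout above can be expressed as a small helper that prints the two recommended directories for a given project, user and scope. This is a sketch: the function name manual_build_dirs and its arguments are hypothetical, and the project and user names are the example values used earlier on this page.

```shell
# Print the recommended software and modulefile directories for a manual build.
# manual_build_dirs is a hypothetical helper; scope is "user" or "project".
manual_build_dirs() {
    project=$1
    user=$2
    scope=$3
    base="/software/projects/$project"
    if [ "$scope" = "user" ]; then
        base="$base/$user"      # user-only installations live inside the user's directory
    fi
    echo "$base/manual/software"
    echo "$base/manual/modules"
}

manual_build_dirs pawsey9999 matilda user
# /software/projects/pawsey9999/matilda/manual/software
# /software/projects/pawsey9999/matilda/manual/modules
```

The printed directories can then be created with mkdir -p before the first build.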
The process of building software starts with obtaining and unpacking its source code into a directory, which from now on is referred to as $ROOTBUILD_DIR.
1. Identify the build process. Usually, the software package provides a detailed description of the build process. Otherwise, you should look for one of the following files, which indicate the tools used for the purpose.
- A CMakeLists.txt file in the $ROOTBUILD_DIR directory indicates a CMake project. Go to step 3.
- A configure or configure.sh script in the $ROOTBUILD_DIR directory suggests that you must execute a script to configure the build process. Go to step 2.
- A Makefile in the $ROOTBUILD_DIR directory signals that the project's build process is handled through GNU Make. Go to step 4.
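The file checks above can be sketched as a small shell function. The name detect_build_tool is hypothetical; the order of the checks simply follows the list.

```shell
# Report which build tool a source tree appears to use, following the checks above.
# detect_build_tool is a hypothetical helper name.
detect_build_tool() {
    src=$1
    if [ -f "$src/CMakeLists.txt" ]; then
        echo "cmake"        # go to step 3
    elif [ -f "$src/configure" ] || [ -f "$src/configure.sh" ]; then
        echo "configure"    # go to step 2
    elif [ -f "$src/Makefile" ]; then
        echo "make"         # go to step 4
    else
        echo "unknown"      # fall back to the package's own build instructions
    fi
}

# Inspect the unpacked source tree (current directory if ROOTBUILD_DIR is unset).
detect_build_tool "${ROOTBUILD_DIR:-.}"
```

A project may ship more than one of these files, in which case its own documentation should say which route is preferred.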
2. configure scripts. A build process uses a configure script to collect information regarding the environment (operating system, compilers, libraries, etc.) you intend to build the software in. It is able to collect most of the needed information and decide on the best configuration automatically. However, there are a few options that you must usually set; for instance, the --prefix option is used to specify the absolute path (the path starting from the root of the filesystem) to the installation directory. There may be options that are not required but desirable in a supercomputing environment, for example, options enabling vectorised instructions. Once the script has run, GNU Make must typically be executed next (step 4).
Notes and best practices. Always check the output produced by a configure script. It might contain warnings that call for a modification of the configure options or the shell environment. Some configure scripts compile a test code and execute it to set some compilation options accordingly. This may not work when compiling on nodes with a different architecture from the compute nodes (such as the login nodes). For instance, HPE-Cray EX login nodes on Setonix have neither GPU cards nor some of the related libraries, so configure tests may fail. This type of problem can also occur with pure CPU codes. Therefore, we recommend running the build process on the type of compute node intended for execution (GPU or CPU). If the code is still under development or if the compilation is fast, users can use the development partitions (debug or gpu-dev in the case of Setonix). For long compilations, users should submit the compilation jobs to the "production" partitions (work or gpu in the case of Setonix).
3. Building using CMake. Similarly to configure scripts, CMake generates one or more environment-dependent build files (for Linux-based systems they are Makefile files, covered in step 4) from a high-level, environment-independent definition of the build process that is contained in the CMakeLists.txt file. Terminal 1 shows the typical sequence of commands you should use. Once completed, move to step 4.
Terminal 1. Using CMake to generate build files
$ cd $ROOTBUILD_DIR
$ mkdir build
$ cd build
$ sg <projectcode> -c 'cmake ..'
In words:
- Change the working directory of the terminal to $ROOTBUILD_DIR and create a directory named build (the name can vary, although the one suggested here is standard practice) within it.
- Change the working directory again, this time into the newly created folder.
- From the build directory, execute the cmake command, passing as an argument the path to the directory containing the CMakeLists.txt file. Typically the relative path .. is used.
As with the configure script, you can specify options to CMake. The most common one is the CMAKE_INSTALL_PREFIX option, which dictates where binaries will be installed (the default location being /usr/local). The syntax for specifying an option to CMake is -DOPTION=Value. In this case, the command would look like this:
$ cmake -DCMAKE_INSTALL_PREFIX=/path/to/installation/directory ..
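To see the -DOPTION=Value syntax in action without a real code base, the sketch below configures a throwaway CMake project and then reads the chosen values back from the cache file that CMake writes into the build directory. The project name hello, the scratch paths and the extra CMAKE_BUILD_TYPE option are assumptions made for illustration. On the supercomputer the cmake call would additionally be wrapped in sg as described under Group Ownership.

```shell
# Configure a throwaway CMake project with a custom installation prefix.
# Assumptions: cmake and make are on PATH; the project "hello" and all paths are made up.
src=$(mktemp -d)
prefix="$src/install"
cat > "$src/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.10)
project(hello NONE)   # NONE: no compiler is needed for this illustration
EOF

mkdir "$src/build"
if command -v cmake >/dev/null 2>&1 && command -v make >/dev/null 2>&1; then
    # Options are passed as -DOPTION=Value, one per flag, before the source path.
    ( cd "$src/build" && cmake -DCMAKE_INSTALL_PREFIX="$prefix" -DCMAKE_BUILD_TYPE=Release .. )
    # The chosen values are recorded in the cache file of the build directory.
    grep -E '^CMAKE_(INSTALL_PREFIX|BUILD_TYPE)' "$src/build/CMakeCache.txt"
else
    echo "cmake or make not found; skipping" >&2
fi
```

Checking CMakeCache.txt is a quick way to confirm that an option was spelled correctly and picked up before starting a long build.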
4. Building using GNU Make. Conceptually, GNU Make executes the commands to compile and link a program that are specified in the Makefile file, using a dedicated syntax that allows declaring dependencies between the build steps. To launch the build process, change the working directory of the terminal to the one containing the Makefile (that is, $ROOTBUILD_DIR) and simply execute the make command. Next, execute the make install command to install the built executable or library.
$ sg <projectcode> -c 'make'
$ sg <projectcode> -c 'make install'
The install argument to make is called a target. A target represents a subset of the Makefile file that accomplishes a particular task in the larger context of the build process. In the commands above, the first make command executes the default target, which usually builds the software without installing it. The install target installs the binaries, that is, the produced executables or libraries.
Sometimes you must change the value of some variables defined in the Makefile file. Some variable names are standard across most Makefile files. In particular, CC, CXX and FC are used to define executable names for the C, C++ and Fortran compilers, respectively, whereas CFLAGS, CXXFLAGS and FFLAGS are used for the corresponding compiler flags.
All the compiler modules on Pawsey HPC systems define the compiler variables CC, CXX and FC, which are then ready for use by GNU Make.
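The sketch below shows one way variables and targets interact, using a tiny made-up Makefile and the -n (dry run) option of make, which prints the commands without executing them. The project hello, its PREFIX variable and the chosen values gcc and -O3 are assumptions for illustration; real projects may name their variables differently.

```shell
# Dry-run a tiny Makefile to see how CC and CFLAGS given on the command line
# replace the defaults. Assumptions: make is on PATH; "hello" is a made-up project.
work=$(mktemp -d)
touch "$work/hello.c"    # an empty placeholder is enough for a dry run
cat > "$work/Makefile" <<'EOF'
CFLAGS ?= -O2
PREFIX ?= /usr/local
hello: hello.o ; $(CC) hello.o -o hello
hello.o: hello.c ; $(CC) $(CFLAGS) -c hello.c -o hello.o
install: hello ; mkdir -p $(PREFIX)/bin && cp hello $(PREFIX)/bin/
.PHONY: install
EOF

if command -v make >/dev/null 2>&1; then
    # Default target (the first one in the file): build only.
    ( cd "$work" && make -n CC=gcc CFLAGS="-O3" )
    # install target: build, then copy into the chosen prefix.
    ( cd "$work" && make -n CC=gcc CFLAGS="-O3" PREFIX="$work/install" install )
else
    echo "make not found; skipping" >&2
fi
```

Values given on the make command line take precedence over assignments inside the Makefile, which is why no file needs to be edited here. Running with -n first is a cheap way to confirm which compiler and flags a real build would use.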
Result
The software you have built is now located at the installation path. See the Next Steps section for what to do next in order to use it.
Example
This example shows how to build gromacs/2021.4 on Setonix using CMake. Although the application is available through the software stack provided by Pawsey (installed with Spack), users sometimes need a custom build with particular patches or flags.
- Log in to Setonix, then move to your /software folder and download the source code of Gromacs. See Software Stack for more information on the organisation of software on Setonix.
$ cd /software/projects/<project-id>/<user-name>/manual/software
$ wget https://gitlab.com/gromacs/gromacs/-/archive/v2021.4/gromacs-v2021.4.tar.gz
- Request an interactive session on a compute node, with 64 CPU cores to enable a parallel build. Alternatively, you can write a build script and submit the job to the scheduler.
$ salloc -p work --ntasks=1 -c 64
- Extract the source code from the archive, then execute the build process.
Terminal 2. Building gromacs using CMake
$ tar -xf gromacs-v2021.4.tar.gz
$ mkdir gromacs-v2021.4/build
$ cd gromacs-v2021.4/build
$ module load cray-fftw
$ module load cray-mpich
$ sg <projectcode> -c 'cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON ..'
[ output ... ]
$ sg <projectcode> -c 'make -j 64'
[ output ... ]
$ sg <projectcode> -c 'make install'
[ output ... ]
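For the alternative mentioned earlier, submitting the build as a job instead of working interactively, the same commands could be placed in a batch script along the following lines. This is a sketch under stated assumptions: the file name build_gromacs.sh, the project code pawsey9999 (the example project used on this page) and the one-hour time limit are placeholders to adapt.

```shell
# Write a hypothetical batch script that runs the same build non-interactively.
# Assumptions: project code, time limit and file name are placeholders.
cat > build_gromacs.sh <<'EOF'
#!/bin/bash
#SBATCH --account=pawsey9999
#SBATCH --partition=work
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --time=01:00:00

PROJECT_CODE=pawsey9999    # placeholder project code
module load cray-fftw
module load cray-mpich

# Run from the directory that contains the extracted gromacs-v2021.4 folder.
mkdir -p gromacs-v2021.4/build
cd gromacs-v2021.4/build || exit 1
sg "$PROJECT_CODE" -c "cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON .." || exit 1
sg "$PROJECT_CODE" -c "make -j $SLURM_CPUS_PER_TASK" || exit 1
sg "$PROJECT_CODE" -c "make install"
EOF

# Submit it to the scheduler with:
#   sbatch build_gromacs.sh
```

Using SLURM_CPUS_PER_TASK for the -j value keeps the number of parallel make jobs in step with the cores requested in the header.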
Next steps
Once you have installed your software, you may need to set some environment variables so that the operating system can find the software and its dependencies. The environment variables are typically PATH, LD_LIBRARY_PATH and LIBRARY_PATH.
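As a minimal sketch, the three variables could be set for the current shell session as follows. INSTALL_PREFIX is a hypothetical variable standing for the installation directory chosen at build time, and the bin and lib subdirectories are the usual layout, not a guarantee for every package.

```shell
# Make a manual installation visible in the current shell session.
# Assumption: INSTALL_PREFIX is a placeholder for the prefix chosen at build
# time, containing the usual bin/ and lib/ subdirectories.
INSTALL_PREFIX=${INSTALL_PREFIX:-/path/to/installation/directory}

# Executables
export PATH="$INSTALL_PREFIX/bin:$PATH"
# Shared libraries searched at run time
export LD_LIBRARY_PATH="$INSTALL_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Libraries searched at link time
export LIBRARY_PATH="$INSTALL_PREFIX/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
```

The ${VAR:+:$VAR} form appends the previous value only when there is one, which avoids leaving a stray trailing colon in the search path.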
You may want to create a module for your software to modify your environment easily. Check Modules for more information.
External links
- Software Carpentry's introductory GNU Make tutorial
- Official CMake tutorial