How to Manually Build Software
When Spack does not support the installation of a piece of software, you have to carry out every step of its build process yourself. This page provides guidance on how to build an executable from source code when the build process is supported by tools such as Make, CMake or a configure script.
Building with Spack is usually faster, and the resulting build is more likely to be optimised for the supercomputer hardware.
You should only attempt to build a software application manually if Spack does not support it, or if you have a good reason to do so, for example compiling with an uncommon option enabled. Check also the Compiling and Compiler Optimisation Levels pages for common options used for supercomputing applications.
Prerequisites
To begin with, retrieve the source code and save it on the supercomputer.
Remember that the build process must happen on the type of compute node that the code will execute on, in order to take advantage of all the optimisations for that particular architecture.
Context
The build of scientific software is the process of generating a functioning executable program from its source code. A build follows a standard sequence of steps:
- Environment configuration. The software you want to build and the build process itself almost always depend on libraries, tools and supporting files being present on the system. You must ensure that all required dependencies are available and discoverable through appropriate mechanisms such as environment variables.
- Compiling. This is the process of transforming the source code of the software into machine code, which is stored in object files. This task is performed by compilers such as gcc.
- Linking. Library dependencies and the object files produced by the previous steps are linked together to form an executable.
- Installation. Executables and other necessary artefacts, like shared or static libraries, are moved to the desired installation location on the /software filesystem, where they can be found for later use.
Major tools supporting the process are few and well established. Here is a list of them.
- GNU Make is the de facto standard build tool for software projects developed on and for Linux environments. It relies on a makefile, by default named Makefile, which contains the rules (including the sequence of commands, environment variables, options) that tell Make how to generate an executable from source code.
- The configure script is often used to automate the retrieval of system and user information before the compilation and linking steps are performed. It generates a tailored Makefile starting from a general template and the collected information.
- CMake, not to be confused with GNU Make, is a meta-build tool that uses system-independent and compiler-independent configuration files to generate specific build scripts for a range of system-specific build tools, including GNU Make.
Group Ownership
We have observed that make and cmake commands do not always assign the correct group ownership to the files they generate.
In order to assign the correct group ownership to the generated files, users are instructed to execute all make and cmake commands using the sg command ("set group") like:
sg <projectCode> -c 'make <commandAndOptions>'
OR
sg <projectCode> -c 'cmake <commandAndOptions>'
If the sg command is not used, then generated files may end up with the wrong group ownership. For example, if user "matilda" from project "pawsey9999" does not use this command, then the generated files may be listed with the incorrect group ownership like:
-rw-r--r-- 1 matilda matilda 1848 Mar 20 12:14 theirFile
instead of the correct group ownership, which would be:
-rw-r--r-- 1 matilda pawsey9999 1848 Mar 20 12:16 theirFile
Incorrect ownership of files may affect your work in many ways. One of them is that the files count against the user's limited quota on the maximum number of files in the system.
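One way to check for files with the wrong group is sketched below. The helper name list_wrong_group is hypothetical; it only wraps the standard -group test of find. The demonstration uses a scratch directory and your current primary group in place of a real project code.

```shell
# list_wrong_group DIR GROUP
# Print every file or directory under DIR whose group is not GROUP.
# (Hypothetical helper name; find's "-group" test is standard.)
list_wrong_group() {
    find "$1" ! -group "$2" -print
}

# Self-contained demonstration in a scratch directory, using your current
# primary group in place of a project code such as pawsey9999.
demo_dir=$(mktemp -d)
touch "$demo_dir/theirFile"
chgrp -R "$(id -g)" "$demo_dir"
list_wrong_group "$demo_dir" "$(id -g)"   # prints nothing: all groups match
```

On the system, you would pass your installation directory and project code instead. Files reported by the helper could then be corrected with chgrp, assuming you own them.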
Steps
Recommended location for manual software builds
To keep your software organised, we recommend the following locations for your manual software installations:
- For user-only installations:
  - Software: /software/projects/<project-id>/<user-name>/manual/software/
  - Modulefiles: /software/projects/<project-id>/<user-name>/manual/modules/
  - (Note that these installations are inside the individual user's directory.)
- For project installations (accessible to all members of the project):
  - Software: /software/projects/<project-id>/manual/software/
  - Modulefiles: /software/projects/<project-id>/manual/modules/
  - (Note that these installations are at the "project level" and not inside any user's own directory.)
  - (All members of the project belong to the same Unix group and have enough privileges to access the installed software via the group ownership.)
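The recommended layout can be created with a small helper such as the one sketched here. The function name make_manual_dirs is hypothetical, and the project code and user name in the usage comments are the placeholders used elsewhere on this page.

```shell
# make_manual_dirs BASE
# Create BASE/manual/software and BASE/manual/modules (hypothetical helper).
make_manual_dirs() {
    mkdir -p "$1/manual/software" "$1/manual/modules"
}

# Placeholder values from this page: project "pawsey9999", user "matilda".
# make_manual_dirs /software/projects/pawsey9999/matilda   # user-only
# make_manual_dirs /software/projects/pawsey9999           # project-wide
```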
The process of building software starts with obtaining and unpacking its source code into a directory, which from now on is referred to as $ROOTBUILD_DIR.
1. Identify the build process. Usually, the software package provides a detailed description of the build process. Otherwise, look for one of the following files, which indicate the tools used to build the software.
   - A CMakeLists.txt file in the $ROOTBUILD_DIR directory indicates a CMake project. Go to step 3.
   - A configure or configure.sh script in the $ROOTBUILD_DIR directory suggests that you must execute a script to configure the build process. Go to step 2.
   - A Makefile in the $ROOTBUILD_DIR directory signals that the project's build process is handled through GNU Make. Go to step 4.
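The file checks above can be scripted. The following is a minimal sketch: the function name detect_build_system is hypothetical, and the order of the checks (CMake first, then configure, then a plain Makefile) is an assumption that mirrors the list above. Some projects ship more than one of these files, so the package documentation remains the authority.

```shell
# detect_build_system [DIR]
# Print "cmake", "configure", "make" or "unknown" depending on which
# build files are present in DIR (default: current directory).
detect_build_system() {
    dir=${1:-.}
    if [ -f "$dir/CMakeLists.txt" ]; then
        echo "cmake"        # go to step 3
    elif [ -f "$dir/configure" ] || [ -f "$dir/configure.sh" ]; then
        echo "configure"    # go to step 2
    elif [ -f "$dir/Makefile" ]; then
        echo "make"         # go to step 4
    else
        echo "unknown"      # consult the package documentation
    fi
}
```

For example, detect_build_system "$ROOTBUILD_DIR" reports which step to follow for the unpacked source tree.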
2. configure scripts. A build process uses a configure script to collect information about the environment (operating system, compilers, libraries, etc.) you intend to build the software in. The script is able to collect most of the needed information and decide on the best configuration automatically. However, there are a few options that you must usually set; for instance, the --prefix option specifies the absolute path (that is, the path starting from the root of the filesystem) to the installation directory. There may be options that are not required but desirable in a supercomputing environment, for example, options enabling vectorised instructions. Once the script has run, typically GNU Make must be executed next (step 4).

   Notes and best practices. Always check the output produced by a configure script. It might contain warnings that call for a modification of the configure options or the shell environment. Some configure scripts compile a test code and execute it to set some compilation options accordingly. This may not work when compiling on nodes whose architecture differs from that of the compute nodes (such as the login nodes). For instance, HPE-Cray EX login nodes on Setonix have neither GPU cards nor some of the related libraries, so configure tests may fail. This type of problem can also occur for pure CPU codes. Therefore, we recommend running the build process on the type of compute node intended for execution (GPU or CPU). If the code is still under development or if the compilation is fast, users can use the development partitions (debug or gpu-dev in the case of Setonix). For long compilations, users should submit the compilation jobs to the "production" partitions (work or gpu in the case of Setonix).

3. Building using CMake. Similarly to configure scripts, CMake generates one or more environment-dependent build files (for Linux-based systems they are Makefile files, covered in step 4) from a high-level, environment-independent definition of the build process that is contained in the CMakeLists.txt file. Terminal 1 shows the typical sequence of commands you should use. Once completed, move to step 4.

   Terminal 1. Using CMake to generate build files
   $ cd $ROOTBUILD_DIR
   $ mkdir build
   $ cd build
   $ sg <projectcode> -c 'cmake ..'
   In words:
   - Change the working directory of the terminal to $ROOTBUILD_DIR and create a directory named build within it (the name can vary, although the one suggested here is standard practice).
   - Change the working directory again, this time into the newly created directory.
   - From the build directory, execute the cmake command, passing as an argument the path to the directory containing the CMakeLists.txt file. Typically the relative path .. is used.
   Like the configure script, CMake accepts options. The most common one is CMAKE_INSTALL_PREFIX, which dictates where binaries will be installed (the default location being /usr/local). The syntax for specifying an option to CMake is -DOPTION=Value. In this case, the command would look like this:
   $ cmake -DCMAKE_INSTALL_PREFIX=/path/to/installation/directory ..

4. Building using GNU Make. Conceptually, GNU Make executes the commands to compile and link a program that are specified in the Makefile file, using a dedicated syntax that allows declaring dependencies between the build steps. To launch the build process, change the working directory of the terminal to the one containing the Makefile (that is, $ROOTBUILD_DIR, or the build directory if the Makefile was generated by CMake) and simply execute the make command. Next, execute the make install command to install the built executable or library.
   $ sg <projectcode> -c 'make'
   $ sg <projectcode> -c 'make install'
   The install argument to make is called a target. A target represents a subset of the Makefile file that accomplishes a particular task in the larger context of the build process. In the commands above, the first make command executes the default target, which usually builds the software without installing it. The install target installs the binaries, that is, the produced executables or libraries.

   Sometimes you must change the value of some variables defined in the Makefile file. Some variable names are standard across most Makefile files. In particular, CC, CXX and FC are used to define executable names for C, C++ and Fortran compilers, respectively, whereas CFLAGS, CXXFLAGS and FFLAGS are used for the corresponding compiling flags. All the compiler modules in Pawsey HPC systems define the compiler variables CC, CXX and FC, which are then ready to use by GNU Make.
Result
The software you have built is now located at the installation path. See the Next Steps section for what to do next in order to use it.
Example
This example shows how to build gromacs/2021.4 on Setonix using CMake. Although the application is available through the software stack provided by Pawsey (installed with Spack), sometimes users need a custom build with particular patches or flags.
- Log in to Setonix, then move to your /software folder and download the source code of Gromacs. See Software Stack for more information on the organisation of software on Setonix.
  $ cd /software/projects/<project-id>/<user-name>/manual/software
  $ wget https://gitlab.com/gromacs/gromacs/-/archive/v2021.4/gromacs-v2021.4.tar.gz
- Request an interactive session on a compute node, with 64 CPU cores to enable a parallel build. Alternatively, you can write a build script and submit the job to the scheduler.
  $ salloc -p work --ntasks=1 -c 64
- Extract the source code from the archive, then execute the build process.
  Terminal 2. Building gromacs using CMake
  $ tar -xf gromacs-v2021.4.tar.gz
  $ mkdir gromacs-v2021.4/build
  $ cd gromacs-v2021.4/build
  $ module load cray-fftw
  $ module load cray-mpich
  $ sg <projectcode> -c 'cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON ..'
  [ output ... ]
  $ sg <projectcode> -c 'make -j 64'
  [ output ... ]
  $ sg <projectcode> -c 'make install'
  [ output ... ]
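The same build could instead be submitted as a batch job. The following is a sketch under stated assumptions: the project code pawsey9999 is the placeholder used earlier on this page, the wall time is a guess to be adjusted, and the script is assumed to be submitted from the directory that contains the extracted gromacs-v2021.4 folder. The script file name build_gromacs.sh is hypothetical.

```shell
# Write the batch script to a file (quoted 'EOF' prevents expansion here).
cat > build_gromacs.sh <<'EOF'
#!/bin/bash
#SBATCH --account=pawsey9999      # placeholder project code
#SBATCH --partition=work
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --time=01:00:00           # assumed wall time; adjust as needed

PROJECT=pawsey9999                # placeholder project code

module load cray-fftw
module load cray-mpich

set -e                            # stop at the first failing step

mkdir -p gromacs-v2021.4/build
cd gromacs-v2021.4/build

# Same commands as in Terminal 2, wrapped in sg for correct group ownership.
sg "$PROJECT" -c "cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON .."
sg "$PROJECT" -c "make -j $SLURM_CPUS_PER_TASK"
sg "$PROJECT" -c "make install"
EOF

# Submit with: sbatch build_gromacs.sh
```

Using SLURM_CPUS_PER_TASK keeps the number of parallel make jobs in step with the cores requested from the scheduler.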
Next steps
Once you have installed your software, you may need to set some environment variables so that the operating system can find the software and its dependencies. The environment variables are typically PATH, LD_LIBRARY_PATH and LIBRARY_PATH.
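As an illustration, the variables could be set as follows. The installation prefix is hypothetical (it combines the placeholder project and user names from this page with an invented tool name), and the bin and lib subdirectories are assumptions about how the software lays out its files.

```shell
# Hypothetical installation prefix; use the value you passed to
# --prefix or CMAKE_INSTALL_PREFIX.
prefix="/software/projects/pawsey9999/matilda/manual/software/mytool"

# Executables: searched by the shell.
export PATH="$prefix/bin:$PATH"

# Shared libraries: searched at run time.
# The ${VAR:+:$VAR} form avoids a trailing colon when VAR is unset or empty.
export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Libraries: searched by the compiler at link time.
export LIBRARY_PATH="$prefix/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
```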
You may want to create a module for your software to modify your environment easily. Check Modules for more information.
External links
- Software Carpentry's introductory GNU Make tutorial
- Official CMake tutorial