How to Manually Build Software

When the installation of software is not supported by the Spack software manager, you will have to go through every step of the associated build process yourself. This page provides guidance on how to build an executable from source code when the build process is supported by tools such as Make, CMake or a configure script.

Building with Spack is usually faster, and the resulting build is more likely to be optimised for the supercomputer hardware.

You should only attempt to build a software application manually if Spack does not support it, or if you have a good reason to do so, for example compiling with an uncommon option enabled. Check also the Compiling and Compiler Optimisation Levels pages for common options used for supercomputing applications.

Prerequisites

To begin with, retrieve the source code and save it on the supercomputer.

Remember that the build process must happen on the type of compute node that the code will execute on, in order to take advantage of all the optimisations for that particular architecture.

Context

The build of scientific software is the process of generating a functioning executable program from its source code. A build follows a standard sequence of steps:

  1. Environment configuration. The software you want to build and the build process itself almost always depend on libraries, tools and supporting files being present on the system. You must ensure that all required dependencies are available and discoverable through appropriate mechanisms such as environment variables.
  2. Compiling. This is the process of transforming the source code of the software into machine code, which is stored in object files. This task is performed by compilers such as gcc.
  3. Linking. Library dependencies and the object files produced by the previous steps are linked together to form an executable.
  4. Installation. Executables and other necessary artefacts, like shared or static libraries, are moved to the desired installation location on the /software filesystem, where they can be found for later use.
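
The four steps can be seen in miniature by building a one-file C program by hand. The following is a minimal sketch rather than a recommended workflow: the file name hello.c, the scratch directory and the install prefix are hypothetical, and it assumes a C compiler is reachable as cc (or through the CC variable).

```shell
# Work in a scratch directory; on a real system the install prefix would
# sit under /software (a throwaway location is used here for illustration).
workdir=$(mktemp -d)
prefix="$workdir/install"
cd "$workdir"

# 1. Environment configuration: choose the compiler (CC if set, else cc).
compiler="${CC:-cc}"

# A tiny source file to build.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello from a manual build\n"); return 0; }
EOF

# 2. Compiling: source code -> object file.
"$compiler" -c hello.c -o hello.o

# 3. Linking: object file(s) and libraries -> executable.
"$compiler" hello.o -o hello

# 4. Installation: copy the executable to the installation location.
mkdir -p "$prefix/bin"
cp hello "$prefix/bin/"
"$prefix/bin/hello"
```

Real packages have many source files and dependencies, which is why the tools listed next automate these steps.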

Major tools supporting the process are few and well established. Here is a list of them.

  1. GNU Make is the de facto standard build tool for software projects developed on and for Linux environments. It relies on a makefile, by default named Makefile, which contains the rules (including the sequence of commands, environment variables and options) that tell Make how to generate an executable from source code.
  2. The configure script is often used to automate the retrieval of system and user information before the compilation and linking steps are performed. It generates a tailored Makefile starting from a general template and the collected information.
  3. CMake, not to be confused with GNU Make, is a meta-build tool that uses system-independent and compiler-independent configuration files to generate specific build scripts for a range of system-specific build tools, including GNU Make.

Group Ownership

We have observed that make and cmake commands do not assign the correct "group ownership" of generated software.

In order to assign the correct group ownership to the generated files, users are instructed to execute all make and cmake commands using the sg command ("set group") like:

sg <projectCode> -c 'make <commandAndOptions>'

OR

sg <projectCode> -c 'cmake <commandAndOptions>'


If the sg command is not used, then generated files may end up with the wrong group ownership. For example, if user "matilda" from project "pawsey9999" does not use this command, then the generated files may be listed with the incorrect group ownership like:

-rw-r--r-- 1 matilda matilda    1848 Mar 20 12:14 theirFile

instead of the correct group ownership, which would be:

-rw-r--r-- 1 matilda pawsey9999 1848 Mar 20 12:16 theirFile

Incorrect ownership of files may affect your work in many ways. One of them is consuming the user's limited quota for the maximum number of files on the system.
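
To check an existing installation for files with the wrong group, find can list everything that does not belong to the project group. This is a minimal sketch: the function name list_wrong_group is hypothetical, and the path and group in the usage comment are placeholders based on the "matilda" and "pawsey9999" example above.

```shell
# List files and directories under a path that do NOT belong to the
# expected group. Prints nothing when every entry has the right group.
list_wrong_group() {
    dir="$1"
    group="$2"
    find "$dir" ! -group "$group"
}

# Typical use (placeholders, not run here):
#   list_wrong_group /software/projects/pawsey9999/matilda/manual/software pawsey9999
```

An empty result means every file under the directory already has the expected group.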

Steps

Recommended location for manual software builds

To keep your software organised, we recommend the following locations for your manual software installations:

  • For user-only installations:
    • Software: /software/projects/<project-id>/<user-name>/manual/software/
    • Modulefiles: /software/projects/<project-id>/<user-name>/manual/modules/
    • (Note that these installations are inside individual user's directories.)
  • For project installations (accessible to all members of the project):
    • Software: /software/projects/<project-id>/manual/software/
    • Modulefiles: /software/projects/<project-id>/manual/modules/
    • (Note that these installations are at the "project level" and not inside any user's own directory.)
    • (All members of the project belong to the same Unix group and have enough privileges to access the installed software via the group ownership.)
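
The recommended layout can be created with mkdir -p. The sketch below is only illustrative: the project and user names are the hypothetical ones used earlier on this page, and MANUAL_ROOT is a made-up variable that defaults to a scratch directory so the commands can be tried without touching /software. On the real system the root would be /software/projects.

```shell
# Hypothetical values: replace with your own project and user name.
project_id="pawsey9999"
user_name="matilda"

# MANUAL_ROOT is a hypothetical variable; set it to /software/projects on
# the real system. A scratch directory is used when it is not set.
root="${MANUAL_ROOT:-$(mktemp -d)}"

# User-only installation tree.
user_base="$root/$project_id/$user_name/manual"
mkdir -p "$user_base/software" "$user_base/modules"

# Project-wide installation tree.
project_base="$root/$project_id/manual"
mkdir -p "$project_base/software" "$project_base/modules"

# Show the two "manual" directories that were created.
find "$root/$project_id" -type d -name manual
```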

The process of building software starts with obtaining and unpacking its source code into a directory, which from now on is referred to as $ROOTBUILD_DIR.

  1. Identify the build process. Usually, the software package provides a detailed description of the build process. Otherwise, you should look for one of the following files indicating which tools are used for the purpose.

    1. A CMakeLists.txt file in the $ROOTBUILD_DIR directory indicates a CMake project. Go to step 3.
    2. A configure or configure.sh script in the $ROOTBUILD_DIR directory suggests that you must execute a script to configure the build process. Go to step 2.
    3. A Makefile in the $ROOTBUILD_DIR directory signals that the project's build process is handled through GNU Make. Go to step 4.
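
The checks in step 1 can be expressed as a small shell function. This is a sketch under assumptions: the function name detect_build_system is hypothetical, and when several of the files are present it simply follows the order of the list above, which may not match what a given package intends.

```shell
# Report which build tool a source directory appears to use,
# based on the marker files described in step 1.
detect_build_system() {
    dir="${1:-.}"
    if [ -f "$dir/CMakeLists.txt" ]; then
        echo "cmake"        # go to step 3
    elif [ -f "$dir/configure" ] || [ -f "$dir/configure.sh" ]; then
        echo "configure"    # go to step 2
    elif [ -f "$dir/Makefile" ]; then
        echo "make"         # go to step 4
    else
        echo "unknown"      # read the package's build documentation
    fi
}

# Check the source directory (the current directory if ROOTBUILD_DIR is unset).
detect_build_system "${ROOTBUILD_DIR:-.}"
```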
  2. configure scripts. A build process uses a configure script to collect information regarding the environment (operating system, compilers, libraries, etc.) you intend to build the software in. It is able to collect most of the needed information and decide on the best configuration automatically. However, there are a few options that you must usually set; for instance, the --prefix option is used to specify the absolute path (the path starting from the root of the filesystem) to the installation directory. There may be options that are not required but desirable in a supercomputing environment, for example, options enabling vectorised instructions. Once the script has run, GNU Make must typically be executed next (step 4).


    To see the list of all options and arguments, execute the configure script with the --help option:

        $ ./configure --help

    As an example, the following line shows how to run a configure script specifying the --prefix option:

        $ ./configure --prefix=/path/to/installation/dir

    You can also set the value of the environment variables used by the configure script, like this:

        $ ./configure VAR=VALUE

    List of the most common compiling and linking variables

    Variable    Meaning                     Example
    CC          C compiler                  CC=icc
    CXX         C++ compiler                CXX=icpc
    FC          Fortran compiler            FC=ifort
    CFLAGS      C compiler flags            CFLAGS=-O2
    CXXFLAGS    C++ compiler flags          CXXFLAGS=-O2
    FCFLAGS     Fortran compiler flags      FCFLAGS=-O2
    CPPFLAGS    C/C++ preprocessor flags    CPPFLAGS=-I<include dir>
    LDFLAGS     Linker flags                LDFLAGS=-L<library dir>
    LIBS        Linker libraries            LIBS=-l<lib name>

    Notes and best practices. Always check the output produced by a configure script. It might contain warnings that call for a modification of the configure options or the shell environment. Some configure scripts compile and execute test code in order to set compilation options accordingly. This may not work when compiling on nodes whose architecture differs from that of the compute nodes (such as the login nodes). For instance, the HPE Cray EX login nodes on Setonix have neither GPU cards nor some of the related libraries, so configure tests may fail. This type of problem can also affect pure CPU codes. Therefore, we recommend running the build process on the type of compute node intended for execution (GPU or CPU). If the code is still under development or if the compilation is fast, users can use the development partitions (debug or gpu-dev in the case of Setonix). For long compilations, users should submit the compilation jobs to the "production" partitions (work or gpu in the case of Setonix).
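
One way to follow the advice about checking configure output is to save it to a log and scan the log for warnings. The sketch below assumes the output was captured in a file; the function name show_configure_warnings and the file name configure.log are hypothetical.

```shell
# Print the lines of a configure log that mention a warning, with line numbers.
show_configure_warnings() {
    log="${1:-configure.log}"
    if [ ! -f "$log" ]; then
        echo "log file not found: $log" >&2
        return 1
    fi
    if ! grep -i -n 'warning' "$log"; then
        echo "no warnings found in $log"
    fi
}

# Typical use (not run here): capture the output while configuring, then scan it.
#   ./configure --prefix=/path/to/installation/dir 2>&1 | tee configure.log
#   show_configure_warnings configure.log
```

A clean scan does not guarantee a correct configuration, so it is still worth reading the summary that many configure scripts print at the end.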

  3. Building using CMake. Similarly to configure scripts, CMake generates one or more environment-dependent build files (on Linux-based systems these are Makefile files, covered in step 4) from a high-level, environment-independent definition of the build process that is contained in the CMakeLists.txt file. Terminal 1 shows the typical sequence of commands you should use. Once completed, move to step 4.

    Terminal 1. Using CMake to generate build files
    $ cd $ROOTBUILD_DIR
    $ mkdir build
    $ cd build
    $ sg <projectcode> -c 'cmake ..'

    In words,

    1. Change the working directory of the terminal to $ROOTBUILD_DIR and create a directory named build inside it (the name can vary, although the one suggested here is standard practice).

    2. Change the working directory again, this time to the newly created directory.

    3. From the build directory, execute the command cmake passing as an argument the path to the directory containing the CMakeLists.txt file. Typically the relative path .. is used.

    Like the configure script, you can specify options to CMake. The most common one is the CMAKE_INSTALL_PREFIX option that dictates where binaries will be installed (the default location being /usr/local). The syntax for specifying an option to CMake is -DOPTION=Value. In this case, the command would look like this:

        $ cmake -DCMAKE_INSTALL_PREFIX=/path/to/installation/directory ..
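
Because the default prefix is /usr/local, it can be worth confirming which prefix a build directory was actually configured with before running the install step. CMake records its options in a CMakeCache.txt file inside the build directory, with entries of the form NAME:TYPE=VALUE. The following sketch reads the prefix back from that file; the function name cmake_install_prefix is hypothetical.

```shell
# Print the install prefix recorded in a CMake build directory.
cmake_install_prefix() {
    build_dir="${1:-.}"
    cache="$build_dir/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $build_dir" >&2
        return 1
    fi
    # Strip the "CMAKE_INSTALL_PREFIX:TYPE=" part and print the value.
    sed -n 's/^CMAKE_INSTALL_PREFIX:[A-Z]*=//p' "$cache"
}

# Typical use (not run here), from the source directory:
#   cmake_install_prefix "$ROOTBUILD_DIR/build"
```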

  4. Building using GNU Make. Conceptually, GNU Make executes the commands to compile and link a program that are specified in the Makefile file, using a dedicated syntax that allows declaring dependencies between the building steps. To launch the build process, change the working directory of the terminal to the one containing the Makefile (that is, $ROOTBUILD_DIR, or the build directory if the Makefile was generated by CMake) and simply execute the make command. Next, execute the make install command to install the built executable or library.

        $ sg <projectcode> -c 'make'
        $ sg <projectcode> -c 'make install'

    The install argument to make is called a target. A target represents a subset of the Makefile file that accomplishes a particular task in the larger context of the build process. In the commands above, the first make command executes the default target, which usually builds the software without installing it. The install target installs the binaries, that is, the produced executables or libraries.
    Sometimes you must change the value of some variables defined in the Makefile file. Some variable names are standard across most Makefile files. In particular, CC, CXX and FC are used to define executable names for C, C++ and Fortran compilers, respectively, whereas CFLAGS, CXXFLAGS and FFLAGS are used for the corresponding compiling flags.

    All the compiler modules in Pawsey HPC systems define the compiler variables CC, CXX and FC, which are then ready to use by GNU Make.
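
Variables can be changed without editing the Makefile by passing them on the make command line, where they take precedence over ordinary assignments inside the file. The sketch below demonstrates this with a throwaway Makefile whose only target, show, is hypothetical and merely prints the values it would use, so no compiler is run.

```shell
# Create a throwaway Makefile with default compiler settings.
demo=$(mktemp -d)
cat > "$demo/Makefile" <<'EOF'
CC = cc
CFLAGS = -O0
show: ; @echo $(CC) $(CFLAGS)
EOF

# Values taken from the Makefile itself.
default_out=$(make -f "$demo/Makefile" show)
echo "default:  $default_out"

# Values overridden on the command line.
override_out=$(make -f "$demo/Makefile" show CC=gcc CFLAGS=-O2)
echo "override: $override_out"
```

On a real project the same pattern applies to the build itself, for example by appending CC=... and CFLAGS=... to the make command inside the sg call.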

Result

The software you have built is now located at the installation path. See the Next Steps section for what to do next in order to use it.

Example

This example shows how to build gromacs/2021.4 on Setonix using CMake. Although the application is available through the software stack provided by Pawsey (installed with Spack), sometimes users need a custom build with particular patches or flags.

  1. Log in to Setonix, then move to your /software folder and download the source code of Gromacs. See Software Stack for more information on the organisation of software on Setonix.
    $ cd /software/projects/<project-id>/<user-name>/manual/software
    $ wget https://gitlab.com/gromacs/gromacs/-/archive/v2021.4/gromacs-v2021.4.tar.gz
  2. Request an interactive session on a compute node, with 64 CPU cores to enable a parallel build. Alternatively, you can write a build script and submit the job to the scheduler.
    $ salloc -p work --ntasks=1 -c 64
  3. Extract the source code from the archive, then execute the build process.

    Terminal 2. Building gromacs using CMake
    $ tar -xf gromacs-v2021.4.tar.gz
    $ mkdir gromacs-v2021.4/build
    $ cd gromacs-v2021.4/build
    $ module load cray-fftw
    $ module load cray-mpich
    $ sg <projectcode> -c 'cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON ..'
    [ output ... ]
    $ sg <projectcode> -c 'make -j 64'
    [ output ... ]
    $ sg <projectcode> -c 'make install'
    [ output ... ]
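
As mentioned in step 2, the same build can be submitted as a batch job instead of being run interactively. The sketch below writes such a job script to a scratch directory. Several details are assumptions rather than site requirements: the script name build_gromacs.sh, the project code pawsey9999, the one-hour time limit, the use of a login shell so that the module command is available, and the expectation that the job is submitted from the directory containing the downloaded archive.

```shell
# Write a hypothetical batch script that repeats the interactive build.
job_script="$(mktemp -d)/build_gromacs.sh"
cat > "$job_script" <<'EOF'
#!/bin/bash --login
#SBATCH --partition=work
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --time=01:00:00

# Stop at the first failing command.
set -e

# Hypothetical project code: replace with your own.
project="pawsey9999"

tar -xf gromacs-v2021.4.tar.gz
mkdir -p gromacs-v2021.4/build
cd gromacs-v2021.4/build
module load cray-fftw
module load cray-mpich

# Same commands as in Terminal 2, using the cores allocated to the job.
sg "$project" -c "cmake -DCMAKE_INSTALL_PREFIX=$MYSOFTWARE/gromacs_manual_build -DGMX_MPI=ON .."
sg "$project" -c "make -j $SLURM_CPUS_PER_TASK"
sg "$project" -c "make install"
EOF

echo "Job script written to $job_script"
# Submit it from the directory that holds the archive (not run here):
#   sbatch "$job_script"
```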

Next steps

Once you have installed your software, you may need to set some environment variables so that the operating system can find the software and its dependencies. The environment variables are typically PATH, LD_LIBRARY_PATH and LIBRARY_PATH.
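
As a minimal sketch, the variables can be set by prepending the installation directories to them. The install_dir value is a placeholder, and the bin and lib subdirectories are an assumption about how the package lays out its files (some packages use lib64 instead).

```shell
# Placeholder: use the prefix you passed to configure or CMake.
install_dir="/path/to/installation/dir"

# Executables.
export PATH="$install_dir/bin:$PATH"

# Shared libraries needed at run time.
export LD_LIBRARY_PATH="$install_dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Libraries searched at link time when building against the package.
export LIBRARY_PATH="$install_dir/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
```

The ${VAR:+:$VAR} form appends the previous value only when the variable was already set, which avoids leaving a stray colon at the end.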

You may want to create a module for your software to modify your environment easily. Check Modules for more information.
