Compiling
Compilation is the process of transforming source code written in a high-level programming language into a sequence of low-level instructions that a computer can understand and execute. A compiler is a program that performs this transformation. Pawsey offers several compilers; this page explains how to compile your programs and gives recommendations on which compiler to use.
Prerequisites
Programming languages whose source code must be compiled before it can be executed are called compiled languages. Popular examples are C, C++ and Fortran. In interpreted languages, by contrast, the source code is read and executed as-is by another program, the interpreter; examples are Python and MATLAB. The content of this page applies only to compiled languages, in particular C/C++ and Fortran.
If you need to compile third-party software, check How to Manually Build Software.
How to choose a compiler family
Sometimes it does not matter whether you use the GNU, AMD or Cray compilers, as all of them support a common set of features for supported programming languages (for instance, C). However, there are cases where you may want to use a specific compiler.
If you are porting an existing software package to a supercomputing environment, it is a good idea to keep using the compiler that was previously used to build it, if possible. The code may use language extensions, rely on the optimisation behaviour of that compiler, or have library dependencies that make porting easier with the same compiler. A typical example is porting a GNU software package, which is usually easiest with the GNU compilers.
As a general rule, first use the compiler that produces an executable generating correct results. Switching to another compiler can be considered after this step.
If all compilers produce correct executables, the choice of compiler, and hence programming environment, will depend on the application under consideration. Experience with the provided compilers suggests the following observations.
GNU compilers
Best choice for C/C++, very good support for Fortran.
Access to a wide range of parallel programming models including MPI, OpenMP, OpenACC and Pthreads.
Cray compilers
Best choice for Fortran, and very good support for C, too.
Compiling C++ is sometimes problematic because the Cray C++ compiler observes the standard strictly. For example, you might not be able to compile some open-source packages developed with the GNU compilers.
Access to a wide range of parallel programming models including MPI, OpenMP and OpenACC, along with SHMEM, partitioned global address space models UPC, co-array Fortran and Chapel.
AOCC compilers
AMD-optimised compilers based on LLVM Clang compilers. Excellent choice for C/C++ as it hooks into the LLVM toolchain, which provides a number of useful tools and compilation flags.
Not recommended for Fortran.
Access to a wide range of parallel programming models including MPI, OpenMP and Pthreads.
Migration note: Intel compilers are not present on Setonix
The Intel compilers are not available on Setonix because its processors are made by AMD. The vendor-native compilers are therefore AOCC, the AMD compilers based on LLVM Clang and Flang.
Nevertheless, our current recommendation is that codes previously built with the Intel compilers move to the GNU compilers.
Ultimately, some testing may be required to find the best compiler for a given code. It is also good practice to build a code with several different compilers to confirm that it conforms to the language standard and is portable.
Basics of compilation
The term compilation is often used to refer both to compiling source code into object files (its low-level representation in machine code) and to linking those object files and any third-party libraries into an executable. This is because, for simple programs, compilers can perform both steps at once. For large code bases, however, this is not advisable: compiling each source file separately and linking in a final step means that only modified files need to be recompiled.
The compilation process is presented using the GNU compiler for the C programming language, but the same steps apply to other compilers. The examples use the C compiler wrapper, cc, in the PrgEnv-gnu environment. C/C++ and Fortran code should be compiled with the Cray-provided wrappers, which add all the libraries needed to enable MPI. These are:
| Language | Compiler wrapper |
|---|---|
| C | cc |
| C++ | CC |
| Fortran | ftn |
Step 1. Compiling to object files
For most compilers, the -c option instructs the compiler to perform only the compilation step, generating intermediate object files. Note that for C/C++ code, before translating the source into machine language, the compiler runs the preprocessor, which modifies the source code according to special instructions called preprocessor directives (the lines starting with a hash, #, such as #include and #define).
Terminal 1 shows how to compile a simple C source code file, main.c. The -o option is used to specify the name of the output, in this case the object file, or the executable when -c is not used.
Terminal 1. Compiling a source code into an object file
$ # compiling code with cc, the C compiler wrapper.
$ cc -c -o main.o main.c
Additional compiler options may also be added to modify the behaviour of the compiler, such as the optimisation levels and handling of OpenMP directives. Check the Common compiler options section on this page.
Step 2. Linking object files and libraries into an executable
The link phase combines all the object files and external libraries and creates an executable. The simplest way to link one or more object files into an executable is to list them as arguments to the compiler. Terminal 2 shows how to generate the executable from the main.o object file created during the previous step.
Terminal 2. Generating an executable from an object file
$ cc main.o
If you don't specify the name of the output with the -o option, the default behaviour of the compiler is to generate an executable named a.out. Terminal 3 demonstrates how to specify multiple object files as input to the compiler.
Terminal 3. Linking object files to create an executable
$ cc -o main obj-1.o obj-2.o obj-3.o
Additional link options may be added to this command, such as the ones for linking external libraries.
How to compile and link using external libraries
Sometimes a code uses routines or functions that are part of an external library, that is, software that others have developed and made available, such as a numerical library carefully optimised for specific mathematical tasks. For the program to use an external library, the compilation and linking steps need additional flags that tell the compiler and linker where to find it.
At compile time, you must tell the compiler the path containing the library's header files, using the -I flag. For instance, if the library were installed in the /usr/local/mylib directory, with its headers in /usr/local/mylib/headers, Terminal 4 shows how to compile the main.c program specifying the path to the header files.
Terminal 4. Specifying a header search path
$ cc -I/usr/local/mylib/headers -c main.c
Another way to tell the compiler where to search for header files is to set and export the CPATH environment variable. For instance, Terminal 5 shows an alternative to the command in Terminal 4.
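At link time, you must provide both the path to the directory containing the library file, with the -L option, and the library name, with the -l option, as shown in Terminal 6. The name given to -l omits the lib prefix and the file extension: -lmylib, for example, makes the linker look for libmylib.so or libmylib.a.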
You can use the LIBRARY_PATH environment variable in place of the -L option, but the -l<library-name> option must still be present.
Alternatively, the library search path can be hardcoded within the executable, so that it does not have to be provided at run time through the LD_LIBRARY_PATH variable. This approach requires passing the path to the linker with the -rpath=<dir> option, which is given to the compiler as -Wl,-rpath=<dir>.
Note how the -L option is still required for the linker to find the library at link time.
Dynamic and static linking
Linking can be performed either dynamically or statically.
Dynamic linking is where executables include only references to libraries; the libraries themselves must be provided at run time. This makes the executable smaller, and also allows for different versions of the libraries to be selected and used at run time. The paths for these libraries are searched in the following order of precedence:
rpath, which is set at link time with options such as -Wl,-rpath=<dir>.
The LD_LIBRARY_PATH environment variable, which can be altered prior to run time.
If different versions of the same library are found in the paths embedded in the executable via rpath and in LD_LIBRARY_PATH, the rpath takes precedence. Using rpath therefore makes runtime behaviour more reproducible, since the library loaded is always the one the rpath points to. With LD_LIBRARY_PATH, by contrast, the runtime setup changes whenever the variable is modified. For example, if a library is provided in two different paths, /path/A and /path/B, the one listed first in LD_LIBRARY_PATH is used, which can impact reproducibility. Note that this precedence applies to a path recorded as DT_RPATH; if the linker records it as DT_RUNPATH instead (for example when --enable-new-dtags is in effect, the default on some toolchains), LD_LIBRARY_PATH is searched first.
Static linking is where library object files are embedded in the final executable. This increases the size of the executable, but makes it more portable and ensures reproducibility. However, it prevents the executable from using optimised builds of a library that may be available on the system at run time, because the library code is fixed when the executable is linked.
On Pawsey systems, we recommend dynamic linking and, when possible, setting an rpath at link time.
Tips on library dependencies
This section gives you advice on how to deal with some common issues that can occur when working with external libraries.
How can I tell where a given symbol is referenced or defined?
You can pass the -y<symbol_name> linker option (through the compiler, as -Wl,-y<symbol_name>) to print the name of each file that references or defines <symbol_name>. This can be useful to determine the location of unresolved symbols, and also to check where a symbol is ultimately resolved when a large number of libraries are involved in linking. For instance, to look for the dgemm_ symbol, you can run the command shown in Terminal 8. Note that there is no space between -y and the symbol name. Terminal 8 also shows the output produced by the -y option.
How can I list the library dependencies of an executable?
Sometimes you may need to know which libraries an executable links to at run time, for instance, to ensure that a specific library version is being used. To do so, you can use the ldd command, which takes the path to the executable as an argument. It prints the shared libraries the executable depends on, together with the locations from which the dynamic loader resolves them:
$ ldd <exec>
How to compile an MPI, OpenMP, OpenACC, HIP or CUDA code
Instructions and examples for compiling code for distributed and parallel applications can be found in the system-specific pages.
On Cray systems, cray-mpich is loaded by default, so the cc, CC and ftn wrappers compile and link MPI code without extra flags. On other systems, load an MPI implementation first and compile with its compiler wrappers; for example, with Open MPI:
$ module load openmpi/<VERSION>
$ mpicc -c main.c
$ mpicc -o main main.o -L/usr/local/mylib/libs -l<library-name>
To compile OpenMP-enabled code or MPI+OpenMP code, use the -fopenmp flag at both compile and link time:
$ cc -fopenmp -c main.c
$ cc -o main main.o -fopenmp -L/usr/local/mylib/libs -l<library-name>
To compile OpenACC-enabled code or MPI+OpenACC code with the GNU compilers, use the -fopenacc flag at both compile and link time:
$ cc -fopenacc -c main.c
$ cc -o main main.o -fopenacc -L/usr/local/mylib/libs -l<library-name>