...
If you need to compile third-party software, check How to Manually Build Software.
How to choose a compiler family
Sometimes it does not matter whether you use the GNU, AMD or Cray compilers, as all of them support a common set of features for supported programming languages (for instance, C). However, there are cases where you may want to use a specific compiler.
...
The compilation process is presented using the GNU compiler for the C programming language, but what is described also applies to other compilers. The examples make use of the C compiler wrapper, `cc`, and the `PrgEnv-gnu` environment. C/C++ and Fortran compilation should all use the Cray-provided wrappers, which add all the appropriate libraries to enable MPI. These are `cc` for C, `CC` for C++ and `ftn` for Fortran.
Step 1. Compiling to object files
...
...
Instructions and examples for compiling code for distributed and parallel applications can be found in the system-specific pages.
...
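On Cray systems, cray-mpich is loaded by default. On other systems, load an MPI module before compiling MPI-enabled code, for example OpenMPI:
```bash
$ module load openmpi/<VERSION>
# Compile the source file to an object file (main.o)
$ cc -c main.c
# Link the object file against an additional library
$ cc -o main main.o -L/usr/local/mylib/libs -l<library-name>
```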
To compile OpenMP-enabled or MPI+OpenMP-enabled code, use the `-fopenmp` flag when compiling and when linking:
```bash
$ cc -fopenmp -c main.c
$ cc -o main main.o -fopenmp -L/usr/local/mylib/libs -l<library-name>
```
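As a rough illustration of what the `-fopenmp` flag changes, the sketch below shows a loop annotated with an OpenMP directive. The function names `parallel_sum` and `max_threads` are hypothetical and chosen only for this example. It relies on the `_OPENMP` macro, which the OpenMP specification says is defined when OpenMP compilation is enabled. Without the flag the directive is ignored and the loop simply runs serially.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* only available/needed when OpenMP is enabled */
#endif

/* Sum n doubles. With -fopenmp the iterations are shared between threads
   and the partial sums are combined by the reduction clause. */
double parallel_sum(const double *x, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Upper bound on the threads a parallel region may use.
   Returns 1 when the file was compiled without OpenMP support. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Calling `max_threads()` from `main.c` is a simple way to check whether the flag actually took effect in a given build.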
To compile OpenACC-enabled or MPI+OpenACC-enabled code, use the `-fopenacc` flag when compiling and when linking:
```bash
$ cc -fopenacc -c main.c
$ cc -o main main.o -fopenacc -L/usr/local/mylib/libs -l<library-name>
```
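For orientation, here is a minimal sketch of the kind of code the `-fopenacc` flag acts on. The function names `saxpy` and `openacc_enabled` are hypothetical, and the data clauses are one reasonable choice rather than a recommendation for any particular system. The `_OPENACC` macro is defined by the OpenACC specification when OpenACC compilation is enabled. Without the flag the directive is ignored and the loop runs on the CPU.

```c
/* Compute y = a*x + y for n elements. With -fopenacc the loop may be
   offloaded; x is copied to the device, y is copied in and back out. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Returns 1 if this file was compiled with OpenACC enabled, 0 otherwise. */
int openacc_enabled(void)
{
#ifdef _OPENACC
    return 1;
#else
    return 0;
#endif
}
```

Whether the loop actually runs on an accelerator depends on the compiler build and the offload targets it supports, so treat `openacc_enabled()` only as a check that the directive was processed.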
To compile HIP-enabled GPU code on Setonix:
```bash
$ module load rocm/<VERSION>
# Target the gfx90a GPU architecture
$ module load craype-accel-amd-gfx90a
$ hipcc --offload-arch=gfx90a main.c
```
To compile MPI+HIP-enabled GPU code on Setonix:
```bash
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
# hipcc is not an MPI wrapper, so the MPI include and library paths are given explicitly
$ hipcc --offload-arch=gfx90a main.c -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi
```
To compile MPI+HIP-enabled GPU code on Setonix with GPU-enabled MPI transfers (note that the environment variable is also needed at runtime):
```bash
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
# Enable GPU-aware MPI; set this again in the job environment at runtime
$ export MPICH_GPU_SUPPORT_ENABLED=1
$ hipcc --offload-arch=gfx90a main.c -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa
```
To compile CUDA-enabled or MPI+CUDA-enabled GPU code on Garrawarla:
```bash
$ module load cuda/<VERSION>
$ nvcc main.c
```
Common compiler options
Some relevant families of compiler options are discussed here. A more comprehensive list of options can be found in system-specific pages as well as in the Serial optimisation section.
- Optimisation level. You can use the `-O<n>` option, which is valid for all compilers, to control the optimisation level. It is a quick way to gain additional performance or to help diagnose optimisation-related bugs. Level 3 optimisation, `-O3`, can make a significant difference, especially for loops with floating-point operations. Level 0 disables most optimisations and allows for consistent debugging; it can also reduce the final size of the executable. Higher optimisation levels in most cases produce faster code, at the expense of compilation time and the ability to debug the program. It is generally recommended to use the `-O2` or `-O3` optimisation levels for production executables, provided there is no optimisation-related difference in the numerical results. Refer to the Serial optimisation section for further information on optimisation options.
- CPU-specific instructions. The default behaviour of the GNU compiler is to produce executable code that is compatible across a broad range of processors. This is useful if the executable must run across multiple processor generations. However, if you are concerned about the speed of the executable, as is the case in supercomputing, you should allow the compiler to generate processor-specific instructions for the code. For the GNU compilers, the `-march=native` option generates code that uses the instruction set of the processor the compilation is performed on, whereas `-mtune=native` only tunes the generated code for that processor without enabling additional instructions.

  Note: Your code must be compiled to take advantage of the architecture-specific instructions of the compute nodes on which it will run. You can do this simply by compiling your code on a compute node. If for some reason you need to compile from a login node, there are additional compile options that allow you to generate CPU-specific instructions for the compute nodes.
- Inlining. Compilers are able to automatically inline code from routines in other object files. This can significantly reduce calling overhead for frequently called routines and allow further optimisations. In the case of GNU compilers, the `-O3` optimisation level enables function inlining where possible. For lower levels of optimisation, you can use the `-finline-functions` option. To enable interprocedural inlining, you must use both the `-fwhole-program` and `-combine` options. Note that `-combine` is accepted only by older GCC releases; more recent releases provide link-time optimisation through `-flto` instead.
- Debugging and profiling. Compiler options for debugging are discussed in Compiler Options for Debugging. Profiler options required by the gprof tool are documented in Profiling with gprof.
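One way to confirm which of these options actually reached the compiler is to test predefined macros. The sketch below assumes a GCC or Clang-based compiler on x86-64, where `__OPTIMIZE__` is defined when optimisation is enabled and `__AVX2__` is defined when the target instruction set includes AVX2 (for instance through `-march=native` on a processor that supports it). The function names are hypothetical, and other compilers may expose different macros.

```c
#include <stdio.h>

/* Returns 1 if the file was compiled with optimisation enabled
   (GCC and Clang define __OPTIMIZE__ in that case), otherwise 0. */
int built_with_optimisation(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if AVX2 instructions were enabled at compile time, otherwise 0. */
int built_with_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

/* Print a short summary of how this translation unit was built. */
void print_build_summary(void)
{
    printf("optimised build: %s\n", built_with_optimisation() ? "yes" : "no");
    printf("AVX2 enabled:    %s\n", built_with_avx2() ? "yes" : "no");
}
```

Calling `print_build_summary()` at program start makes it easier to spot an executable that was built on a login node without the intended CPU-specific options.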
...
Visit the User Guide of the system you want to compile your code on for tailored suggestions.
Related pages
- Pawsey Supercomputing Systems Guides per Supercomputer
- Compiler Options for Debugging
- Serial Optimisation
- How to Manually Build Software
...