...


Code Block
Terminal 7. Linking an external library
$ cc -o main main.o -Wl,-rpath=/usr/local/mylib/libs -L/usr/local/mylib/libs -l<library-name> 
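
The link line in Terminal 7 repeats the library directory in both -Wl,-rpath and -L. As a minimal sketch, the command can be assembled from two variables so the path is written only once. LIB_DIR, LIB_NAME and LINK_CMD are hypothetical names used for illustration, and "mylib" is an assumed library name, not one taken from this page. The command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical values: replace with the real install location and library name.
LIB_DIR=/usr/local/mylib/libs
LIB_NAME=mylib   # assumed name; would link against libmylib.so or libmylib.a

# -L tells the linker where to find the library at link time.
# -Wl,-rpath records the same directory in the executable for use at run time.
LINK_CMD="cc -o main main.o -Wl,-rpath=${LIB_DIR} -L${LIB_DIR} -l${LIB_NAME}"

# Print the command for inspection rather than running it.
echo "${LINK_CMD}"
```

After a real link step, running readelf -d main and looking for the RPATH or RUNPATH entry is one way to confirm that the directory was recorded in the executable.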


...

Instructions and examples for compiling code for distributed and parallel applications can be found in the system-specific pages.

...

On Cray systems, cray-mpich is loaded by default. On other systems, load an MPI module before compiling MPI-enabled code, for example with OpenMPI:

Code Block
$ module load openmpi/<VERSION>
$ cc -c main.c 
$ cc -o main main.o -L/usr/local/mylib/libs -l<library-name>

To compile OpenMP-enabled code or MPI+OpenMP-enabled code, use the -fopenmp flag during both compilation and linking:

Code Block
$ cc -fopenmp -c main.c
$ cc -o main main.o -fopenmp -L/usr/local/mylib/libs -l<library-name>
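
The two-step OpenMP build above can be tried end to end with a small test program. The following is a minimal sketch, not a site-specific recipe: the file name hello_omp.c, the WORK_DIR scratch directory and the thread count of 4 are assumptions made for illustration, and the build is skipped if the compiler does not accept -fopenmp.

```shell
#!/bin/bash
# Work in a scratch directory so no existing files are touched.
WORK_DIR=$(mktemp -d)

# Write a minimal OpenMP program (hypothetical file name).
cat > "${WORK_DIR}/hello_omp.c" <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Each thread in the parallel region prints its own id. */
    #pragma omp parallel
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
EOF

# CC defaults to cc, as in the examples above; override it if needed.
CC=${CC:-cc}

# -fopenmp is passed at compile time and again at link time.
if "${CC}" -fopenmp -c "${WORK_DIR}/hello_omp.c" -o "${WORK_DIR}/hello_omp.o" 2>/dev/null &&
   "${CC}" -fopenmp -o "${WORK_DIR}/hello_omp" "${WORK_DIR}/hello_omp.o" 2>/dev/null; then
    # OMP_NUM_THREADS sets the number of threads used at run time.
    OMP_NUM_THREADS=4 "${WORK_DIR}/hello_omp"
else
    echo "compiler '${CC}' with -fopenmp not available; skipping build"
fi
```

The order of the printed lines varies from run to run because the threads execute concurrently.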

To compile OpenACC-enabled code or MPI+OpenACC-enabled code, use the -fopenacc flag during both compilation and linking:

Code Block
$ cc -fopenacc -c main.c
$ cc -o main main.o -fopenacc -L/usr/local/mylib/libs -l<library-name>

To compile HIP-enabled GPU code on Setonix:

Code Block
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
$ hipcc --offload-arch=gfx90a main.c

To compile MPI+HIP-enabled GPU code on Setonix:

Code Block
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
$ hipcc --offload-arch=gfx90a main.c -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi 

To compile MPI+HIP-enabled GPU code on Setonix with GPU-enabled MPI transfers (note that the environment variable is also needed at runtime):

Code Block
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
$ export MPICH_GPU_SUPPORT_ENABLED=1
$ hipcc --offload-arch=gfx90a main.c -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa
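
Because MPICH_GPU_SUPPORT_ENABLED must also be set when the program runs, a job script can check it before launching the executable. The following is a minimal sketch of such a guard. The function name check_gpu_mpi is hypothetical, and the launch command itself is left out because it depends on the system.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the variable is set to 1.
check_gpu_mpi() {
    if [ "${MPICH_GPU_SUPPORT_ENABLED:-0}" = "1" ]; then
        echo "GPU-aware MPI enabled"
    else
        echo "MPICH_GPU_SUPPORT_ENABLED is not set to 1" >&2
        return 1
    fi
}

# Set the variable in the run-time environment, as at build time.
export MPICH_GPU_SUPPORT_ENABLED=1

# Run the check; the launch of the executable would follow here.
check_gpu_mpi
```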

To compile CUDA-enabled GPU code or MPI+CUDA-enabled GPU code on Garrawarla:

Code Block
$ module load cuda/<VERSION>
$ nvcc main.c

Common compiler options

Some relevant families of compiler options are discussed here. A more comprehensive list of options can be found in system-specific pages as well as in the Serial optimisation section.

  • Optimisation level. You can use the -O<n> option, which is valid for all compilers, to control the optimisation level. It is a quick way to gain additional performance or to help track down optimisation-related bugs. Level 3 (-O3) can make a significant difference, especially for loops with floating-point operations. Level 0 (-O0) disables most optimisations, which allows for consistent debugging and may also reduce the size of the executable. Higher optimisation levels produce faster code in most cases, at the expense of compilation time and the ability to debug the program. It is generally recommended to use -O2 or -O3 for production executables, provided the numerical results do not change with the optimisation level. Refer to the Serial optimisation section for further information on optimisation options.
  • CPU-specific instructions. By default, the GNU compilers produce executable code that is compatible with a broad range of processors. This is useful if the executable must run across multiple processor generations. However, if execution speed matters, as it does in supercomputing, you should allow the compiler to generate processor-specific instructions. For the GNU compilers, the -march=native option generates code that uses the instruction set of the processor on which the compilation is performed. The -mtune=native option on its own only tunes the code for that processor and does not enable additional instructions.

    Note

    Your code must be compiled to take advantage of the architecture-specific instructions of the compute nodes on which it will run. You can do this simply by compiling your code on a compute node. If for some reason you need to compile from a login node, there are additional compile options that allow you to generate CPU-specific instructions for the compute nodes.



  • Inlining. Compilers can automatically inline code from routines in other object files. This can significantly reduce calling overhead for frequently called routines and allow further optimisations. With the GNU compilers, the -O3 optimisation level enables function inlining where possible. For lower levels of optimisation, you can use the -finline-functions option. Interprocedural inlining across source files historically required both -fwhole-program and -combine. The -combine option was removed in GCC 4.6, and link-time optimisation (-flto) serves this purpose in current releases.

  • Debugging and profiling. Compiler options for debugging are discussed in Compiler Options for Debugging. Profiler options required by the gprof tool are documented in Profiling with gprof.
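
The option families above can be combined into separate flag sets for debugging and production builds. The following is a minimal sketch, assuming GCC is installed as gcc. GCC_BIN, DEBUG_FLAGS and PROD_FLAGS are hypothetical variable names, and -g is assumed here as the usual debug-symbol option. When GCC is available, the script asks it which -march and -mtune values it selects for the current processor: -march controls the instruction set used and -mtune controls the tuning. The build commands are printed rather than executed.

```shell
#!/bin/bash
# Assumes GCC is installed as "gcc"; override GCC_BIN otherwise.
GCC_BIN=${GCC_BIN:-gcc}

# Hypothetical flag sets for a debug build and a production build.
DEBUG_FLAGS="-O0 -g"
PROD_FLAGS="-O2 -march=native"

if command -v "${GCC_BIN}" >/dev/null 2>&1; then
    # Show which target options "native" resolves to on this machine.
    "${GCC_BIN}" -march=native -Q --help=target 2>/dev/null \
        | grep -E -- '-march=|-mtune=' || true
fi

# Print the resulting compile commands for inspection.
echo "debug:      ${GCC_BIN} ${DEBUG_FLAGS} -c main.c"
echo "production: ${GCC_BIN} ${PROD_FLAGS} -c main.c"
```

Running this on a login node and again on a compute node is one way to see whether the two report different processors, which is the situation the note above warns about.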

...

Visit the User Guide of the system you want to compile your code on for tailored suggestions.

Related pages

...