Compiler Optimisation Levels

Compiler optimisation levels are flags or switches that are provided to the compiler. Each level specifies a set of automated optimisations to apply to the code during compilation.

Introduction

The compilers available on Pawsey supercomputing systems implement several optimisation techniques, which are usually grouped into general optimisation levels. Users can choose the level that matches their preferred optimisation strategy. General optimisation levels provide a convenient way to control the balance between performance, executable size, accuracy and compile time.

The specific effects at each optimisation level depend on which family of compilers you use. This page describes the levels in a general way, then gives more detailed guidance about the effect of using each level based on information provided by the developers of the compiler families.

We provide a recommended optimisation level and other options for each compiler family. Start with these, but it's important that you also run tests at other levels and carefully verify the results. In some cases, more aggressive optimisation might affect numerical results or might not improve performance.

General optimisation levels

The compiler families available on Pawsey systems are Cray Fortran, GNU, PGI, Intel and those based on LLVM/Clang, that is, AOCC and the Cray C/C++ compilers.

Table 1 describes and gives recommendations for using each general optimisation level, as defined by the developers of the different compiler families.


Table 1. Comparison of optimisation levels and recommendations across compiler families

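Each flag is followed by its effect for the Cray, AOCC, GNU, Intel and PGI compiler families, in that order.

-O0

All compiler families

No optimisations are applied. The compilation is suitable for debugging purposes.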

-O1

Cray C/C++

  • An intermediate level between -O0 and -O2

Cray Fortran

  • Moderate compile time and size
  • Global scalar optimisations and loop nest restructuring
  • Results may differ from the results obtained when -O0 is specified because of operator reassociation
  • No optimisations will be performed that might create false exceptions
  • Only array syntax statements and inner loops are vectorised and the system does not perform some vector reductions
  • User tasking is enabled, so OpenMP directives are recognised

AOCC

  • An intermediate level between -O0 and -O2

GNU

  • Turns on the most common forms of optimisation that do not require any speed-space tradeoffs
  • With this option the resulting executables should be smaller and faster than with -O0
  • The more expensive optimisations, such as instruction scheduling, are not used at this level
  • Compiling with the option -O1 can often take less time than compiling with -O0, due to the reduced amounts of data that need to be processed after simple optimisations

Intel

  • Enables optimisations for speed and disables some optimisations that increase code size and affect speed
  • Enables global optimisation, which includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling
  • Disables inlining of some intrinsics
  • If -O1 is specified, then by default the -Os option is also enabled, which focuses on optimisations that do not increase code size
  • When using the -O1 option, the compiler's auto-vectorisation functionality is disabled

PGI

  • Specifies local optimisation
  • Scheduling of basic blocks is performed
  • Register allocation is performed
  • Local optimisation is a good choice when the code is very irregular, such as code that contains many short statements containing IF statements and does not contain loops (DO or DO WHILE statements)
  • Although this rarely occurs, for certain types of code this optimisation level may perform better than level two (-O2)

-O

Cray

  • Defaults to -O1 for C/C++ and -O2 for Fortran

AOCC

  • Defaults to -O1

GNU

  • Defaults to -O1

Intel

  • Defaults to -O2

PGI

  • When no level is specified, level two global optimisations are performed, including traditional scalar optimisations, induction recognition, and loop invariant motion
  • No SIMD vectorisation is enabled

-O2

Cray C/C++

  • Enables most optimisations

Cray Fortran

  • Moderate compile time and size
  • Global scalar optimisations, pattern matching, and loop nest restructuring
  • Results may differ from results obtained when -O1 is specified because of vector reductions
  • The -O2 option enables automatic vectorisation of array syntax and entire loop nests

AOCC

  • Enables most optimisations

GNU

  • Turns on further optimisations, in addition to those used by -O1
  • These additional optimisations include instruction scheduling
  • Only optimisations that do not require any speed-space tradeoffs are used, so the executable should not increase in size
  • The compiler will take longer to compile programs and require more memory than with -O1
  • This option is generally the best choice for deployment of a program, because it provides maximum optimisation without increasing the executable size

Intel

  • Enables optimisations for code speed
  • This is the generally recommended optimisation level
  • Compiler vectorisation is enabled at -O2 and higher levels
  • The compiler performs some basic loop optimisations, inlining of intrinsics, intra-file interprocedural optimisation, and most common compiler optimisation technologies

PGI

  • Specifies global optimisation
  • Performs all level one local optimisation as well as the level two global optimisation described in -O
  • More advanced optimisations such as SIMD code generation, cache alignment, and partial redundancy elimination are enabled

-O3

Cray C/C++

  • Enables all optimisations, including ones that take longer to apply and generate larger code
  • This level provides more optimisations than the base LLVM version, including partial unswitching, improvements to inlining, and unrolling

Cray Fortran

  • Potentially larger compile time and size
  • Global scalar optimisations
  • Possible loop nest restructuring and pattern matching
  • The optimisations performed might create false exceptions in rare instances
  • Results may differ from results obtained when -O1 is specified because of vector reductions

AOCC

  • Enables all optimisations, including ones that take longer to apply and generate larger code
  • This level provides more optimisations than the base LLVM version, including partial unswitching, improvements to inlining, and unrolling

GNU

  • Turns on more expensive optimisations, such as function inlining, in addition to all the optimisations of the lower levels -O2 and -O1
  • The -O3 optimisation level may increase the speed of the resulting executable, but can also increase its size
  • Under some circumstances where these optimisations are not favourable, this option might actually make a program slower

Intel

  • Performs -O2 optimisations and enables more aggressive loop transformations such as Fusion, Block-Unroll-and-Jam, and collapsing IF statements
  • The -O3 optimisations may not improve performance unless loop and memory access transformations take place
  • The optimisations may slow down code in some cases compared to -O2 optimisations
  • The -O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets

PGI

  • Level three specifies aggressive global optimisation
  • This level performs all level one and level two optimisations and enables more aggressive hoisting and scalar replacement optimisations that may or may not be profitable

-Ofast

Cray C/C++

  • Enables all the optimisations of -O3 plus others that may violate strict compliance with language standards

AOCC

  • Enables all the optimisations of -O3 plus others that may violate strict compliance with language standards

GNU

  • Disregards strict standards compliance and enables all -O3 optimisations
  • Also enables optimisations that are not valid for all standards-compliant programs: -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays

-O4

Cray C/C++

  • Equivalent to -O3

Cray Fortran

  • (error)

AOCC

  • Equivalent to -O3

GNU

  • (error)

Intel

  • (error)

PGI

  • Level four performs all level one, level two, and level three optimisations and enables hoisting of guarded invariant floating point expressions
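
The table notes that results can change between levels because of operator reassociation and vector reductions. The following sketch illustrates why. It uses Python floats (IEEE 754 double precision on common platforms, the same format as a C double) rather than compiled code. The function names, the two-lane split and the input values are assumptions chosen for illustration; real compilers choose their own vector widths and summation orders.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as unvectorised code would."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes=2):
    """Mimic a vector reduction: one partial sum per lane, combined at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Hypothetical data: two large values that cancel, plus two small ones.
data = [1e20, 1.0, -1e20, 1.0]

print(sequential_sum(data))  # 1.0 (the first 1.0 is absorbed when added to 1e20)
print(lane_sum(data))        # 2.0 (the large values cancel within their own lane)
```

Both orders are valid evaluations of the same mathematical sum, which is why a change in optimisation level can legitimately change the last digits of a result, or more in poorly conditioned cases such as this one.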

Recommended options and choosing the optimisation level

No single set of optimisation options can meet the different requirements and implementation details of every code. However, based on experience with building scientific packages, we recommend that you start with the options listed for each compiler family in the following table.


Compiler family    Recommended options
Cray Fortran       -O3 -hfp3
Cray C/C++         -O3
AOCC               -O3
GNU                -O3 -funroll-loops
Intel              -O2 -ipo
PGI                -fast -Mipa=fast
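
One way to act on these recommendations is to build the recommended configuration alongside a few plain optimisation levels and compare them. The sketch below only generates flag strings and does not invoke any compiler. The option values are copied from the table above, while the dictionary name, the function name and the choice of extra levels are assumptions for illustration.

```python
# Recommended starting options, copied from the table above.
RECOMMENDED_FLAGS = {
    "Cray Fortran": "-O3 -hfp3",
    "Cray C/C++": "-O3",
    "AOCC": "-O3",
    "GNU": "-O3 -funroll-loops",
    "Intel": "-O2 -ipo",
    "PGI": "-fast -Mipa=fast",
}


def flag_variants(family, extra_levels=("-O1", "-O2", "-O3")):
    """Return the recommended flags first, then other levels worth testing."""
    recommended = RECOMMENDED_FLAGS[family]
    variants = [recommended]
    for level in extra_levels:
        if level != recommended:  # skip a level identical to the recommendation
            variants.append(level)
    return variants


for family in RECOMMENDED_FLAGS:
    print(f"{family}: {flag_variants(family)}")
```

Each returned string can then be passed to the build system of the code being tested, so that every variant is built and run on the same inputs.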

Carefully check the results before using those optimisation levels in production builds and runs.

  • Using the -O2 level and especially the -O3 level might affect the numerical results of the code. If optimisations are affecting numerical results, you might choose to try switching off specific optimisations with corresponding options or switching to the -O1 level.
  • Although the -O3 optimisation level might produce much faster code (especially for code that makes heavy use of floating-point calculations), there are cases where it can actually reduce performance.
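
A check of this kind can be automated by comparing the output of a build at a lower optimisation level with the output of the optimised build. The following is a minimal sketch, assuming both runs produce the same number of floating-point values. The function name, the sample values and the tolerances are hypothetical and should be chosen to suit the application.

```python
import math


def results_match(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return True if every pair of values agrees within the given tolerances."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical outputs from a reference build and a more aggressively optimised build.
reference_build = [1.0, 2.5, 3.0e8]
optimised_build = [1.0, 2.50000001, 3.0e8]

print(results_match(reference_build, optimised_build))                # False
print(results_match(reference_build, optimised_build, rel_tol=1e-6))  # True
```

A strict tolerance flags any drift between builds, while a looser one accepts differences that are small relative to the accuracy the application needs. Which of the two is appropriate depends on the code and is a judgement for its developers.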
