Compiler optimisation levels are flags or switches that are provided to the compiler. Each level specifies a set of automated optimisations to apply to the code during compilation.

Introduction

The compilers available on Pawsey supercomputing systems implement several optimisation techniques, which are usually grouped into general optimisation levels. Users can choose the level that matches their preferred optimisation strategy. General optimisation levels provide a convenient way to control the balance between performance, executable size, accuracy, and compile time.

The specific effects at each optimisation level depend on which family of compilers you use. This page describes the levels in a general way, then gives more detailed guidance about the effect of using each level based on information provided by the developers of the compiler families.

We provide a recommended optimisation level and other options for each compiler family. Start with these, but it's important that you also run tests at other levels and carefully verify the results. In some cases, more aggressive optimisation might affect numerical results or might not improve performance.
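
One way to verify results across builds is to compare the numerical output of an optimised build against a baseline build within a tolerance, rather than expecting bit-for-bit equality. The following is a minimal sketch, not a prescribed procedure: the function name `results_match`, the sample values, and the default tolerance are assumptions chosen for illustration, and a suitable tolerance depends on your application.

```python
import math


def results_match(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return True if every candidate value agrees with the reference within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical values, for example quantities printed by a baseline build
# and by a more aggressively optimised build of the same code.
baseline = [1.0, 2.5, -3.75]
optimised = [1.0, 2.5000000000000004, -3.75]

print(results_match(baseline, optimised))          # differs only in the last bits
print(results_match(baseline, [1.0, 2.6, -3.75]))  # a genuine discrepancy
```

A relative tolerance is used because round-off differences scale with the magnitude of the values; set `abs_tol` as well if results close to zero are expected.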

General optimisation levels

The compiler families that are available on Pawsey systems are Cray Fortran, GNU, PGI, Intel, and those based on LLVM/Clang, that is, AOCC and the Cray C/C++ compilers.

Table 1 describes and gives recommendations for using each general optimisation level, as defined by the developers of the different compiler families.

Table 1. Comparison of optimisation levels and recommendations across compiler families

Flag	Cray	AOCC	GNU	Intel	PGI
`-O0`	No optimisations are applied. The compilation is suitable for debugging purposes.
`-O1`	C/C++: An intermediate level between `-O0` and `-O2`. Fortran: Moderate compile time and size. Global scalar optimisations and loop nest restructuring. Results may differ from the results obtained when `-O0` is specified because of operator reassociation. No optimisations will be performed that might create false exceptions. Only array syntax statements and inner loops are vectorised and the system does not perform some vector reductions. User tasking is enabled, so OpenMP directives are recognised.	An intermediate level between `-O0` and `-O2`.	Turns on the most common forms of optimisation that do not require any speed-space tradeoffs. With this option the resulting executables should be smaller and faster than with `-O0`. The more expensive optimisations, such as instruction scheduling, are not used at this level. Compiling with the option `-O1` can often take less time than compiling with `-O0`, due to the reduced amounts of data that need to be processed after simple optimisations.	Enables optimisations for speed and disables some optimisations that increase code size and affect speed. Enables global optimisation, which includes data-flow analysis, code motion, strength reduction and test replacement, split-lifetime analysis, and instruction scheduling. Disables inlining of some intrinsics. If `-O1` is specified, then by default the `-Os` option will also be enabled, which focuses on optimisations that do not increase code size. When using the `-O1` option, the compiler's auto-vectorisation functionality is disabled.	Specifies local optimisation. Scheduling of basic blocks is performed. Register allocation is performed. Local optimisation is a good choice when the code is very irregular, such as code that contains many short statements containing IF statements and does not contain loops (DO or DO WHILE statements). Although this case rarely occurs, for certain types of code, this optimisation level may perform better than level two (`-O2`).
`-O`	Defaults to `-O1` for C/C++, `-O2` for Fortran.	Defaults to `-O1`.	Defaults to `-O1`.	Defaults to `-O2`.	When no level is specified, level two global optimisations are performed, including traditional scalar optimisations, induction recognition, and loop invariant motion. No SIMD vectorisation is enabled.
`-O2`	C/C++: Enables most optimisations. Fortran: Moderate compile time and size. Global scalar optimisations, pattern matching, and loop nest restructuring. Results may differ from results obtained when `-O1` is specified because of vector reductions. The `-O2` option enables automatic vectorisation of array syntax and entire loop nests.	Enables most optimisations.	Turns on further optimisations, in addition to those used by `-O1`. These additional optimisations include instruction scheduling. Only optimisations that do not require any speed-space tradeoffs are used, so the executable should not increase in size. The compiler will take longer to compile programs and require more memory than with `-O1`. This option is generally the best choice for deployment of a program, because it provides maximum optimisation without increasing the executable size.	Enables optimisations for code speed. This is the generally recommended optimisation level. The compiler vectorisation is enabled at `-O2` and higher levels. The compiler performs some basic loop optimisations, inlining of intrinsics, intra-file interprocedural optimisation, and most common compiler optimisation technologies.	Specifies global optimisation. Performs all level one local optimisation as well as the level two global optimisation described in `-O`. More advanced optimisations such as SIMD code generation, cache alignment, and partial redundancy elimination are enabled.
`-O3`	C/C++: Enables all optimisations, including ones that take longer to apply and generate larger code. This level provides more optimisations than the base LLVM version, including partial unswitching, improvements to inlining, and unrolling. Fortran: Potentially larger compile time and size. Global scalar optimisations. Possible loop nest restructuring and pattern matching. The optimisations performed might create false exceptions in rare instances. Results may differ from results obtained when `-O1` is specified because of vector reductions.	Enables all optimisations, including ones that take longer to apply and generate larger code. This level provides more optimisations than the base LLVM version, including partial unswitching, improvements to inlining, and unrolling.	Turns on more expensive optimisations, such as function inlining, in addition to all the optimisations of the lower levels `-O2` and `-O1`. The `-O3` optimisation level may increase the speed of the resulting executable, but can also increase its size. Under some circumstances where these optimisations are not favourable, this option might actually make a program slower.	Performs `-O2` optimisations and enables more aggressive loop transformations such as Fusion, Block-Unroll-and-Jam, and collapsing IF statements. Using the `-O3` optimisations may not yield higher performance unless loop and memory access transformations take place. The optimisations may slow down code in some cases compared to `-O2` optimisations. The `-O3` option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.	Level three specifies aggressive global optimisation. This level performs all level one and level two optimisations and enables more aggressive hoisting and scalar replacement optimisations that may or may not be profitable.
`-Ofast`	C/C++: Enables all the optimisations of `-O3` plus others that may violate strict compliance with language standards.	Enables all the optimisations of `-O3` plus others that may violate strict compliance with language standards.	Disregards strict standards compliance. `-Ofast` enables all `-O3` optimisations. It also enables optimisations that are not valid for all standards-compliant programs: `-ffast-math` and the Fortran-specific `-fno-protect-parens` and `-fstack-arrays`.
`-O4`	C/C++: Equivalent to `-O3`. Fortran:	Equivalent to `-O3`.			Level four performs all level one, level two, and level three optimisations and enables hoisting of guarded invariant floating point expressions.
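
Several cells in Table 1 note that results may differ between levels because of operator reassociation or vector reductions. The sketch below illustrates the underlying effect only: floating-point addition is not associative, so changing the order of additions can change the result. It does not reproduce what any particular compiler does; the function names and input values are assumptions chosen to make the effect visible, and Python floats stand in for double-precision variables.

```python
def sequential_sum(values):
    """Add values strictly left to right, in source order."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes=2):
    """Accumulate into `lanes` partial sums, then combine them.

    This mimics the reordering introduced by a vectorised reduction.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Reassociation: the same three numbers, grouped differently.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Reduction order: large values cancel in one lane, small ones survive in the other.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(lane_sum(values))        # 2.0
```

Neither result is wrong in the sense of a compiler bug. Both are valid roundings of different evaluation orders, which is why comparisons between optimisation levels should allow for a tolerance.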

Recommended options and choosing the optimisation level

No single set of optimisation options can meet the different requirements and implementation details of every code. However, based on experience with building scientific packages, we recommend that you start with the options listed for each compiler family in the following table.

Compiler family	Recommended options
Cray Fortran	`-O3 -hfp3`
Cray C/C++	`-O3`
AOCC	`-O3`
GNU	`-O3 -funroll-loops`
Intel	`-O2 -ipo`
PGI	`-fast -Mipa=fast`

Carefully check the results before using those optimisation levels in production builds and runs.
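
To compare the recommended options against lower levels, it can help to generate one build per set of options and test each executable. The following is a minimal sketch under stated assumptions: the option strings are copied from the table above, while the function names, the `gfortran` compiler command, the source file `solver.f90`, and the `app_` output naming are hypothetical. The commands are only assembled as strings and are not executed.

```python
# Recommended starting options, copied from the table above.
RECOMMENDED = {
    "Cray Fortran": "-O3 -hfp3",
    "Cray C/C++": "-O3",
    "AOCC": "-O3",
    "GNU": "-O3 -funroll-loops",
    "Intel": "-O2 -ipo",
    "PGI": "-fast -Mipa=fast",
}


def flag_sets_to_test(family, extra_levels=("-O0", "-O1", "-O2")):
    """Return the recommended options first, then lower levels to compare against."""
    recommended = RECOMMENDED[family]
    others = [level for level in extra_levels if level != recommended]
    return [recommended] + others


def build_commands(compiler, source, family):
    """Assemble one compile command per set of options (strings only, nothing is run)."""
    commands = []
    for flags in flag_sets_to_test(family):
        # Name each executable after the first option, e.g. app_O3 (hypothetical scheme).
        tag = flags.split()[0].lstrip("-")
        commands.append(f"{compiler} {flags} -o app_{tag} {source}")
    return commands


for command in build_commands("gfortran", "solver.f90", "GNU"):
    print(command)
```

Each resulting executable can then be run on the same input, with timings and numerical results compared before you settle on a level.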

Using the `-O2` level, and especially the `-O3` level, might affect the numerical results of the code. If optimisations are affecting numerical results, try switching off specific optimisations with the corresponding options, or switch to the `-O1` level.
Although the `-O3` optimisation level might produce much faster code (especially for code that makes heavy use of floating-point calculations), there are cases where it can actually reduce the final performance of the code.
