Serial Optimisation
Serial optimisation refers to the use of programming best practices and techniques to improve the performance of programs without the use of thread-level or process-level parallelism, such as that provided by OpenMP or MPI.
Because parallelism runs multiple copies of code at the same time, any inefficiency in the serial code is repeated in every thread or process. It is therefore important to ensure that code is well written and performs well before those kinds of parallelism are applied.
Prerequisite knowledge
In order to get started with serial optimisation of codes, you should first be familiar with how to install and run your program, including knowing how to update compilation flags for compiled languages. You should also be familiar with the programming language used in your program.
Use of serial optimisation
Modern processor architectures offer a growing list of technology features that can influence an application's performance. You should pay special attention to the single-core performance of your codes.
Compilers are very successful in applying different single-core optimisations. However, a compiler can also fail to apply a given optimisation, for example when it cannot automatically work out the structure of the code and identify dependencies between different variables, arrays or structures. It is therefore important to find out the following:
- What optimisation techniques might be applied to the code on a given architecture.
- How to change the algorithm to use a specific type of technology feature of a given architecture.
- What type of optimisations compilers can automatically apply on a given architecture.
- What type of optimisations were applied by the compiler to a given code.
- Why the compiler failed to apply a specific type of optimisation to a given code.
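As an illustration of the last point, pointer aliasing is a common reason why a compiler cannot prove that loop iterations are independent. The following C sketch is not taken from any particular application: the function names scaled_add and scaled_add_restrict are hypothetical, and a C99 compiler is assumed. In the first version the compiler must allow for out overlapping a or b. In the second, the restrict qualifier is the programmer's promise that the arrays do not overlap, which may allow the compiler to vectorise the loop without runtime overlap checks.

```c
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] + s * b[i].
   Without restrict, the compiler must assume that a store to out[i]
   could change a later a[j] or b[j], so it may add runtime overlap
   checks or leave the loop scalar. */
void scaled_add(double *out, const double *a, const double *b,
                double s, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}

/* Same computation. restrict (C99) asserts that, for the lifetime of
   these pointers, the three arrays do not overlap, so every iteration
   is independent of the others. */
void scaled_add_restrict(double *restrict out, const double *restrict a,
                         const double *restrict b, double s, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}
```

Both functions give the same results when the arrays do not overlap. Calling the restrict version with overlapping arrays is undefined behaviour, so the qualifier should only be added when the promise really holds. Assuming GCC or Clang, optimisation reports such as GCC's -fopt-info-vec-missed or Clang's -Rpass-missed=loop-vectorize can indicate whether each loop was vectorised; other compilers provide their own feedback options.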
Optimisation techniques
The focus of this part of the documentation is the available types of optimisations and compiler options. It also shows how to configure the compiler and analyse compiler feedback. Some important serial optimisation techniques are described on the following pages:
- Compiler Optimisation Levels: Compiler optimisation levels are flags or switches that are provided to the compiler. Each level specifies a set of automated optimisations to apply to the code during compilation.
- Inlining: Inline expansion, or inlining, replaces a function call with the instructions of the function at the point where it is called, avoiding the overhead of the call. It can be requested with language keywords or applied automatically by the compiler.
- Loop Optimisations: Loop optimisations are a group of compiler optimisations aimed at increasing execution speed and reducing the overheads associated with loops.
- Numerical Optimisations: At higher optimisation levels, compilers may apply numerical optimisations that affect the numerical precision of results. It is important to be aware of when and how this can occur.
- Vectorisation: Vectorisation transforms code to use the vector (SIMD) instructions of modern processors, which apply a single instruction to multiple data elements in parallel within a single core.
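To show why numerical optimisations and vectorisation can change results, the following C sketch sums the same array in two different orders. It is a hand-written illustration, not compiler output: the function names sum_forward and sum_two_lanes are hypothetical, and the second function only imitates the kind of reordering a vectorising compiler may perform when it is allowed to reassociate floating-point additions (for example with -ffast-math on GCC or Clang, named here as an assumption about the compiler in use).

```c
#include <stddef.h>

/* Sum strictly left to right, in the order the source code specifies. */
float sum_forward(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

/* Sum with two independent partial accumulators, combined at the end.
   This imitates a 2-wide vectorised reduction: even-indexed elements
   go to s0, odd-indexed elements go to s1. */
float sum_two_lanes(const float *x, size_t n)
{
    float s0 = 0.0f;
    float s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n) {          /* leftover element when n is odd */
        s0 += x[i];
    }
    return s0 + s1;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, on a typical IEEE 754 single-precision implementation compiled without reassociation flags, sum_forward returns 1 because the first 1.0f is lost when added to 1e8f, while sum_two_lanes returns 2. Neither order is wrong in itself, but the difference shows why results should be re-validated after enabling aggressive optimisation levels.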
Next steps
This subsection covers only basic information about serial optimisations available on modern CPUs. For more detailed information, refer to the Pawsey user training material.
We also recommend the textbook Introduction to High Performance Computing for Scientists and Engineers for further reading on this topic.