Profiling

Profiling is the performance analysis of a program while it is running. It is carried out to provide insight and understanding as a first step toward improving performance.

Prerequisite knowledge

Before you start profiling a code, you should know how to install and run your program, including how to update compilation flags for compiled languages. You should also be familiar with the programming language used in your program.

Purpose of profiling

The purpose of profiling a program is to understand the behaviour of a particular piece of code. Understanding the performance characteristics of a program is critical to improving its performance. Profiling identifies bottlenecks, the regions of code where most of the processing time is spent. It can also reveal the calling structure of an application, and is a good way to gain familiarity with how a program works. Specialised programs called profilers are often used to analyse the performance of the program of interest.

Profilers typically take one of two main approaches to profiling:

  • sampling, which periodically records which part of the program is active to generate a statistical distribution
  • instrumentation, which adds additional instructions to the program to measure performance directly

Sampling profilers, also called statistical profilers, inspect the running application at regular intervals and record which routine is executing. The resulting distribution of samples shows which routines account for most of the run time. This method has very low overhead and is a good choice when first investigating an application. The downside is that sampling profilers are limited in what they can measure and are less accurate than instrumentation-based profiling, especially for short-running applications.
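The sampling idea described above can be illustrated with a minimal Python sketch. It is not how any of the profilers on this page are implemented: it assumes CPython (it relies on the CPython-specific `sys._current_frames()`), and the names `run_with_sampling`, `heavy_kernel`, `light_kernel` and `workload` are hypothetical, chosen only for this example. A helper thread wakes at a fixed interval and records which function the profiled thread is executing.

```python
import collections
import sys
import threading
import time


def run_with_sampling(target, interval=0.001):
    """Run target() while a helper thread samples the calling thread."""
    counts = collections.Counter()
    stop = threading.Event()
    target_thread_id = threading.get_ident()

    def sampler():
        # At each interval, record the function executing in the profiled thread.
        while not stop.is_set():
            frame = sys._current_frames().get(target_thread_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        stop.set()
        thread.join()
    return counts


def heavy_kernel():
    # Hypothetical hot spot: spins for about 0.3 s of wall-clock time.
    end = time.perf_counter() + 0.3
    while time.perf_counter() < end:
        pass


def light_kernel():
    # Hypothetical cheap routine: spins for about 0.05 s.
    end = time.perf_counter() + 0.05
    while time.perf_counter() < end:
        pass


def workload():
    heavy_kernel()
    light_kernel()


counts = run_with_sampling(workload)
total = sum(counts.values())
for name, hits in counts.most_common():
    print(f"{name:>18}: {hits} samples ({100 * hits / total:.0f}%)")
```

Because the result is statistical, the sample counts vary from run to run, and a routine that runs for less than the sampling interval may not appear at all. This is the accuracy limitation noted above for short-running applications.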

Instrumentation can be added manually to source code prior to compilation to measure performance. This approach is often used to report the elapsed time of a specific kernel of code, and is common in microkernel benchmarks. It is less practical when investigating performance across many regions of the code, or when investigating performance metrics other than elapsed time.
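As a minimal sketch of manual instrumentation, the Python example below wraps a single kernel in timer calls and reports the elapsed time. The `kernel` function is a hypothetical stand-in for the region of interest; in a compiled language the same pattern would use that language's own timing routines.

```python
import time


def kernel(n):
    """Hypothetical compute kernel: sum of squares of the integers below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Manual instrumentation: read a high-resolution clock before and after the kernel.
start = time.perf_counter()
result = kernel(1_000_000)
elapsed = time.perf_counter() - start

print(f"kernel elapsed time: {elapsed:.6f} s")
```

Only the region between the two timer calls is measured, which gives full control over what is timed but has to be repeated by hand for every region of interest.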

Instrumentation profiling tools work by automatically adding instrumentation to the code during program compilation, avoiding the need to manually modify the code. This is usually the most flexible and feature-rich profiling method. The trade-off is that it can have a high overhead, especially for frequently executed functions, which can distort the reported performance. In many cases this can be mitigated by using more targeted instrumentation, such as filtering which functions are instrumented or collecting only a subset of performance metrics.
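Python's standard-library `cProfile` module offers a loose analogue of automatic instrumentation: it hooks every function call and return at run time rather than at compile time, so no source changes are needed. The sketch below is an illustration only, not one of the tools discussed on this page, and the functions `inner` and `outer` are hypothetical.

```python
import cProfile
import io
import pstats


def inner(n):
    # Hypothetical small routine that is called many times.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer():
    # Hypothetical driver that calls inner() 200 times.
    results = []
    for _ in range(200):
        results.append(inner(1000))
    return results


# Every function call made between enable() and disable() is recorded.
profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Format the collected statistics, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Because every call to `inner` is recorded, the measurement cost grows with the number of calls. This is the overhead for frequently executed functions that targeted instrumentation aims to reduce.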

Using the insights from these approaches, optimisation techniques can then be applied to improve the performance of the code.

Available profilers

These profiling tools and methods are available on Pawsey supercomputing systems:

  • Basic Profiling with the Time Command: The time command runs a specified program and prints out timing statistics. It is useful when you want to analyse how different configuration options affect how long a program takes to complete.
  • Profiling using Manual Instrumentation: Manual instrumentation is the addition of code by the programmer that measures timings in a program. This is more time consuming than using profilers that automate the process. However, it allows the programmer complete control over exactly how the timing is measured.
  • Profiling with gprof: The gprof program is a profiling tool found in most Linux environments. It collects performance metrics using sampling techniques as well as instrumentation. It requires your program to be compiled with a supporting compiler such as the GNU compiler.
  • Profiling with ARM MAP: ARM MAP is a commercial profiling tool, and the recommended method of parallel profiling on Pawsey supercomputing systems. It provides a graphical user interface and remote client for analysing profiling information.
  • Profiling with ARM Performance Reports: ARM Performance Reports is a commercial profiling tool, which provides a high level report regarding the performance of parallel programs. It can be used as a first step in understanding the overall performance of a parallel code.
  • Profiling GPU-Enabled codes: Profiling GPU-enabled codes requires specific profilers. This section discusses the available GPU-enabled profiling tools on Setonix.
  • AMD GPU profiling tools: Omniperf and Omnitrace are AMD research initiatives for interpreting profile data collected by rocProf and for gathering runtime performance data from software applications.
