Debugging
Debugging is the analysis of a program to identify errors in the code. Computer programs called debuggers can be used to analyse software bugs.
Debuggers are designed to control the execution of other codes in order to identify the location and cause of the bug. This section explains a few general debugging terms and concepts and then goes into more detail about using debuggers to find and fix bugs in your compiled code.
Prerequisite knowledge
To start debugging codes, you should first be familiar with how to install and run your program, and the programming language it uses.
Software bugs
A bug is an unexpected behaviour occurring when a computer code is executed. Some common types of software bugs are:
- Segmentation fault: An attempt to access a memory location that the program is not permitted to access on the CPU or the GPU, for example when an array index falls outside the bounds of the array during a read or write operation.
- Memory leak: A computer program consumes memory but is unable to release it back to the operating system. An example of a memory leak is a malloc() operation without the corresponding free().
- Unexpected behaviour: A code does not execute as expected. An example would be results being inconsistent when rerunning the code. Usually these bugs are the most challenging to fix.
Debugging software
A traditional, primitive method of identifying software bugs is to insert print or write statements at key points in the code. Using a debugger allows the user to identify bugs much faster. Debuggers provide some functions to control user code execution:
- A breakpoint allows the developer to stop the execution of the code and check its status. Application execution stops when the breakpoint line is reached. At this point it is possible to get the value of the variables within the current scope. This command is very useful to identify the region of code containing the bug by running an application until a desired point.
- Step and next are two of the most important functions available in any debugger. These two commands control the execution line by line. Both advance execution by one line from the current point; however, when that line contains a subroutine call, the step command goes into the subroutine and stops at its first line, while the next command executes the whole subroutine and stops at the following line of the caller.
- A watchpoint is similar to a breakpoint; however it applies to a variable rather than a function or line of code. When that variable is read or written, the watchpoint is triggered and program execution stops.
- Many modern debuggers allow the user to analyse data by plotting or getting statistics on some array slices when the execution is stopped. This feature is extremely worthwhile as software complexity increases.
Debugging parallel codes running on supercomputers can be more complex as there are multiple threads or processes running at the same time. Parallel debuggers are designed to support these kinds of programs as well.
Using debuggers
Programs can be compiled to include additional debugging information. This enables debuggers to identify all symbols defined in the executable and to report specific, useful detail such as the source code line that is currently executing. To produce this information, compile the source files with the -g flag. It is also good practice to turn off all optimisations with -O0 in order to avoid instruction reordering. See the Compiler Options for Debugging page for more detail.
Important
The cost of these debugging enhancements is that the code will execute less efficiently (that is, it will run slower). Therefore you should enable these flags only while testing and debugging your code.
The debugging process can then be outlined as follows:
- Identify the regions that are the probable source of bugs.
- Set one or more breakpoints or watchpoints.
- Start the execution of the user code under the debugger with the breakpoints/watchpoints active.
- Analyse the code at the breakpoints or watchpoints, stepping through the code as needed.
- Identify and fix the bugs based on the analysis.
- Run the code again and check if the bugs persist.
Common debuggers are gdb, Arm DDT and TotalView. Arm DDT and TotalView support debugging of parallel codes, so at a breakpoint it is possible to switch between processes or threads and check the current status or the values of local arrays.
More information regarding the use of particular debuggers on Pawsey supercomputers can be found on the following pages: