Debugging with DDT
DDT is a parallel debugger, which allows the debugging of parallel programs written with MPI.
Prerequisite knowledge
To start using DDT, you should first be familiar with how to install and run your program and the programming language it uses.
Overview
The recommended method to debug executable programs is to use the Distributed Debugging Tool (DDT), currently known as Linaro-DDT (previously Arm-DDT). This tool is part of the Linaro-Forge suite, a commercial product available to all users on Pawsey systems.
The Linaro-Forge licence supports a total of 1024 running processes (tasks) at a time, shared between all users. For instance, if user A is debugging a 512-task job and users B and C are each profiling a 256-task job, the licence will not allow any other user to start a debugging job.
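As a rough illustration of the licence arithmetic above, the following shell sketch sums the task counts of the jobs already running and reports how many tasks are left. The helper name remaining_tasks and the variable LICENCE_LIMIT are made up for this example; this is not a Forge command, and it simply assumes the 1024-task limit quoted on this page.

```shell
# Licence limit as stated on this page (assumption: it has not changed).
LICENCE_LIMIT=1024

# Hypothetical helper: remaining_tasks <tasks_job_1> <tasks_job_2> ...
# Prints how many more tasks could start before the limit is reached.
remaining_tasks() {
    local used=0
    local n
    for n in "$@"; do
        used=$((used + n))
    done
    echo $((LICENCE_LIMIT - used))
}

# Scenario from the text: one 512-task job and two 256-task jobs.
remaining_tasks 512 256 256   # prints 0
```

A result of 0 means that no further debugging or profiling task can start until one of the running jobs finishes.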
DDT provides a very powerful framework to debug serial, MPI, OpenMP and mixed mode executables and is driven by a Graphical User Interface (GUI).
DDT has two usage modes:
- The DDT Remote Client GUI runs on your local machine (laptop or desktop). In this mode the Remote Client connects to a debugging job previously submitted with the sbatch command.
- The DDT GUI runs directly on the login node. In this mode the debugging job is submitted to the SLURM queueing system from within the DDT GUI.
The following example illustrates the steps required to start a DDT session with the use of DDT Remote Client.
Best Practice
Use the Remote Client GUI to debug with DDT.
Context
Debugging analyses running programs to identify and address errors or unexpected behavior. Refer to the Debugging page for more information.
Step-by-Step Example
In this section we will provide a step-by-step introduction to DDT.
Step 1: Get the source code
In this example we will work with an example C code debugme.c given below.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char ** argv) {

  int rank;
  int * oops = NULL;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  *oops = rank;

  MPI_Finalize();

  return 0;
}
Attempting to run this program fails with a segmentation fault, caused by the attempt to dereference a NULL pointer at line 12 (the statement *oops = rank;).
$ salloc -p debug --nodes=1
$ cc -O0 -g -o debugme debugme.c
$ srun --export=all -n 1 debugme
srun: error: nid001007: task 0: Segmentation fault (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=39338.0
Step 2: Compile with debugging options
Compile the code with an available C compiler, using the -O0 -g compiler options. These options let you view and navigate through the source code of your application while debugging. You can find more details about debugging options for different compilers in: Compiler Options for Debugging.
$ cc -O0 -g -o debugme debugme.c
Step 3: Download and install Forge Remote Client
Visit the Linaro downloads page and download the Forge Remote Client (available for Windows, macOS and Linux) that corresponds to the version of Forge, Arm-Forge or Linaro-Forge available on Pawsey systems.
Note that the version of the Remote Client must be compatible with the Forge version available on the Pawsey system you plan to use for debugging. Ideally it should be the same version; if that is not available, the closest newer version sometimes works.
Run the module avail forge command to check which versions are available on a particular system, for instance:
$ module avail forge
---------------------- /opt/cray/pe/lmod/modulefiles/core ----------------------
forge/21.1.2
---------------------- /software/pawsey/modulefiles ----------------------------
arm-forge/21.1.2    arm-forge/22.1.2 (D)    linaro-forge/22.1.0
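To compare against the Remote Client versions offered for download, it can help to reduce the module listing to version numbers only. The sketch below does that with standard text tools. The helper name forge_versions is hypothetical, the sample text is copied from the listing on this page, and the sketch assumes a sort that supports version ordering (-V, as in GNU coreutils).

```shell
# Sample text copied from the listing on this page. On the cluster the input
# would typically come from:  module avail forge 2>&1
sample='forge/21.1.2
arm-forge/21.1.2 arm-forge/22.1.2 (D) linaro-forge/22.1.0'

# Hypothetical helper: read a module listing on stdin and print the
# unique Forge version numbers, oldest first.
forge_versions() {
    grep -oE '(arm-|linaro-)?forge/[0-9]+(\.[0-9]+)*' | cut -d/ -f2 | sort -uV
}

printf '%s\n' "$sample" | forge_versions
```

For the sample listing this prints 21.1.2, 22.1.0 and 22.1.2, one per line.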
Install the correct or compatible version of the Forge Remote Client on your own desktop by following the instructions in the installer.
After installation, you need to configure the client on your computer by specifying the Remote Installation Directory on the Pawsey cluster. To obtain this information, run the module show command and look for the value of the DDT_CURPATH variable:
$ module show arm-forge/21.1.2
. . .
setenv("DDT_LEVEL","21.1.2")
setenv("DDT_VERSION","21.1.2")
setenv("DDT_CURPATH","/opt/forge/21.1.2")
setenv("PE_FORGE_MODULE_NAME","arm-forge")
. . .
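If you prefer not to read the value by eye, the following sketch extracts it with sed. The helper name ddt_curpath is hypothetical, and the sample text is copied from the output on this page. The sketch assumes the setenv("NAME","value") layout shown there, which may differ for other module systems or versions.

```shell
# Sample text copied from the output on this page. On the cluster the input
# would typically come from:  module show arm-forge/21.1.2 2>&1
sample='setenv("DDT_LEVEL","21.1.2")
setenv("DDT_VERSION","21.1.2")
setenv("DDT_CURPATH","/opt/forge/21.1.2")
setenv("PE_FORGE_MODULE_NAME","arm-forge")'

# Hypothetical helper: read module show output on stdin and print the
# value of DDT_CURPATH, or nothing if the variable is not set there.
ddt_curpath() {
    sed -n 's/.*setenv("DDT_CURPATH","\([^"]*\)").*/\1/p'
}

printf '%s\n' "$sample" | ddt_curpath   # prints /opt/forge/21.1.2
```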
Once you have the value of DDT_CURPATH, add it as the Remote Installation Directory in the configuration. To do so, open the client on your local machine and select the Configure option from the Remote Launch menu. Choose Add and fill in the remote launch settings. You also need to give the connection a name and provide the Host Name. Example settings for Setonix are shown in the screenshot below (replace the username placeholder with your real username):
Figure 1. Settings needed to connect to Setonix with Arm DDT.
Choose OK to save the configuration.
The optional remote script entry should be left blank. You can then try the Test Remote Launch option, for which you will need to enter your password.
Note that available versions and installation directories may be different in the latest Forge modules available, but the general configuration procedure is the same.
Step 3: Execute DDT debugging job
You can execute your debugging job in two ways (choose one of the methods described below):
Submit your job to the SLURM queueing system
Write a batch queueing script:
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --partition=debug
#SBATCH --account=[your-project]
#SBATCH --time=0:15:00
#SBATCH --export=none

module load arm-forge/21.1.2
ddt --connect srun -n 4 ./debugme
The script above describes a 15-minute, single-node job that runs 4 processes in the debug partition.
Note that we pass the --connect option to ddt. This makes the debugging session wait until the Remote Client connects.
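The task count appears twice in the batch script (in --ntasks and in srun -n), and the two values should agree. One way to keep them in sync is to generate the script from a single parameter, as in this sketch. The helper name make_ddt_job is hypothetical, and your-project is a placeholder for your real project code.

```shell
# Hypothetical helper: make_ddt_job <ntasks> <project> <executable>
# Prints a batch script like the one on this page, using the same task
# count for both --ntasks and srun -n.
make_ddt_job() {
    local ntasks=$1
    local project=$2
    local exe=$3
    cat <<EOF
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks=${ntasks}
#SBATCH --partition=debug
#SBATCH --account=${project}
#SBATCH --time=0:15:00
#SBATCH --export=none

module load arm-forge/21.1.2
ddt --connect srun -n ${ntasks} ${exe}
EOF
}

# Write the 4-task script used in this example to job.slurm.
make_ddt_job 4 your-project ./debugme > job.slurm
```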
You can now submit the job to the queueing system:
$ sbatch job.slurm
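Because the Remote Client can only connect once the job has actually started, it can be useful to check the job state after submitting. The sketch below wraps the submission in a small function. The name submit_and_watch is hypothetical; sbatch --parsable and squeue -j are standard Slurm options, but the function has not been tested on Pawsey systems.

```shell
# Hypothetical helper: submit_and_watch <batch-script>
# Submits the script and shows the queue entry for the new job.
submit_and_watch() {
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found: run this on a login node" >&2
        return 1
    fi
    local jobid
    # --parsable prints only the job ID (optionally followed by ;cluster).
    jobid=$(sbatch --parsable "$1") || return 1
    jobid=${jobid%%;*}
    squeue -j "$jobid"
}

# Usage on a login node:
#   submit_and_watch job.slurm
```

In the default squeue output, a state of PD means the job is still pending and R means it is running, at which point ddt is waiting for the Remote Client.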
Use the interactive session
Allocate an interactive session in the debug partition by running:
$ salloc --nodes=1 --ntasks=4 --partition=debug --account=[your-project] --time=0:15:00
You can now run the debugging job:
$ module load arm-forge/21.1.2
$ ddt --connect srun -n 4 ./debugme
Step 4: Execute the Remote Client and connect
Now start the Forge Remote Client on your local machine and connect to the Pawsey system (select the correct entry from the "Remote Launch" menu). A pop-up window will appear with a Reverse Connect Request.
Accepting this request connects your Remote Client to the debugging session running on the Pawsey system.
Now click "Run" to start the debugging session.
Step 5: Debug the program
When the main DDT window appears, the code will not have started to execute, but it is ready for debugging. To start execution, press the green "play" button at the top left. For this example, execution quickly reveals the location of the problem at line 12 in the source code window.
Next steps
DDT provides a great deal of functionality to help debug parallel programs, most of which is not required for the simple example above. For instance, the memory debugging mechanism available in DDT is especially useful for tracking down memory management issues and bugs.
For further information, consult the DDT user guide.
Note: DDT currently does not support AMD GPUs. See Debugging GPU-Enabled Software for alternative tools.