Debugging with DDT

DDT is a parallel debugger, which allows the debugging of parallel programs written with MPI.

Prerequisite knowledge

To start using DDT, you should first be familiar with how to install and run your program and the programming language it uses.

Overview

The recommended method for debugging executable programs is the Distributed Debugging Tool (DDT), currently known as Linaro-DDT (previously Arm-DDT). This tool is part of the Linaro-Forge suite, a commercial product available to all users on Pawsey systems.


The Linaro-Forge licence supports a total of 1024 running processes (tasks) at a time, shared among all users. For instance, if user A is debugging a 512-task job and users B and C are each profiling a 256-task job, the licence is fully used (512 + 256 + 256 = 1024) and no other user can start a debugging job.
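The licence arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name forge_job_fits is hypothetical, and the task counts of running jobs are supplied by hand because this page does not describe a way to query current licence usage.

```shell
#!/bin/bash
# Hypothetical helper: checks whether a new Forge job fits under the licence.
# LICENCE_LIMIT comes from the text above; the function name and the list of
# running job sizes are illustrative assumptions.
LICENCE_LIMIT=1024

# Usage: forge_job_fits REQUESTED_TASKS [RUNNING_JOB_TASKS...]
# Prints the number of free licence slots; returns 0 if the request fits.
forge_job_fits() {
    local requested=$1
    shift
    local in_use=0 tasks
    for tasks in "$@"; do
        in_use=$((in_use + tasks))
    done
    local free=$((LICENCE_LIMIT - in_use))
    echo "$free"
    [ "$requested" -le "$free" ]
}

# The scenario from the text: 512 + 256 + 256 tasks are already running.
forge_job_fits 4 512 256 256 || echo "licence exhausted: a 4-task job must wait"
```

In the scenario from the text no slots remain, so even the 4-task example used later on this page would have to wait until one of the running jobs finishes.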

DDT provides a powerful framework for debugging serial, MPI, OpenMP and mixed-mode executables and is driven by a graphical user interface (GUI).

DDT has two usage modes:

  • The DDT Remote Client GUI runs on your local machine (laptop or desktop).
    In this mode the Remote Client connects to a debugging job previously submitted with the sbatch command.
  • The DDT GUI runs directly on the login node.
    In this mode the debugging job is submitted to the SLURM queueing system from within the DDT GUI.

The following example illustrates the steps required to start a DDT session using the DDT Remote Client.

Best Practice

Use the Remote Client GUI to debug with DDT.

Context

Debugging analyses running programs to identify and address errors or unexpected behavior. Refer to the Debugging page for more information.

Step-by-Step Example

In this section we will provide a step-by-step introduction to DDT.

Step 1: Get the source code

In this example we will work with the C source file debugme.c, given below.

Source code
#include <stdio.h>
#include <mpi.h>
 
int main(int argc, char ** argv) {

	int rank;
	int * oops = NULL;

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);

	*oops = rank;	/* deliberate bug: writes through a NULL pointer */

	MPI_Finalize();

	return 0;
}

Running this program fails with a segmentation fault, caused by the dereference of a NULL pointer at line 12.

Error message
$ salloc -p debug --nodes=1 
$ cc -O0 -g -o debugme debugme.c
$ srun --export=all -n 1 debugme 
srun: error: nid001007: task 0: Segmentation fault (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=39338.0

Step 2: Compile with debugging options

Compile the code with an available C compiler, using the -O0 -g options. The -g option adds debugging information and -O0 disables optimisation, so that you can view and navigate the source code of your application while debugging. More details about debugging options for different compilers are in Compiler Options for Debugging.

Compiling with debug flags
$ cc -O0 -g -o debugme debugme.c
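Before starting a debugging session it can be worth confirming that the executable really carries debug information. The sketch below assumes that readelf (GNU binutils) is installed and that the compiler writes DWARF data into a .debug_info section, which is the usual behaviour with -g. The helper name has_debug_info is hypothetical and not part of DDT or Forge.

```shell
#!/bin/bash
# Assumption: readelf (GNU binutils) is available on the system.
# has_debug_info is a hypothetical helper name, not part of DDT or Forge.

# Returns 0 if the ELF file given as $1 has a .debug_info section,
# which compilers normally emit when -g is used.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

if has_debug_info ./debugme; then
    echo "debugme contains debug information"
else
    echo "no debug information found: recompile with -O0 -g"
fi
```

The check only looks for the presence of the section; it does not verify the optimisation level.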

Step 3: Download and install Forge Remote Client

Visit the Linaro downloads page and download the Forge Remote Client (available for Windows, macOS and Linux) that corresponds to the version of Forge, Arm-Forge or Linaro-Forge available on Pawsey systems.

Note that the version of the Remote Client must be compatible with the Forge version available on the Pawsey system you plan to use for debugging. Ideally it should be the same version; if that is not available, the closest newer version sometimes works.

Run the module avail forge command to check which versions are available on a particular system, for instance:

Checking available versions
$ module avail forge
---------------------- /opt/cray/pe/lmod/modulefiles/core ----------------------
   forge/21.1.2
---------------------- /software/pawsey/modulefiles ----------------------------
   arm-forge/21.1.2    arm-forge/22.1.2 (D)     linaro-forge/22.1.0

Install the correct (compatible) version of the Forge Remote Client on your own desktop by following the instructions in the installer.

After installation, you will need to configure the client on your computer by indicating the Remote Installation Directory on the Pawsey cluster. To obtain this information, execute the module show command and look for the value of the DDT_CURPATH variable:

Finding DDT_CURPATH value to be used in the Remote Client configuration
$ module show arm-forge/21.1.2
.
.
.
setenv("DDT_LEVEL","21.1.2")
setenv("DDT_VERSION","21.1.2")
setenv("DDT_CURPATH","/opt/forge/21.1.2")
setenv("PE_FORGE_MODULE_NAME","arm-forge")
.
.
.
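The lookup can also be scripted. The following is a minimal sketch that assumes the setenv("NAME","VALUE") line format shown in the listing above; the helper name extract_ddt_curpath is hypothetical. Lmod typically prints module show output on standard error, which is why the example redirects it.

```shell
#!/bin/bash
# extract_ddt_curpath is a hypothetical helper name. It reads "module show"
# output on stdin and prints the value of DDT_CURPATH, assuming the
# setenv("NAME","VALUE") line format shown in the listing above.
extract_ddt_curpath() {
    sed -n 's/^[[:space:]]*setenv("DDT_CURPATH","\(.*\)")[[:space:]]*$/\1/p'
}

# Lmod typically prints "module show" output on stderr, hence the 2>&1.
# The module command only exists on the cluster, so guard the call.
if command -v module >/dev/null 2>&1; then
    module show arm-forge/21.1.2 2>&1 | extract_ddt_curpath
fi
```

If the module file uses a different line format, the function prints nothing, and the value has to be read from the module show output by eye.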

Once you have obtained the value of DDT_CURPATH, add it as the Remote Installation Directory in the configuration. To do so, open the client on your local machine and select the Configure option from the Remote Launch menu. Choose Add and configure the remote launch settings. You will also need to give a name to the connection settings and provide the Host Name. Example settings for Setonix are shown in the screenshot below (replace the username placeholder with your real username):

Figure 1. Settings needed to connect to Setonix with Arm DDT.

Choose OK to save the configuration.

The optional Remote Script entry should be left blank. You can then use the Test Remote Launch option, for which you will need to enter your password.

Note that available versions and installation directories may differ for the latest Forge modules, but the general configuration procedure is the same.

Step 4: Execute DDT debugging job

You can execute your debugging job in two ways (choose one of the methods described below):

Submit your job to the SLURM queueing system

Write a batch queueing script:

job.slurm
#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --partition=debug
#SBATCH --account=[your-project]
#SBATCH --time=0:15:00
#SBATCH --export=none

module load arm-forge/21.1.2

ddt --connect srun -n 4 ./debugme

The script above requests a 15-minute, single-node job that runs 4 processes in the debug partition.

Note the --connect option passed to ddt. It makes the debugging session wait until the Remote Client connects.

You can now submit the job to the queueing system:

$ sbatch job.slurm
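Because the debugging session waits for the Remote Client, it helps to know when the job has actually started. The sketch below captures the job ID at submission time and queries the job state. It relies on the standard sbatch --parsable and squeue -j, -h and -o options; the helper name job_id_from_sbatch is hypothetical.

```shell
#!/bin/bash
# job_id_from_sbatch is a hypothetical helper name. With --parsable, sbatch
# prints "jobid" or "jobid;cluster"; this strips the optional cluster part.
job_id_from_sbatch() {
    local line
    read -r line
    echo "${line%%;*}"
}

# Slurm commands only exist on the cluster, so guard the calls.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable job.slurm | job_id_from_sbatch)
    # %T prints the job state (for example PENDING or RUNNING); -h drops the header.
    squeue -j "$jobid" -h -o %T
fi
```

Once the reported state is RUNNING, the Reverse Connect Request should be available to the Remote Client.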

Use an interactive session

Allocate an interactive session in the debug partition by running:

$ salloc --nodes=1 --ntasks=4 --partition=debug --account=[your-project] --time=0:15:00

You can now run the debugging job:

Connecting to the program
$ module load arm-forge/21.1.2
$ ddt --connect srun -n 4 ./debugme

Step 5: Execute the Remote Client and connect

Now start the Forge Remote Client on your local machine and connect to the Pawsey system (select the correct entry from the "Remote Launch" menu). A pop-up window will appear showing a Reverse Connect Request.

Accepting the request connects your Remote Client to the debugging session active on the Pawsey system.

Now click "Run" to start the debugging session.


Step 6: Debug the program

When the main DDT window appears, the code will not have started to execute, but is ready for debugging. To start execution, press the green "play" button at the top left. For this example, execution quickly reveals the location of the problem, at line 12, in the source code window.

Next steps

DDT provides a great deal of functionality for debugging parallel programs, most of which is not required for the simple example above. For instance, the memory debugging mechanism available in DDT is especially useful for tracking down memory management bugs.

For further information, consult the DDT user guide.


Note: DDT currently does not support AMD GPUs. See Debugging GPU-Enabled Software for alternative tools.

Related pages