Profiling GPU-Enabled codes

Profiling GPU-enabled codes requires specific profilers. This section discusses the available GPU-enabled profiling tools on Setonix. 

Prerequisite knowledge

To start using GPU-enabled profilers you should first be familiar with how to compile and install GPU-enabled code (using APIs such as HIP) and with the general idea of profiling.

Available GPU-Enabled Profilers 

The number of GPU-enabled profilers supporting AMD GPUs keeps growing, though not all are available on Setonix. Some tools are provided by AMD and others by Cray.

The ones currently available on Setonix are:

  • ROCPROF - Tool developed by AMD and provided by the rocm/<version> module.

Future profilers

  • Cray-Pat - Tool developed by Cray and provided by the perftools/<version> module. It should soon support AMD GPUs.
  • Omnitrace, Omniperf - Tools developed by AMD and provided by the omnitrace/<version> and omniperf/<version> modules respectively. They are still experimental.

ROCPROF

The rocprof command-line program is a profiling tool that supports AMD GPUs. It is similar to the gprof tool found in most Linux environments. rocprof collects performance metrics using sampling techniques as well as instrumentation, focusing on HIP calls for the GPU.

The following is an overview of profiling your HIP-enabled program:

  1. Compile and link your program with a HIP compiler such as hipcc. No special profiling flags are needed.
  2. Run your program under rocprof on a GPU node. A file containing profiling information will be generated.
  3. Analyse the generated output files (for example results.csv).

This page provides a step-by-step example of using the rocprof tool to generate profiling information about how your program uses the GPU.

Running ROCPROF

The first step is to run the rocprof tool on a GPU node with a HIP-enabled executable.

Running ROCPROF
$ salloc -A <project>-gpu -p gpu-dev --nodes=1 --ntasks=1 --gpus-per-task=1 --cpus-per-task=8  
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
$ hipcc -o profileme profileme.cpp
$ srun --export=all rocprof ./profileme

RPL: on '230504_160218' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profileme"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_160218_84697'
RPL: result dir '/tmp/rpl_data_230504_160218_84697/input_results_230504_160218'
ROCProfiler: input from "/tmp/rpl_data_230504_160218_84697/input.xml"
  0 metrics

ROCPRofiler: 0 contexts collected, output directory /tmp/rpl_data_230504_160218_84697/input_results_230504_160218
File '<local_path>/results.csv' is generating
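
The same steps can also be submitted as a batch job instead of an interactive session. The following is a minimal sketch that reuses the project, partition and module placeholders from the commands above. The file name profile_job.sh and the time limit are illustrative assumptions, not site requirements.

```shell
# Write a hypothetical batch script that repeats the interactive steps above.
# <project> and <VERSION> are placeholders that must be filled in before use.
cat > profile_job.sh <<'EOF'
#!/bin/bash --login
#SBATCH --account=<project>-gpu
#SBATCH --partition=gpu-dev
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=8
# Assumed short time limit for a test run
#SBATCH --time=00:10:00

module load rocm/<VERSION>
module load craype-accel-amd-gfx90a

# Build the example, then run it under rocprof.
hipcc -o profileme profileme.cpp
srun --export=all rocprof ./profileme
EOF
```

Once the placeholders are replaced, the script could be submitted with sbatch profile_job.sh.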

rocprof can produce many metrics. For more information, see the built-in help:

ROCPROF HELP
$ module load rocm/<VERSION>
$ rocprof -h

RPL: on '230504_155216' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package.
Full path: /opt/rocm-5.0.2/rocprofiler/bin/rocprof
Metrics definition: /opt/rocm-5.0.2/rocprofiler/lib/metrics.xml
...
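
Hardware counters are requested through an input file passed to rocprof with -i. The sketch below writes such a file. The file name rocprof_input.txt and the counter names Wavefronts and VALUInsts are illustrative assumptions, so check rocprof --list-basic and rocprof --list-derived for the counters actually available on the GPU.

```shell
# Write a hypothetical rocprof input file requesting two counters.
# Counter names are examples only; availability depends on the GPU and ROCm version.
cat > rocprof_input.txt <<'EOF'
# Counters to collect for every kernel dispatch
pmc: Wavefronts VALUInsts
EOF
```

Inside a GPU allocation the file could then be used with srun --export=all rocprof -i rocprof_input.txt -o counters.csv ./profileme, where counters.csv is again a name chosen for illustration.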

HIP calls and the lower-level HSA (Heterogeneous System Architecture) calls can be traced with the --hip-trace and --hsa-trace command-line arguments.

Different ROCPROF metrics
$ srun --export=all rocprof --hip-trace ./profileme
RPL: on '230504_160218' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profileme"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_160218_84697'
RPL: result dir '/tmp/rpl_data_230504_160218_84697/input_results_230504_160218'
ROCProfiler: input from "/tmp/rpl_data_230504_160218_84697/input.xml" 
ROCTracer (pid=84991):
    HIP-trace()
hsa_copy_deps: 0
scan hip API data 1:2    File '<current_dir>/results.hip_stats.csv' is generating
dump json 1:2
File '<current_dir>/results.json' is generating
$ # looking at HSA layer
$ srun --export=all rocprof --hsa-trace ./profileme
RPL: on '230504_161001' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profileme"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_161001_87188'
RPL: result dir '/tmp/rpl_data_230504_161001_87188/input_results_230504_161001'
ROCProfiler: input from "/tmp/rpl_data_230504_161001_87188/input.xml"
  0 metrics
ROCTracer (pid=87210):
    HSA-trace()
    HSA-activity-trace()

ROCPRofiler: 0 contexts collected, output directory /tmp/rpl_data_230504_161001_87188/input_results_230504_161001
hsa_copy_deps: 1
scan hsa API data 2245:2246    File '<current_dir>/results.hsa_stats.csv' is generating
dump json 2245:2246
File '<current_dir>/results.json' is generating
File '<current_dir>/results.copy_stats.csv' is generating
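
The trace runs above write summary files such as results.hip_stats.csv. For a quick look at one of them from the command line, a small helper like the following can be used. This is only a sketch: the function name preview_csv is hypothetical, and the helper makes no assumption about the column layout of the file.

```shell
# Print the header line and the first N data rows (default 10) of a CSV file.
preview_csv() {
    file=$1
    rows=${2:-10}
    head -n "$((rows + 1))" "$file"
}
```

For example, preview_csv results.hip_stats.csv 5 prints the header and the first five rows.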


rocprof writes its output to files with the default base name results (for example results.csv and results.json), so each run overwrites the results of the previous one. We recommend invoking the command with -o <output_file_name> to ensure you do not accidentally overwrite profiling results.
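
One way to follow this advice is to build the output name from a label and a timestamp. The helper below is a minimal sketch. The function name profile_output_name and the naming scheme are assumptions made for illustration.

```shell
# Build an output file name of the form <label>_<YYYYMMDD>_<HHMMSS>.csv
# from a label and the current date and time.
profile_output_name() {
    printf '%s_%s.csv\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}
```

Inside a GPU allocation it could be used as srun --export=all rocprof --hip-trace -o "$(profile_output_name profileme)" ./profileme, so that each run writes to a differently named file.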

