Profiling GPU-Enabled codes
Profiling GPU-enabled codes requires specific profilers. This section discusses the available GPU-enabled profiling tools on Setonix.
Prerequisite knowledge
Before using GPU-enabled profilers, you should be familiar with how to compile and install GPU-enabled code (using APIs such as HIP) and with the general idea of profiling.
Available GPU-Enabled Profilers
The number of GPU-enabled profilers supporting AMD GPUs keeps growing, though not all are available on Setonix. Some tools are provided by AMD, others by Cray.
The profilers currently available on Setonix are:
- ROCPROF - Tool developed by AMD and provided by the rocm/<version> module.
Future profilers
- Cray-Pat - Tool developed by Cray and provided by the perftools/<version> module. It should soon support AMD GPUs.
- Omnitrace, Omniperf - Tools developed by AMD and provided by the omnitrace/<version> and omniperf/<version> modules respectively. They are still experimental.
ROCPROF
The rocprof command-line program is a profiling tool that supports AMD GPUs. It is similar to the gprof tool found in most Linux environments. rocprof collects performance metrics using sampling techniques as well as instrumentation, focusing on HIP calls for the GPU.
The following is an overview of profiling your HIP-enabled program:
- Compile and link your program with the HIP compiler as usual (the example below uses hipcc without additional profiling flags).
- Run your program under the rocprof command, and files containing profiling information will be generated.
- Analyse the generated output files (for example results.csv).
This page provides a step-by-step example of using the rocprof tool to generate profiling information about how your program performs on the GPU.
Running ROCPROF
The first step is to run the rocprof tool on a GPU node with a HIP-enabled executable.
$ salloc -A <project>-gpu -p gpu-dev --nodes=1 --ntasks=1 --gpus-per-task=1 --cpus-per-task=8
$ module load rocm/<VERSION>
$ module load craype-accel-amd-gfx90a
$ hipcc -o profileme profileme.cpp
$ srun --export=all rocprof ./profileme
RPL: on '230504_160218' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profileme"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_160218_84697'
RPL: result dir '/tmp/rpl_data_230504_160218_84697/input_results_230504_160218'
ROCProfiler: input from "/tmp/rpl_data_230504_160218_84697/input.xml"
0 metrics
ROCPRofiler: 0 contexts collected, output directory /tmp/rpl_data_230504_160218_84697/input_results_230504_160218
File '<local_path>/results.csv' is generating
rocprof can produce many metrics. For more information, see the help output:
$ module load rocm/<VERSION>
$ rocprof -h
RPL: on '230504_155216' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package.
Full path: /opt/rocm-5.0.2/rocprofiler/bin/rocprof
Metrics definition: /opt/rocm-5.0.2/rocprofiler/lib/metrics.xml
...
Typically, one can trace the HIP calls and the lower-level HSA (Heterogeneous System Architecture) calls using the --hip-trace and --hsa-trace command-line arguments.
$ srun --export=all rocprof --hip-trace ./profileme
RPL: on '230504_160218' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profileme"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_160218_84697'
RPL: result dir '/tmp/rpl_data_230504_160218_84697/input_results_230504_160218'
ROCProfiler: input from "/tmp/rpl_data_230504_160218_84697/input.xml"
ROCTracer (pid=84991):
HIP-trace()
hsa_copy_deps: 0
scan hip API data 1:2
File '<current_dir>/results.hip_stats.csv' is generating
dump json 1:2
File '<current_dir>results.json' is generating
$ # looking at HSA layer
$ srun --export=all rocprof --hsa-trace ./profileme
RPL: on '230504_161001' from '/opt/rocm-5.0.2/rocprofiler' in '<current_dir>'
RPL: profiling '"./profile"'
RPL: input file ''
RPL: output dir '/tmp/rpl_data_230504_161001_87188'
RPL: result dir '/tmp/rpl_data_230504_161001_87188/input_results_230504_161001'
ROCProfiler: input from "/tmp/rpl_data_230504_161001_87188/input.xml"
0 metrics
ROCTracer (pid=87210):
HSA-trace()
HSA-activity-trace()
ROCPRofiler: 0 contexts collected, output directory /tmp/rpl_data_230504_161001_87188/input_results_230504_161001
hsa_copy_deps: 1
scan hsa API data 2245:2246
File '<current_dir>/results.hsa_stats.csv' is generating
dump json 2245:2246
File '<current_dir>/results.json' is generating
File '<current_dir>/results.copy_stats.csv' is generating
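The statistics files written by the trace options are plain CSV, so they can be inspected with standard shell tools. The following is a minimal sketch rather than an official workflow. The helper name top_calls is hypothetical. It assumes the columns are Name, Calls, TotalDurationNs, AverageNs, Percentage and that no name contains a comma. Check the header line of your own file, as the layout may differ between ROCm versions.

```shell
# top_calls FILE [N]: print the N rows (default 5) of a rocprof statistics
# CSV that have the largest total duration.
# Hypothetical helper. Assumes the third column is TotalDurationNs and
# that no name contains a comma.
top_calls() {
    stats_file="$1"
    count="${2:-5}"
    # Skip the header line, drop the quotes around fields, sort on the
    # third column (numeric, descending) and keep the first N rows.
    tail -n +2 "$stats_file" | tr -d '"' | sort -t, -k3,3nr | head -n "$count"
}

# Summarise the HIP trace from the run above, if it is in this directory.
if [ -f results.hip_stats.csv ]; then
    top_calls results.hip_stats.csv 5
fi
```

Calling top_calls results.hsa_stats.csv 10 would list the ten most expensive HSA calls in the same way, provided that file follows the same layout.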
By default, rocprof names its output files results.* (for example results.json), so each run overwrites the results of the previous one. We recommend invoking the command with -o <output_file_name> to ensure you do not accidentally overwrite profiling results.
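One way to follow this recommendation is to put the date and time of the run into the output name. The sketch below is illustrative only. The helper unique_output_name and the label profileme_hip are hypothetical. The rocprof call runs only where both srun and rocprof are found (for example inside a GPU allocation with the rocm module loaded); elsewhere the command is just printed.

```shell
# unique_output_name LABEL: print LABEL_<date>_<time>.csv.
# Hypothetical helper, not part of rocprof.
unique_output_name() {
    printf '%s_%s.csv\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}

outfile=$(unique_output_name profileme_hip)

# Launch the profiler only where srun and rocprof exist; otherwise just
# print the command that would be run.
if command -v srun >/dev/null 2>&1 && command -v rocprof >/dev/null 2>&1; then
    srun --export=all rocprof --hip-trace -o "$outfile" ./profileme
else
    echo "would run: srun --export=all rocprof --hip-trace -o $outfile ./profileme"
fi
```

With -o set, the trace and statistics files should take their base name from the chosen output file instead of results, so successive runs no longer overwrite each other.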
Related Pages
External
- For information on ROCTRACER used by ROCPROF, see https://docs.amd.com/bundle/ROCTracer-User-Guide-v5.0-/page/Introduction_to_ROCTracer_User_Guide.html
- For information on ROCPROF, see https://docs.amd.com/bundle/ROCProfiler-User-Guide-v5.1/page/rocprof_Command_Line_Tool.html
- For information on Omnitrace, see https://github.com/AMDResearch/omnitrace
- For information on Omniperf, see the Omnitrace documentation