With the update of ROCm to version 5.2.3, improved profiling tools are available for GPU applications, because the new version enables full access to hardware performance counters. As a result, rocProf can now collect performance counters for kernels running on AMD GPUs. rocProf works with HIP kernels as well as with OpenMP and OpenACC applications that offload to the GPU.
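As a rough illustration of counter collection, the sketch below writes a rocProf input file and runs the profiler on an example binary. The binary name `./saxpy`, the file names, and the counter selection are assumptions made for illustration; which counters are available depends on the GPU and can be listed with `rocprof --list-basic` and `rocprof --list-derived`.

```shell
# Sketch only: APP, the file names and the chosen counters are assumptions.
APP="${APP:-./saxpy}"
COUNTERS_FILE="${COUNTERS_FILE:-counters.txt}"

# rocprof reads the counters to collect from "pmc:" lines of an input file.
cat > "$COUNTERS_FILE" <<'EOF'
pmc: Wavefronts VALUInsts SALUInsts
EOF

if command -v rocprof >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Per-kernel counter values are written to the CSV file given with -o.
    rocprof -i "$COUNTERS_FILE" -o saxpy_counters.csv "$APP"
else
    echo "rocprof or $APP not found; skipping the profiling run"
fi
```

The guard around the `rocprof` call only keeps the script harmless on machines without ROCm; on a GPU node the call can be issued directly.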
Omniperf
Omniperf is an open-source tool for interpreting profile data collected by rocProf. Let's look at how Omniperf can be useful, using an example code that performs a scalar multiplication and vector addition (SAXPY).
Once Omniperf has generated profile data in the workloads folder, the data can be analysed with the omniperf analyze command.
At this stage, open a terminal on your local computer and set up SSH port forwarding to the IP address reported in the output of the omniperf analyze command (10.253.133.237 in this example).
With the port forwarding in place, you should be able to navigate to http://0.0.0.0:8050/ in your preferred browser and access the Omniperf dashboard. The dashboard is interactive and contains several panels, including Empirical Roofline Analysis.
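The Omniperf workflow described above (profile, analyse, forward the dashboard port) can be sketched as follows. This is a minimal sketch under stated assumptions: the workload name, the binary `./saxpy`, the `mi200` subfolder, the user name, and the login host are hypothetical placeholders, and only the IP address and port come from the text above.

```shell
# Sketch only: WORKLOAD, APP, REMOTE_USER and LOGIN_NODE are hypothetical.
WORKLOAD="saxpy"
APP="./saxpy"
DASH_IP="10.253.133.237"   # address printed by "omniperf analyze" in the text
DASH_PORT=8050
REMOTE_USER="username"
LOGIN_NODE="login.example.org"

if command -v omniperf >/dev/null 2>&1 && [ -x "$APP" ]; then
    # 1. Collect profile data; results are stored under the workloads folder.
    omniperf profile -n "$WORKLOAD" -- "$APP"
    # 2. Serve the dashboard (keeps running while the GUI is in use).
    #    The mi200 subfolder is an assumption; check the name actually created.
    omniperf analyze -p "workloads/$WORKLOAD/mi200/" --gui
fi

# 3. On the local computer: forward the dashboard port through the login node.
tunnel_cmd="ssh -N -L ${DASH_PORT}:${DASH_IP}:${DASH_PORT} ${REMOTE_USER}@${LOGIN_NODE}"
echo "Run on your local computer: $tunnel_cmd"
```

Steps 1 and 2 run on the system with the GPU, while the `ssh` command printed in step 3 is meant for a terminal on the local computer. The IP address must be replaced with the one reported by your own `omniperf analyze` run.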
Omnitrace
Omnitrace is an AMD research initiative aimed at gathering runtime performance data for software applications. It is compatible with programs written in C, C++, Fortran, and Python, as well as with computational frameworks such as OpenCL and HIP. Make sure the modules required by Omnitrace are loaded before you start.
Profiling is done in two steps. First, omnitrace instruments the application for profiling. Second, the generated *.inst file is run.
Running the instrumented binary creates an output folder named after it, with a date-stamped subfolder in which the profile data are stored. For the three versions of the example, the folders are omnitrace-saxpy.inst-output, omnitrace-saxpy_openmp.inst-output, and omnitrace-saxpy_openacc.inst-output.
At this stage you can download the *.proto file to a local computer and open it with ui.perfetto.dev. The Perfetto timeline shows the timing and duration of code execution on the host, as well as the timing of kernel executions on the device. You should also be able to examine all host-to-device and device-to-host data transfers.
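The two Omnitrace steps can be sketched as below. The binary name `saxpy` in the current directory is an assumption, and the commands follow the older single `omnitrace` executable that matches the workflow described here; newer Omnitrace releases split these steps into `omnitrace-instrument` and `omnitrace-run`, so check which is installed.

```shell
# Sketch only: APP is an assumed binary name in the current directory.
APP="saxpy"                            # or saxpy_openmp, saxpy_openacc
INST="${APP}.inst"
OUTPUT_DIR="omnitrace-${INST}-output"  # folder naming described in the text

if command -v omnitrace >/dev/null 2>&1 && [ -x "./$APP" ]; then
    # Step 1: instrument the application, producing the *.inst binary.
    omnitrace -o "$INST" -- "./$APP"
    # Step 2: run the instrumented binary to collect the profile data.
    "./$INST"
    # Locate the trace file to download and open with ui.perfetto.dev.
    find "$OUTPUT_DIR" -name '*.proto'
else
    echo "omnitrace or ./$APP not found; expected output folder: $OUTPUT_DIR"
fi
```

Setting `APP` to `saxpy_openmp` or `saxpy_openacc` gives the other two output folder names mentioned above.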
...