With the update of ROCm to version 5.2.3, improved profiling tools are available for GPU applications, because the new version enables full access to hardware performance counters. As a result, rocProf can now collect performance counters for kernels running on AMD GPUs. rocProf works with HIP kernels as well as with OpenMP and OpenACC applications that offload to the GPU.

Omniperf

Omniperf is an open-source tool that interprets profile data collected by rocProf. The examples below show how Omniperf can be used with a code that performs a scalar multiplication and vector addition (SAXPY, y = a*x + y).

HIP SAXPY


Once Omniperf has generated profile data in the workloads folder, it can be analysed as follows:
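The profiling and analysis steps can be sketched as below. This is a minimal example under assumptions: the HIP executable is called ./saxpy, the workload name saxpy_hip is made up for illustration, and the mi200 subfolder depends on the GPU architecture Omniperf detects. The run helper is not part of Omniperf; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omniperf): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: collect hardware counters for the application.
# Results are written under ./workloads/saxpy_hip/ (assumed workload name).
run omniperf profile -n saxpy_hip -- ./saxpy

# Step 2: start the analysis dashboard on the collected data.
# "mi200" is an assumed architecture subfolder; check what step 1 created.
run omniperf analyze -p workloads/saxpy_hip/mi200/ --gui
```

Run it once without RUN=1 to check the commands, then with RUN=1 on a node where the Omniperf module is loaded.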

At this stage, open a terminal on your local computer and run the following SSH port forwarding command:
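A minimal sketch of such a tunnel, assuming the dashboard listens on port 8050 (the port in the URL used on this page). The user name and login host are placeholders, not real values, and must be replaced with your own. The snippet only assembles and prints the command so it can be checked before use.

```shell
# NODE_IP is the address reported by "omniperf analyze".
# USER_NAME and LOGIN_HOST are placeholders for your account and login node.
NODE_IP="10.253.133.237"
PORT=8050
USER_NAME="username"
LOGIN_HOST="login.example.org"

# -N: do not run a remote command, only forward ports.
# -L: forward local PORT to NODE_IP:PORT through the login host.
TUNNEL_CMD="ssh -N -L ${PORT}:${NODE_IP}:${PORT} ${USER_NAME}@${LOGIN_HOST}"
echo "$TUNNEL_CMD"
```

Run the printed command in the local terminal and leave it open while using the dashboard.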

Note that the IP address 10.253.133.237 is taken from the output of the omniperf analyze command above.

You should then be able to navigate to http://0.0.0.0:8050/ in your preferred browser and access the Omniperf dashboard.

The page is interactive and contains several panels, including Empirical Roofline Analysis.

 


OpenMP SAXPY


Once Omniperf has generated profile data in the workloads folder, it can be analysed as follows:
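For the OpenMP offloading build, the profiling and analysis steps can be sketched in the same way. The executable name ./saxpy_openmp, the workload name saxpy_openmp and the mi200 subfolder are assumptions for illustration. The run helper is not part of Omniperf; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omniperf): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: profile the OpenMP offloading executable (assumed name).
run omniperf profile -n saxpy_openmp -- ./saxpy_openmp

# Step 2: analyse the collected data in the dashboard.
# "mi200" is an assumed architecture subfolder; check what step 1 created.
run omniperf analyze -p workloads/saxpy_openmp/mi200/ --gui
```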

At this stage, open a terminal on your local computer and use SSH port forwarding to forward port 8050 of the node running the dashboard.

Note that the node address used for forwarding, 10.253.133.237 in this example, is taken from the output of the omniperf analyze command above.

You should then be able to navigate to http://0.0.0.0:8050/ in your preferred browser and access the Omniperf dashboard.

The page is interactive and contains several panels, including Empirical Roofline Analysis.

 


OpenACC SAXPY



Once Omniperf has generated profile data in the workloads folder, it can be analysed as follows:
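For the OpenACC build, the profiling and analysis steps can be sketched in the same way. The executable name ./saxpy_openacc, the workload name saxpy_openacc and the mi200 subfolder are assumptions for illustration. The run helper is not part of Omniperf; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omniperf): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: profile the OpenACC executable (assumed name).
run omniperf profile -n saxpy_openacc -- ./saxpy_openacc

# Step 2: analyse the collected data in the dashboard.
# "mi200" is an assumed architecture subfolder; check what step 1 created.
run omniperf analyze -p workloads/saxpy_openacc/mi200/ --gui
```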

At this stage, open a terminal on your local computer and use SSH port forwarding to forward port 8050 of the node running the dashboard.

Note that the node address used for forwarding, 10.253.133.237 in this example, is taken from the output of the omniperf analyze command above.

You should then be able to navigate to http://0.0.0.0:8050/ in your preferred browser and access the Omniperf dashboard.

The page is interactive and contains several panels, including Empirical Roofline Analysis.

 


Omnitrace

Omnitrace is an AMD research initiative aimed at gathering runtime performance data for software applications. It is compatible with programs written in C, C++, Fortran, and Python, as well as with computational frameworks such as OpenCL and HIP. Make sure the modules required by Omnitrace are loaded before using it.


HIP SAXPY

Profiling is done in two steps. First, omnitrace instruments the application for profiling. Second, the generated *.inst file is run.
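The two steps can be sketched as below, assuming the HIP executable is called ./saxpy and an Omnitrace release in which the instrumenter is the omnitrace executable. Newer releases provide omnitrace-instrument and omnitrace-run instead, so check which commands the loaded module offers. The run helper is not part of Omnitrace; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omnitrace): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: rewrite the binary with instrumentation; output is saxpy.inst.
run omnitrace -o saxpy.inst -- ./saxpy

# Step 2: run the instrumented binary to collect the profile.
run ./saxpy.inst
```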

This creates an omnitrace-saxpy.inst-output folder containing a date-stamped subfolder in which the profile data are stored. At this stage, you can download the *.proto file to a local computer and open it with ui.perfetto.dev. The Perfetto view shows the timing and duration of code execution on the host, as well as the timing of kernel executions on the device. You should also be able to examine all host-to-device and device-to-host data transfers.
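Because each run adds a new date-stamped subfolder, it can help to locate the most recent trace before downloading it. The following is a small sketch using standard tools only; latest_proto is a hypothetical helper name, and the output folder name is the one from the HIP example.

```shell
# Assumed output folder, as created by the HIP example.
OUT_DIR="omnitrace-saxpy.inst-output"

# Hypothetical helper: print the most recently modified *.proto file
# found in any subfolder of the given directory (prints nothing if none).
latest_proto() {
    find "$1" -type f -name '*.proto' -exec ls -t {} + 2>/dev/null | head -n 1
}

latest_proto "$OUT_DIR"
```

The printed path can then be copied to the local computer, for example with scp, and opened in ui.perfetto.dev.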

 


OpenMP SAXPY

Profiling is done in two steps. First, omnitrace instruments the application for profiling. Second, the generated *.inst file is run.
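A sketch of the two steps for the OpenMP build, assuming the executable is called ./saxpy_openmp and an Omnitrace release in which the instrumenter is the omnitrace executable (newer releases provide omnitrace-instrument and omnitrace-run instead). The run helper is not part of Omnitrace; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omnitrace): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: instrument the OpenMP offloading binary (assumed name).
run omnitrace -o saxpy_openmp.inst -- ./saxpy_openmp

# Step 2: run the instrumented binary to collect the profile.
run ./saxpy_openmp.inst
```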

This creates an omnitrace-saxpy_openmp.inst-output folder containing a date-stamped subfolder in which the profile data are stored. At this stage, you can download the *.proto file to a local computer and open it with ui.perfetto.dev. The Perfetto view shows the timing and duration of code execution on the host, as well as the timing of kernel executions on the device. You should also be able to examine all host-to-device and device-to-host data transfers.

 


OpenACC SAXPY

Profiling is done in two steps. First, omnitrace instruments the application for profiling. Second, the generated *.inst file is run.
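A sketch of the two steps for the OpenACC build, assuming the executable is called ./saxpy_openacc and an Omnitrace release in which the instrumenter is the omnitrace executable (newer releases provide omnitrace-instrument and omnitrace-run instead). The run helper is not part of Omnitrace; it prints each command and executes it only when RUN=1 is set.

```shell
# Hypothetical helper (not part of Omnitrace): print a command,
# and execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 1: instrument the OpenACC binary (assumed name).
run omnitrace -o saxpy_openacc.inst -- ./saxpy_openacc

# Step 2: run the instrumented binary to collect the profile.
run ./saxpy_openacc.inst
```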

This creates an omnitrace-saxpy_openacc.inst-output folder containing a date-stamped subfolder in which the profile data are stored. At this stage, you can download the *.proto file to a local computer and open it with ui.perfetto.dev. The Perfetto view shows the timing and duration of code execution on the host, as well as the timing of kernel executions on the device. You should also be able to examine all host-to-device and device-to-host data transfers.

 


...