
OpenACC Profiling

In this article, we look at how to profile OpenACC code written in C/C++ and Fortran, and how to analyze the important traces and metrics in order to optimize the application.

Profiling is essential for finding where the code spends most of its time. A profile gives detailed information about the functions, their time consumption, the memory transfers, and the time spent in API calls. A few tools exist for profiling OpenACC code.

Among these, we will go through how to use the PGI compiler for profiling. The PGI profiling tools are already part of the NVIDIA HPC SDK.

The PGI compiler for OpenACC reports its parallelization strategy and data movement information at compile time. This applies to both GPUs and CPUs.

To see this compile-time information for the whole code, use:

pgcc -fast -Minfo=all -ta=tesla -acc Vector_Addition_OpenACC.c
pgfortran -fast -Minfo=all -ta=tesla -acc Vector_Addition_OpenACC.f90

To see only the information about the accelerator kernels, use:

pgcc -fast -Minfo=accel -ta=tesla -acc Vector_Addition_OpenACC.c
pgfortran -fast -Minfo=accel -ta=tesla -acc Vector_Addition_OpenACC.f90

Command line profiling:

The following steps walk through command-line profiling:

  • The first step is to compile the code:

    pgcc -fast -Minfo=all -ta=tesla -acc Vector_Addition_OpenACC.c

  • Then, if you do not know what to look for in the profile, query the list of options that pgprof provides:

    pgprof --help

  • For example, the following options print specific kinds of profiling information:

  • GPU kernel execution profile

      pgprof --print-gpu-summary ./a.out
      pgprof --print-gpu-trace ./a.out
    
  • CUDA API execution profile

      pgprof --print-api-summary ./a.out
      pgprof --print-api-trace ./a.out
    
  • OpenACC execution profile

      pgprof --print-openacc-trace ./a.out
      pgprof --print-openacc-summary ./a.out
    
  • CPU execution profile

      pgprof --cpu-profiling-mode flat ./a.out
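
The commands above can be collected into a small helper, sketched below. It only assembles the pgprof invocations listed in this section and prints them, so nothing is executed until the output is piped to a shell. The function name pgprof_commands and the default executable ./a.out are assumptions made for illustration; they are not part of pgprof.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgprof): print the pgprof commands from
# this section for a given executable, one command per line.
pgprof_commands() {
    exe="${1:-./a.out}"   # executable to profile; defaults to ./a.out
    for opt in \
        "--print-gpu-summary" \
        "--print-gpu-trace" \
        "--print-api-summary" \
        "--print-api-trace" \
        "--print-openacc-summary" \
        "--print-openacc-trace" \
        "--cpu-profiling-mode flat"
    do
        printf 'pgprof %s %s\n' "$opt" "$exe"
    done
}
```

Calling pgprof_commands ./a.out lists the commands for review. Piping the result to a shell (pgprof_commands ./a.out | sh) would run them one after another, assuming pgprof is on the PATH.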
    

Visual Profiling:

Sometimes we would also like to inspect the profile visually, especially the time the application spends on communication and on computation, because these are usually the quantities we should look at and try to reduce. The steps below show how to visualize the profiled data using pgprof.

  • We need to create an output file that pgprof can open:

    pgprof -o profiled-output.pgprof --cpu-profiling-mode flat ./a.out

  • Then, to open the file, we need to start the pgprof GUI.
  • Once pgprof is open, we can open the profiled-output.pgprof file.
  • Figure 1 shows an example of the pgprof GUI.

Figure 1: Example of pgprof GUI profiling

The PGI compiler runtime supports a few important environment variables. They are read when the program runs, so they should be set in the shell before launching the executable:

  • PGI_ACC_DEBUG: runtime debugging
  • ACC_NOTIFY: writes out a line for each kernel launch and data movement; options: 1 - kernel launches; 2 - data transfers; 4 - synchronous operations; 8 - region entry/exit; 16 - data allocation/free
  • PGI_ACC_TIME: lightweight profiler that prints a summary of the program
  • PGI_ACC_SYNCHRONOUS: disables asynchronous operations, so that kernel launches and data movement run synchronously
  • Example usage:

      for csh:  setenv ACC_NOTIFY 1
      for bash: export ACC_NOTIFY=1
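
The ACC_NOTIFY option values are powers of two because the variable is treated as a bit mask, so several options can be requested at once by combining their values. The sketch below shows one way to build such a value. The helper name acc_notify_mask is hypothetical, and ./a.out is assumed to be the OpenACC executable compiled earlier.

```shell
#!/bin/sh
# Hypothetical helper: OR together the ACC_NOTIFY option values given as
# arguments and print the combined mask.
acc_notify_mask() {
    mask=0
    for value in "$@"; do
        mask=$(( mask | value ))
    done
    echo "$mask"
}

# 1 (kernel launches) combined with 2 (data transfers) gives 3
export ACC_NOTIFY="$(acc_notify_mask 1 2)"
echo "ACC_NOTIFY=$ACC_NOTIFY"   # prints ACC_NOTIFY=3

# ./a.out   # assumed executable; uncomment to run with notifications enabled
```

Writing export ACC_NOTIFY=3 directly has the same effect; the helper only makes the individual options visible in a script.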