Profiling OpenMP Offloading with NVHPC and ROCm

Script
  • "Profiling is a crucial step to ensure efficient execution of OpenMP offloaded code on modern architectures. It helps identify bottlenecks and inefficiencies, allowing developers to optimize their applications. Profiling tools such as Nsight Systems and rocprof provide valuable insights into resource utilization, whether on NVIDIA's or AMD's platforms."
  • "NVIDIA's HPC SDK provides two main profiling tools for OpenMP offloading: Nsight Systems and Nsight Compute. Nsight Systems enables system-wide performance analysis, while Nsight Compute offers in-depth kernel profiling to evaluate GPU performance. Both tools support GUI-based result visualization for easy analysis."
  • "ROCm provides a suite of profiling tools for OpenMP offloading. rocprof is used for collecting traces and kernel metrics, while rocminfo offers system configuration details. AMD uProf complements these by identifying performance bottlenecks, making it a versatile tool for optimization."

Profiling is an essential task in optimizing computer code. Writing parallel code is manageable, but achieving efficiency on a given parallel architecture is challenging. Profiling helps identify where the code spends the most time, whether it is compute-bound, memory-bound, or suffering from cache misses, memory leaks, improper vectorization, or register spilling. This document explores profiling tools for OpenMP Offloading using NVHPC and ROCm.
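The commands in the following sections compile a file called example.c that is not shown here. As a minimal sketch of what such a file might contain, the kernel below offloads a SAXPY loop with a single target construct. The function names saxpy_offload and checksum are illustrative assumptions, not taken from the original material, and a small main that allocates the arrays and calls them is still needed to build the example binary.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
   Built with an offloading flag (for example -mp=gpu with nvc), the loop
   runs on the default device. Without OpenMP support the pragma is ignored
   and the loop runs serially on the host. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    /* x is only read on the device; y is copied in and back out. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Host-side sum of y, a cheap way to confirm the kernel produced a result. */
double checksum(size_t n, const float *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double)y[i];
    }
    return sum;
}
```

A loop like this does very little arithmetic per byte moved, so in a profile the map clauses (host-to-device and device-to-host copies) would be expected to account for a large share of the runtime compared with the kernel itself.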


NVHPC Profiling Tools

The NVIDIA HPC SDK (NVHPC) supports profiling OpenMP Offloading applications using tools such as Nsight Systems and Nsight Compute.

Nsight Systems

Nsight Systems provides a system-wide performance overview, including CPU and GPU profiling.

Example: Nsight Systems

C:
# Compilation with OpenMP Offloading
$ nvc -mp=gpu -o example example.c

# Profiling with Nsight Systems (-o names the report; the default name is report1.nsys-rep)
$ nsys profile -t openmp,nvtx,cuda -o profile1 ./example

# Open the profiling results
$ nsys-ui profile1.nsys-rep

Fortran:
# Compilation with OpenMP Offloading
$ nvfortran -mp=gpu -o example example.f90

# Profiling with Nsight Systems (-o names the report; the default name is report1.nsys-rep)
$ nsys profile -t openmp,nvtx,cuda -o profile1 ./example

# Open the profiling results
$ nsys-ui profile1.nsys-rep
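The nvtx trace option only shows something if the application emits NVTX ranges. As a hedged sketch, the code below wraps an offloaded loop in a named range using nvtxRangePushA and nvtxRangePop from the NVTX header. The USE_NVTX macro, the RANGE_PUSH/RANGE_POP wrappers, and the function name scale_offload are assumptions made for this illustration. Without -DUSE_NVTX the wrappers compile to nothing, so the same source builds on systems without the NVTX header.

```c
#include <stddef.h>

#ifdef USE_NVTX
/* NVTX header as shipped with the CUDA toolkit / HPC SDK; the include
   path may need to be passed to the compiler explicitly. */
#include <nvtx3/nvToolsExt.h>
#define RANGE_PUSH(name) nvtxRangePushA(name)
#define RANGE_POP()      nvtxRangePop()
#else
/* Fallback: the annotations disappear when NVTX is not enabled. */
#define RANGE_PUSH(name) ((void)0)
#define RANGE_POP()      ((void)0)
#endif

/* Hypothetical kernel: multiply every element of data by factor. */
void scale_offload(size_t n, double factor, double *data)
{
    RANGE_PUSH("scale_offload");   /* opens a named range on the timeline */

    #pragma omp target teams distribute parallel for map(tofrom: data[0:n])
    for (size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }

    RANGE_POP();                   /* closes the range opened above */
}
```

With the annotations enabled, the range should appear as a labelled bar in the Nsight Systems timeline, which makes it easier to match the data transfers and kernel launches underneath it to a specific region of the source.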

Nsight Compute

Nsight Compute is a kernel profiler for analyzing GPU kernel performance.

Example: Nsight Compute

C:
# Compilation with OpenMP Offloading
$ nvc -mp=gpu -o example example.c

# Profiling with Nsight Compute (-o writes report.ncu-rep; without it, results are only printed to the terminal)
$ ncu --set full -o report ./example

# Open the profiling results
$ ncu-ui report.ncu-rep

Fortran:
# Compilation with OpenMP Offloading
$ nvfortran -mp=gpu -o example example.f90

# Profiling with Nsight Compute (-o writes report.ncu-rep; without it, results are only printed to the terminal)
$ ncu --set full -o report ./example

# Open the profiling results
$ ncu-ui report.ncu-rep

ROCm Profiling Tools

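ROCm offers a suite of tools that support profiling OpenMP Offloading applications: rocprof collects traces and performance metrics, while rocminfo reports the devices and configuration that the ROCm runtime sees.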

rocprof

The rocprof tool is used for collecting traces, performance metrics, and analyzing OpenMP Offloading performance.

Example: rocprof

C:
# Compilation with OpenMP Offloading
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.c

# Profiling with rocprof
$ rocprof --hsa-trace --hip-trace ./example

Fortran:
# Compilation with OpenMP Offloading
$ flang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.f90

# Profiling with rocprof
$ rocprof --hsa-trace --hip-trace ./example

rocminfo

rocminfo provides system-level details about the ROCm-enabled devices and configuration.

Example: rocminfo
# View available GPUs and system configurations
$ rocminfo
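rocminfo reports what the ROCm stack sees, but not whether the OpenMP runtime inside a given binary actually uses a device. A minimal sketch of a complementary check from within the application is shown below, using the standard OpenMP routines omp_get_num_devices and omp_is_initial_device. The function names offload_device_count and target_runs_on_host are assumptions for this illustration. When the code is compiled without OpenMP, the helpers report zero devices and host execution.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of offload devices visible to the OpenMP runtime
   (0 when compiled without OpenMP support). */
int offload_device_count(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Returns 1 if a target region executes on the host (fallback),
   0 if it executes on an offload device. */
int target_runs_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}
```

If rocminfo lists a GPU but target_runs_on_host returns 1, the target regions are silently falling back to the host. In that case a GPU trace would be expected to come out empty, and the compile flags or runtime environment are the first things to check.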

AMD uProf

AMD uProf supports profiling OpenMP applications, including collecting traces and identifying bottlenecks.

Example: AMD uProf

C:
# Compilation with OpenMP Offloading
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.c

# Profiling with AMD uProf (options such as -d must come before the program; anything after it is passed to ./example)
$ AMDuProfCLI collect --trace openmp --config tbp --output-dir solution -d 1 ./example

Fortran:
# Compilation with OpenMP Offloading
$ flang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.f90

# Profiling with AMD uProf (options such as -d must come before the program; anything after it is passed to ./example)
$ AMDuProfCLI collect --trace openmp --config tbp --output-dir solution -d 1 ./example

Summary

Both NVHPC and ROCm offer robust profiling tools tailored for OpenMP Offloading. These tools provide insights into memory utilization, kernel execution times, vectorization, and overall performance, helping developers identify bottlenecks and optimize their code effectively.

Tool           | Platform | Key Features
Nsight Systems | NVHPC    | System-wide profiling, CPU+GPU performance
Nsight Compute | NVHPC    | GPU kernel performance analysis
rocprof        | ROCm     | Trace collection, metrics analysis
AMD uProf      | ROCm     | Trace analysis, bottleneck identification
rocminfo       | ROCm     | System configuration and GPU details