Profiling OpenMP Offloading with NVHPC and ROCm
Script
- "Profiling is a crucial step to ensure efficient execution of OpenMP offloaded code on modern architectures. It helps identify bottlenecks and inefficiencies, allowing developers to optimize their applications. Profiling tools such as Nsight Systems and rocprof provide valuable insights into resource utilization, whether on NVIDIA's or AMD's platforms."
- "NVIDIA's HPC SDK provides two main profiling tools for OpenMP offloading: Nsight Systems and Nsight Compute. Nsight Systems enables system-wide performance analysis, while Nsight Compute offers in-depth kernel profiling to evaluate GPU performance. Both tools support GUI-based result visualization for easy analysis."
- "ROCm provides a suite of profiling tools for OpenMP offloading. rocprof is used for collecting traces and kernel metrics, while rocminfo offers system configuration details. AMD uProf complements these by identifying performance bottlenecks, making it a versatile tool for optimization."
Profiling is an essential task in optimizing computer code. Writing parallel code is manageable, but achieving efficiency on a given parallel architecture is challenging. Profiling helps identify where the code spends the most time, whether it is compute-bound, memory-bound, or suffering from cache misses, memory leaks, improper vectorization, or register spilling. This document explores profiling tools for OpenMP Offloading using NVHPC and ROCm.
NVHPC Profiling Tools
The NVIDIA HPC SDK (NVHPC) supports profiling OpenMP Offloading applications using tools such as Nsight Systems and Nsight Compute.
Nsight Systems
Nsight Systems provides a system-wide performance overview, including CPU and GPU profiling.
Example: Nsight Systems
# C version: compilation with OpenMP Offloading
$ nvc -mp=gpu -o example example.c
# Profiling with Nsight Systems (-o names the report so the file opened below exists)
$ nsys profile -t openmp,nvtx,cuda -o profile1 ./example
# Open the profiling results
$ nsys-ui profile1.nsys-rep
# Fortran version: compilation with OpenMP Offloading
$ nvfortran -mp=gpu -o example example.f90
# Profiling with Nsight Systems (-o names the report so the file opened below exists)
$ nsys profile -t openmp,nvtx,cuda -o profile1 ./example
# Open the profiling results
$ nsys-ui profile1.nsys-rep
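The commands above assume an `example.c` that contains at least one offloaded region. A minimal sketch of such a region follows; the function name `saxpy_offload` and the array contents are illustrative assumptions, not part of the original example. If the compiler is invoked without OpenMP support, the pragma is ignored and the loop runs on the host, so the result check still passes.

```c
#include <assert.h>
#include <stdlib.h>

/* Computes y = a*x + y in a single target region and returns the number
   of wrong elements (0 on success, -1 if allocation fails).
   saxpy_offload is a hypothetical name used only for this sketch. */
int saxpy_offload(int n)
{
    assert(n > 0);

    float *x = malloc((size_t)n * sizeof *x);
    float *y = malloc((size_t)n * sizeof *y);
    if (!x || !y) { free(x); free(y); return -1; }

    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const float a = 3.0f;

    /* x is copied to the device; y is copied to the device and back. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];

    /* 3*1 + 2 is exactly representable, so an exact comparison is safe. */
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (y[i] != 5.0f) ++errors;

    free(x);
    free(y);
    return errors;
}
```

To build it with the compile command above, add a one-line `main`, for example `int main(void) { return saxpy_offload(1 << 20) != 0; }`. In the Nsight Systems timeline, the two `map` clauses would be expected to appear as host-to-device and device-to-host copies around the kernel.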
Nsight Compute
Nsight Compute is a kernel profiler for analyzing GPU kernel performance.
Example: Nsight Compute
# C version: compilation with OpenMP Offloading
$ nvc -mp=gpu -o example example.c
# Profiling with Nsight Compute (-o writes the report file opened below)
$ ncu --set full -o report ./example
# Open the profiling results
$ ncu-ui report.ncu-rep
# Fortran version: compilation with OpenMP Offloading
$ nvfortran -mp=gpu -o example example.f90
# Profiling with Nsight Compute (-o writes the report file opened below)
$ ncu --set full -o report ./example
# Open the profiling results
$ ncu-ui report.ncu-rep
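Nsight Compute reports metrics per GPU kernel, so it helps to know which source regions become kernels. The sketch below is an illustrative assumption rather than the original `example.c`: it places two target regions inside one `target data` region, and with `-mp=gpu` each target region would typically be expected to show up as a separate kernel in the report. The name `scaled_dot_offload` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Scales b on the device, then computes dot(a, b) with a reduction.
   Returns the dot product, or -1.0 if allocation fails.
   scaled_dot_offload is a hypothetical name used only for this sketch. */
double scaled_dot_offload(int n)
{
    assert(n > 0);

    double *a = malloc((size_t)n * sizeof *a);
    double *b = malloc((size_t)n * sizeof *b);
    if (!a || !b) { free(a); free(b); return -1.0; }

    for (int i = 0; i < n; ++i) { a[i] = 0.5; b[i] = 2.0; }

    double sum = 0.0;

    /* Keep a and b resident on the device across both target regions. */
    #pragma omp target data map(to: a[0:n], b[0:n])
    {
        /* Region 1: element-wise update, likely limited by memory traffic. */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            b[i] = 2.0 * b[i];

        /* Region 2: dot product accumulated with a reduction. */
        #pragma omp target teams distribute parallel for reduction(+:sum) map(tofrom: sum)
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];
    }

    free(a);
    free(b);
    return sum; /* each element contributes 0.5 * 4.0, so the total is 2.0 * n */
}
```

Comparing the two kernels in the report is one way to see how an element-wise update and a reduction differ in memory throughput and occupancy.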
ROCm Profiling Tools
ROCm offers a suite of tools that support profiling OpenMP Offloading applications: rocprof collects traces and metrics, and rocminfo reports the devices and configuration that ROCm detects.
rocprof
The rocprof tool collects traces and performance metrics for analyzing OpenMP Offloading performance.
Example: rocprof
# C version: compilation with OpenMP Offloading
# (some toolchains also need the GPU architecture, e.g. --offload-arch=gfx90a)
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.c
# Profiling with rocprof
$ rocprof --hsa-trace --hip-trace ./example
# Fortran version: compilation with OpenMP Offloading
$ flang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.f90
# Profiling with rocprof
$ rocprof --hsa-trace --hip-trace ./example
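If the rocprof output contains no kernel entries, one possible cause is that the target region ran on the host instead of the GPU. A minimal check, assuming the standard OpenMP runtime routine `omp_is_initial_device()`; the function name `target_runs_on_device` is hypothetical:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if a target region executes on an offload device and 0 if it
   runs on the host (including builds without OpenMP support).
   target_runs_on_device is a hypothetical name used only for this sketch. */
int target_runs_on_device(void)
{
    int on_host = 1; /* assume host execution unless the region says otherwise */

#ifdef _OPENMP
    /* The region records where it executed and copies the flag back. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif

    return on_host ? 0 : 1;
}
```

Setting the standard environment variable `OMP_TARGET_OFFLOAD=MANDATORY` before running the program should make it terminate instead of falling back to the host, provided the runtime implements OpenMP 5.0 or later.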
rocminfo
rocminfo provides system-level details about the ROCm-enabled devices and configuration.
Example: rocminfo
# View available GPUs and system configurations
$ rocminfo
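rocminfo lists what the ROCm stack detects, which is not necessarily what the OpenMP runtime of a given program will use. As a complementary check, the following sketch queries the OpenMP runtime through the standard routines `omp_get_num_devices()` and `omp_get_default_device()`; the function name `count_offload_devices` is hypothetical.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints and returns the number of offload devices visible to the OpenMP
   runtime; returns 0 when built without OpenMP support.
   count_offload_devices is a hypothetical name used only for this sketch. */
int count_offload_devices(void)
{
#ifdef _OPENMP
    int n = omp_get_num_devices();
    printf("OpenMP runtime sees %d offload device(s); default device is %d\n",
           n, omp_get_default_device());
    return n;
#else
    printf("Built without OpenMP support; no offload devices are visible\n");
    return 0;
#endif
}
```

A count of zero on a machine where rocminfo does list a GPU suggests checking the compile flags and the runtime libraries before profiling.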
AMD uProf
AMD uProf supports profiling OpenMP applications, including collecting traces and identifying bottlenecks.
Example: AMD uProf
# C version: compilation with OpenMP Offloading
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.c
# Profiling with AMD uProf (-d precedes the executable; anything after it is passed to the program)
$ AMDuProfCLI collect --trace openmp --config tbp -d 1 --output-dir solution ./example
# Fortran version: compilation with OpenMP Offloading
$ flang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -o example example.f90
# Profiling with AMD uProf (-d precedes the executable; anything after it is passed to the program)
$ AMDuProfCLI collect --trace openmp --config tbp -d 1 --output-dir solution ./example
Summary
Both NVHPC and ROCm offer robust profiling tools tailored for OpenMP Offloading. These tools provide insights into memory utilization, kernel execution times, vectorization, and overall performance, helping developers identify bottlenecks and optimize their code effectively.
| Tool | Platform | Key Features |
|---|---|---|
| Nsight Systems | NVHPC | System-wide profiling, CPU+GPU performance |
| Nsight Compute | NVHPC | GPU kernel performance analysis |
| rocprof | ROCm | Trace collection, metrics analysis |
| AMD uProf | ROCm | Trace analysis, bottleneck identification |
| rocminfo | ROCm | System configuration and GPU details |