Introduction to OpenACC

In this section, we will study the basics of the OpenACC programming model and the compiler flags used to enable it in different programming languages.

OpenACC is a user-driven, directive-based, performance-portable parallel programming model. It is designed for scientists and engineers who want to port their code to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort compared to using low-level models. The OpenACC specification supports C, C++, and Fortran programming languages and multiple hardware architectures, including x86 and POWER CPUs, NVIDIA GPUs, AMD GPUs, and Xeon Phi (KNL). The table below provides the compiler commands for different programming languages.

The commands below come from NVIDIA's HPC SDK, with the Cray Fortran compiler included for comparison:

Compiler or Tool    Language or Function                      Command
NVC                 ISO/ANSI C11 and K&R C                    nvc
NVFORTRAN           ISO/ANSI Fortran 2003                     nvfortran
NVC++               ISO/ANSI C++14 with GNU compatibility     nvc++
Cray Fortran        Fortran with OpenACC support              ftn -h acc
NVIDIA Debugger     Source-code debugger                      cuda-gdb
Nsight Systems      System-wide performance analysis          nsys
Nsight Compute      CUDA kernel profiling and analysis        ncu

NVIDIA HPC SDK and Cray Compiler for OpenACC
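
To make this concrete, here is a minimal sketch of an OpenACC program in C that the compilers above can build. The file name saxpy.c, the array size, and the values are illustrative, not taken from the toolchain documentation.

// saxpy.c - minimal OpenACC example (illustrative)
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1 << 20;
    float a = 2.0f;
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));

    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Offload the loop; copyin/copy clauses manage data movement.
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    free(x);
    free(y);
    return 0;
}

Built with, for example, nvc -acc -Minfo=accel saxpy.c -o saxpy, the same loop runs in parallel on the accelerator without any hand-written kernel code.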

Cray Compiler Notes:

The Cray Fortran compiler (ftn) enables OpenACC with the -h acc flag. On AMD GPUs, the Cray compiler supports OpenACC only from Fortran, not C/C++; other compilers are also known to have issues with C/C++ on AMD GPUs.

NVC Compiler Notes:

The NVIDIA HPC compilers (nvc, nvfortran, nvc++) support OpenACC directly; the -acc flag enables it for each of these compilers.

For profiling, transition to nsys and ncu: nvprof is deprecated and is not supported on devices with compute capability 8.0 and higher. A typical build-and-profile workflow is sketched below.
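
The commands below are a hedged sketch of that workflow; the executable name ./saxpy refers to the example above and is illustrative.

# Compile with OpenACC enabled and print accelerator feedback
nvc -acc -Minfo=accel saxpy.c -o saxpy

# System-wide timeline (replaces nvprof's timeline mode)
nsys profile ./saxpy

# Per-kernel analysis (replaces nvprof's kernel metrics)
ncu ./saxpy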

The listings below show the basic OpenACC syntax for C/C++ and Fortran. To call the OpenACC API (runtime library routines) from an application, add #include "openacc.h" in C/C++ or use openacc in Fortran.

// C/C++
#include "openacc.h"
#pragma acc <directive> [clause [[,] clause] ...] new-line
<code>

! Fortran
use openacc
!$acc <directive> [clause [[,] clause] ...]
<code>

The following points explain the basic syntax entries in the above listing.

#pragma acc (C/C++) and !$acc (Fortran) are the sentinels that mark a line as an OpenACC directive.
<directive> names the action to perform, such as parallel, kernels, loop, or data.
clause entries are optional modifiers that refine the directive, for example by specifying data movement or loop scheduling; multiple clauses may be separated by commas.
<code> is the structured block or statement the directive applies to, usually the loop nest or region immediately following it.
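
To make the header and runtime routines concrete, here is a short sketch in C that queries the devices visible to the OpenACC runtime; the choice of the NVIDIA device type is just an example.

// Illustrative use of the OpenACC runtime API (openacc.h)
#include <stdio.h>
#include "openacc.h"

int main(void) {
    // Query how many NVIDIA devices the runtime can see.
    int ndev = acc_get_num_devices(acc_device_nvidia);
    printf("NVIDIA devices visible to OpenACC: %d\n", ndev);

    if (ndev > 0) {
        // Select the first device explicitly (optional; the runtime
        // otherwise picks a default device).
        acc_set_device_num(0, acc_device_nvidia);
    }
    return 0;
}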

Other compilers also exist for the OpenACC model. The table below lists the flags that enable OpenACC in several compilers, along with additional flags for targeting specific hardware and controlling compiler feedback.

Compiler        Compiler Flags                                 Additional Flags
NVIDIA (NVC)    -acc                                           -gpu=target_architecture -Minfo=accel
GCC             -fopenacc                                      -foffload=offload_target
OpenUH          Compile: -fopenacc  Link: -lopenacc            -Wb,-accarch:target_architecture
Cray            C/C++: -h pragma=acc  Fortran: -h acc,noomp    -h msgs

Various compilers and their compiler flags

Compiler Flag Notes:

These flags enable OpenACC compilation, while the additional flags select the offload target architecture and turn on diagnostic feedback, allowing efficient use of GPUs and other accelerators.
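
As a hedged illustration, the same source file might be compiled as follows; the file names are hypothetical, and the target values (cc80 for an NVIDIA compute-capability-8.0 GPU, nvptx-none for GCC's NVIDIA PTX offload target) are example choices.

# NVIDIA HPC SDK
nvc -acc -gpu=cc80 -Minfo=accel app.c -o app

# GCC, offloading to NVIDIA PTX
gcc -fopenacc -foffload=nvptx-none app.c -o app

# Cray C/C++ and Fortran
cc  -h pragma=acc app.c -o app
ftn -h acc,noomp app.f90 -o app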

Summary

This section introduces the basics of the OpenACC programming model, a directive-based parallel programming approach that enables performance portability across diverse HPC hardware with minimal effort. OpenACC is ideal for scientists and engineers aiming to port code to heterogeneous architectures, including CPUs (x86, POWER), GPUs (NVIDIA, AMD), and specialized processors like Xeon Phi.

OpenACC supports C, C++, and Fortran, with distinct compiler commands for each language provided by tools such as NVIDIA’s nvc, nvfortran, and nvc++ compilers, along with debugging (cuda-gdb) and profiling (nsys, ncu) tools. Using the -acc flag activates OpenACC compilation, enabling users to write parallelized code that is both performance-portable and straightforward to maintain.

The OpenACC model provides various directives, including:

parallel: starts parallel execution on the accelerator
kernels: marks a region the compiler may auto-parallelize into kernels
loop: controls how loop iterations are distributed across parallel resources
data: defines a region with explicit host/device data movement
update: synchronizes data between the host and device copies of a variable

Directives are paired with clauses that define data handling, work distribution, and control flow for fine-grained control over parallel execution. Multiple compilers, including NVIDIA’s NVC, GCC, OpenUH, and Cray, support OpenACC, each with specific flags for targeting different hardware.
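
As a brief sketch of how a data-handling directive pairs with a work-distribution directive, consider the following; the arrays and values are arbitrary examples.

// Structured data region paired with a parallel loop (illustrative)
#include <stdio.h>

int main(void) {
    float a[1000], b[1000];
    int n = 1000;

    for (int i = 0; i < n; i++) a[i] = (float)i;

    // The data directive controls movement; the loop directive controls work.
    #pragma acc data copyin(a[0:n]) copyout(b[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            b[i] = 2.0f * a[i];
    }

    printf("b[10] = %f\n", b[10]);  // expect 20.0
    return 0;
}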

By abstracting low-level details, OpenACC simplifies parallel programming and allows code to be easily ported across architectures, providing a practical balance between performance and ease of use.