Introduction to OpenACC

In this section, we will study the basics of the OpenACC programming model and the compiler flags used to enable it in different programming languages.

OpenACC is a user-driven, directive-based, performance-portable parallel programming model. It is designed for scientists and engineers who want to port their code to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort compared to using low-level models. The OpenACC specification supports C, C++, and Fortran programming languages and multiple hardware architectures, including x86 and POWER CPUs, NVIDIA GPUs, AMD GPUs, and Xeon Phi (KNL). The table below provides the compiler commands for different programming languages.

The commands below come from NVIDIA's HPC SDK, with the Cray Fortran compiler included for comparison:

Compiler or Tool    Language or Function                      Command
NVC                 ISO/ANSI C11 and K&R C                    nvc
NVFORTRAN           ISO/ANSI Fortran 2003                     nvfortran
NVC++               ISO/ANSI C++14 with GNU compatibility     nvc++
Cray Fortran        Fortran with OpenACC support              ftn -h acc
NVIDIA Debugger     Source-code debugger                      cuda-gdb
Nsight Systems      System-wide performance analysis          nsys
Nsight Compute      CUDA kernel profiling and analysis        ncu

NVIDIA HPC SDK and Cray Compiler for OpenACC
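
To make this concrete, here is a minimal sketch of an OpenACC program in C that the compilers above can build. The file name saxpy.c, the array size, and the values are illustrative, not taken from the toolchain documentation.

// saxpy.c - minimal OpenACC example (illustrative)
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 1 << 20;
    float a = 2.0f;
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));

    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Offload the loop; copyin/copy clauses manage data movement.
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    free(x);
    free(y);
    return 0;
}

Built with, for example, nvc -acc -Minfo=accel saxpy.c -o saxpy, the same loop runs in parallel on the accelerator without any hand-written kernel code.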

Cray Compiler Notes:

The Cray Fortran compiler (ftn) enables OpenACC with the -h acc flag. On AMD GPUs, the Cray compiler supports OpenACC only from Fortran, not C/C++; other compilers are also known to have issues with C/C++ on AMD GPUs.

NVC Compiler Notes:

The NVIDIA HPC compilers (nvc, nvfortran, nvc++) support OpenACC directly; the -acc flag enables it for each of these compilers.

For profiling, transition to nsys and ncu: nvprof is deprecated and is not supported on devices with compute capability 8.0 and higher. A typical build-and-profile workflow is sketched below.
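
The commands below are a hedged sketch of that workflow; the executable name ./saxpy refers to the example above and is illustrative.

# Compile with OpenACC enabled and print accelerator feedback
nvc -acc -Minfo=accel saxpy.c -o saxpy

# System-wide timeline (replaces nvprof's timeline mode)
nsys profile ./saxpy

# Per-kernel analysis (replaces nvprof's kernel metrics)
ncu ./saxpy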

The listings below show the basic OpenACC syntax for C/C++ and Fortran. To call the OpenACC API (runtime library routines) from an application, add #include "openacc.h" in C/C++ or use openacc in Fortran.

// C/C++
#include "openacc.h"
#pragma acc <directive> [clause [[,] clause] ...] new-line
<code>

! Fortran
use openacc
!$acc <directive> [clause [[,] clause] ...]
<code>

The following points explain the basic syntax entries in the above listing.

#pragma acc (C/C++) and !$acc (Fortran) are the sentinels that mark a line as an OpenACC directive.
<directive> names the action to perform, such as parallel, kernels, loop, or data.
clause entries are optional modifiers that refine the directive, for example by specifying data movement or loop scheduling; multiple clauses may be separated by commas.
<code> is the structured block or statement the directive applies to, usually the loop nest or region immediately following it.
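
To make the header and runtime routines concrete, here is a short sketch in C that queries the devices visible to the OpenACC runtime; the choice of the NVIDIA device type is just an example.

// Illustrative use of the OpenACC runtime API (openacc.h)
#include <stdio.h>
#include "openacc.h"

int main(void) {
    // Query how many NVIDIA devices the runtime can see.
    int ndev = acc_get_num_devices(acc_device_nvidia);
    printf("NVIDIA devices visible to OpenACC: %d\n", ndev);

    if (ndev > 0) {
        // Select the first device explicitly (optional; the runtime
        // otherwise picks a default device).
        acc_set_device_num(0, acc_device_nvidia);
    }
    return 0;
}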

Other compilers also exist for the OpenACC model. The table below lists the flags that enable OpenACC in several compilers, along with additional flags for targeting specific hardware and controlling compiler feedback.

Compiler        Compiler Flags                                 Additional Flags
NVIDIA (NVC)    -acc                                           -gpu=target_architecture -Minfo=accel
GCC             -fopenacc                                      -foffload=offload_target
OpenUH          Compile: -fopenacc  Link: -lopenacc            -Wb,-accarch:target_architecture
Cray            C/C++: -h pragma=acc  Fortran: -h acc,noomp    -h msgs

Various compilers and their compiler flags

Compiler Flag Notes:

These flags enable OpenACC compilation, while the additional flags select the offload target architecture and turn on diagnostic feedback, allowing efficient use of GPUs and other accelerators.
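
As a hedged illustration, the same source file might be compiled as follows; the file names are hypothetical, and the target values (cc80 for an NVIDIA compute-capability-8.0 GPU, nvptx-none for GCC's NVIDIA PTX offload target) are example choices.

# NVIDIA HPC SDK
nvc -acc -gpu=cc80 -Minfo=accel app.c -o app

# GCC, offloading to NVIDIA PTX
gcc -fopenacc -foffload=nvptx-none app.c -o app

# Cray C/C++ and Fortran
cc  -h pragma=acc app.c -o app
ftn -h acc,noomp app.f90 -o app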

Summary

This section introduces the basics of the OpenACC programming model, a directive-based parallel programming approach that enables performance portability across diverse HPC hardware with minimal effort. OpenACC is ideal for scientists and engineers aiming to port code to heterogeneous architectures, including CPUs (x86, POWER), GPUs (NVIDIA, AMD), and specialized processors like Xeon Phi.

OpenACC supports C, C++, and Fortran, with distinct compiler commands for each language provided by tools such as NVIDIA’s nvc, nvfortran, and nvc++ compilers, along with debugging (cuda-gdb) and profiling (nsys, ncu) tools. Using the -acc flag activates OpenACC compilation, enabling users to write parallelized code that is both performance-portable and straightforward to maintain.

The OpenACC model provides various directives, including:

parallel: starts parallel execution on the accelerator
kernels: marks a region the compiler may auto-parallelize into kernels
loop: controls how loop iterations are distributed across parallel resources
data: defines a region with explicit host/device data movement
update: synchronizes data between the host and device copies of a variable

Directives are paired with clauses that define data handling, work distribution, and control flow for fine-grained control over parallel execution. Multiple compilers, including NVIDIA’s NVC, GCC, OpenUH, and Cray, support OpenACC, each with specific flags for targeting different hardware.
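
As a brief sketch of how a data-handling directive pairs with a work-distribution directive, consider the following; the arrays and values are arbitrary examples.

// Structured data region paired with a parallel loop (illustrative)
#include <stdio.h>

int main(void) {
    float a[1000], b[1000];
    int n = 1000;

    for (int i = 0; i < n; i++) a[i] = (float)i;

    // The data directive controls movement; the loop directive controls work.
    #pragma acc data copyin(a[0:n]) copyout(b[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            b[i] = 2.0f * a[i];
    }

    printf("b[10] = %f\n", b[10]);  // expect 20.0
    return 0;
}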

By abstracting low-level details, OpenACC simplifies parallel programming and allows code to be easily ported across architectures, providing a practical balance between performance and ease of use.