Course Organization and GPU Access¶
To follow this course, learners are expected to have a basic understanding of C/C++ programming; familiarity with Fortran and OpenMP is helpful but optional. The course covers OpenACC, OpenMP Offloading, and HIP, focusing on GPU programming techniques for scientific and engineering applications.
- C/C++ (Required)
- C/C++ provides low-level control over memory and system resources, making it ideal for performance-critical applications. It is the foundational language for CUDA and HIP programming.
- Fortran (Optional)
- Fortran remains widely used in scientific computing, particularly in numerical simulations (e.g., climate modeling), due to its efficiency and legacy in scientific applications.
- OpenMP (Optional)
- OpenMP is an API for parallel programming on multicore CPUs. This course will introduce OpenMP Offloading for parallelizing code on CPUs and GPUs.
Below is an overview of the programming models and their target architectures:
| Model | Implementation | Supported Languages | Target Architectures |
| --- | --- | --- | --- |
| OpenACC | Directives | Fortran, C, C++ | CPUs, GPUs, OpenPOWER |
| OpenMP Offloading | Directives | Fortran, C, C++ | CPUs, GPUs, Xeon Phi |
| HIP | Language extension | C, C++ | AMD and NVIDIA GPUs |
| CUDA | Language extension | C, C++ (Fortran) | NVIDIA GPUs |
| OpenCL | Language extension | C, C++ | GPUs, CPUs, FPGAs |
| C++ AMP | Language extension | C++ | CPUs, GPUs |
| RAJA | C++ abstraction | C++ | CPUs, GPUs |
| TBB | C++ abstraction | C++ | CPUs |
| C++17 | Language feature | C++ | CPUs |
| Fortran 2008 | Language feature | Fortran | CPUs |
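To make the "Directives" entries in the table concrete, the sketch below (illustrative only, not course material; array names and sizes are made up) offloads the same loop first with an OpenACC directive and then with an OpenMP target directive. With the NVIDIA HPC SDK, such a file typically builds with `nvc++ -acc` or `nvc++ -mp=gpu`, depending on which directive set is enabled; a directive set that is not enabled is simply ignored and that loop runs on the CPU.

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
    const int n = 1 << 20;
    float *x = (float *)malloc(n * sizeof(float));
    float *y = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // OpenACC: a single directive asks the compiler to offload the loop
    // and move the listed arrays between host and device.
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    // OpenMP offloading: the same loop expressed with target directives.
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);  // 4.0 after the OpenACC loop, 6.0 after the OpenMP loop
    free(x);
    free(y);
    return 0;
}
```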
Course Structure¶
- Duration: 5 weeks
- Topics: The course is divided into four main parts:
- OpenMP Offloading: Covers OpenMP for parallel programming across CPUs and accelerators (GPUs), from basics to advanced topics.
- OpenACC Programming: Provides an in-depth guide to the OpenACC model for portable GPU programming.
- HIP Programming: Teaches HIP for GPU programming on both AMD and NVIDIA platforms, offering a portable approach to general-purpose GPU (GPGPU) computing.
- Profiling and Performance Optimization: Focuses on profiling tools and techniques to optimize GPU code for improved performance.
- Topics Breakdown:
- Each topic has 6 to 8 sections, and each section includes an article, a quiz, and a discussion.
- Each section also includes videos that highlight and summarize the importance of its key concept.
What You Will Learn¶
- GPU architecture (NVIDIA and AMD): covering the memory hierarchy, streaming multiprocessors (NVIDIA), compute units (AMD), and more.
- Parallel programming using OpenACC, OpenMP Offloading, and HIP, including thread organization, OpenACC directives, and OpenMP clauses (a minimal thread-organization sketch follows this list).
- Applying GPU programming to computational tasks in numerical linear algebra.
- Advanced programming techniques in OpenACC, OpenMP Offloading, and HIP.
- Code optimization: profiling methods and performance tuning for OpenACC, OpenMP Offloading, and HIP.
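As a preview of the thread-organization material, here is a minimal HIP sketch (illustrative only; the array size and the SAXPY-style kernel are made up for this example) in which each GPU thread computes one element, deriving its global index from the block and thread indices. It builds with `hipcc` on ROCm and, via the CUDA Toolkit, on NVIDIA platforms.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread handles one element: global index = block offset + thread index.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host arrays.
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device arrays and host-to-device copies.
    float *dx, *dy;
    hipMalloc((void **)&dx, bytes);
    hipMalloc((void **)&dy, bytes);
    hipMemcpy(dx, hx, bytes, hipMemcpyHostToDevice);
    hipMemcpy(dy, hy, bytes, hipMemcpyHostToDevice);

    int threads = 256;                            // threads per block
    int blocks = (n + threads - 1) / threads;     // enough blocks to cover all n elements
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);  // launch the kernel on the GPU

    // Copy the result back; this blocking copy also waits for the kernel to finish.
    hipMemcpy(hy, dy, bytes, hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);                 // expect 4.0

    hipFree(dx);
    hipFree(dy);
    delete[] hx;
    delete[] hy;
    return 0;
}
```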
Accessing GPUs¶
You can access GPUs through a personal computer, a computing cluster or supercomputer, or cloud platforms (e.g., Google Cloud, Amazon Web Services). Instructions for each method are provided below.
- Personal Computer:
  - If you have a desktop or laptop with a compatible GPU, install the necessary compilers as described in the next section.
- Cluster or Supercomputer:
  - Login: Connect to the cluster or supercomputer using SSH. Cluster documentation typically provides the details.
  - Load Modules: After logging in, load the required software environment modules (e.g., CUDA Toolkit, NVIDIA HPC Compiler) through Lmod, which helps manage software versions and dependencies.
- Cloud Platform:
  - Google Cloud: Google Cloud offers access to the latest GPU architectures with pricing based on usage. You can set up a virtual machine and follow the CUDA installation instructions.
  - Amazon Web Services (AWS): AWS provides on-demand GPU instances. For more information, visit AWS customer support or refer to the AWS GPU instance documentation.
Compiler Requirements¶
The following compilers and toolkits are recommended for this course:
- NVIDIA HPC SDK (for OpenACC and OpenMP Offloading):
  - The NVIDIA HPC SDK includes compilers that support both OpenACC and OpenMP Offloading for GPUs. It is available for Linux and Windows; see the NVIDIA HPC SDK installation guide for setup details.
- CUDA Toolkit (for CUDA and HIP Programming on NVIDIA GPUs):
  - The CUDA Toolkit provides the compiler, runtime libraries, and developer tools for NVIDIA GPUs; HIP code targeting NVIDIA GPUs is compiled on top of it. See the CUDA Toolkit installation guide for setup details.
- ROCm (for HIP Programming on AMD GPUs):
  - ROCm is AMD’s open software platform supporting HIP for AMD GPUs. It is available for Linux; check the ROCm installation guide for setup details.
If you are unfamiliar with any of these compilers or toolkits, reviewing their documentation prior to the course is highly recommended.
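Once a toolkit is installed, a quick sanity check is a small device-query program. The sketch below is one possible check, not part of the course material; it uses the HIP runtime and builds with `hipcc`, and analogous queries exist in CUDA and in the OpenACC and OpenMP runtimes.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    // Ask the runtime how many GPUs are visible to this process.
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        printf("No GPU devices found.\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, d);  // fills in name, memory size, etc.
        printf("Device %d: %s, %.1f GB global memory\n",
               d, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```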