Course Organization and GPU Access

Script

"This course focuses on GPU programming models, specifically OpenACC, OpenMP Offloading, and HIP, which are tailored for scientific and engineering applications. To get the most out of this course, you’ll need a basic understanding of C or C++ for low-level GPU programming. Familiarity with Fortran is optional but highly valuable for scientific computing tasks."
"The course introduces a range of programming models, each suited to specific architectures. OpenACC and OpenMP Offloading are versatile, supporting both CPUs and GPUs. CUDA targets NVIDIA GPUs only. HIP, on the other hand, targets both AMD and NVIDIA GPUs."
"The course is structured over five weeks and divided into four main areas: OpenMP Offloading for parallel programming across CPUs and GPUs, OpenACC for portable GPU programming, HIP for multi-platform GPU development, and profiling and optimization techniques to maximize GPU code performance."
"By the end of this course, you’ll have a solid understanding of GPU architecture, including the designs of NVIDIA and AMD GPUs. You’ll be proficient in parallel programming using OpenACC, OpenMP Offloading, and HIP. Additionally, you’ll know how to apply these techniques to computational tasks like numerical linear algebra and effectively optimize GPU code."
"Accessing GPUs for this course can be done in several ways: through a compatible personal computer, by connecting to clusters or supercomputers via SSH, or using cloud platforms like Google Cloud and AWS. These platforms provide access to the latest GPU architectures with flexible and scalable pricing."
"The course also requires specific compilers and toolkits. The NVIDIA HPC SDK supports OpenACC and OpenMP Offloading, while the CUDA Toolkit is essential for CUDA programming on NVIDIA GPUs. For AMD GPUs, the ROCm platform is required to enable HIP programming. Installation guides are available for all major operating systems to help you get started."

To follow this course, learners are expected to have a basic understanding of C/C++ programming; familiarity with Fortran and OpenMP is optional. The course covers OpenACC, OpenMP Offloading, and HIP, focusing on GPU programming techniques for scientific and engineering applications.

C/C++ (Required)
- C/C++ provides low-level control over memory and system resources, making it ideal for performance-critical applications. It is the foundational language for CUDA and HIP programming.
Fortran (Optional)
- Fortran remains widely used in scientific computing, particularly in numerical simulations (e.g., climate modeling), due to its efficiency and legacy in scientific applications.
OpenMP (Optional)
- OpenMP is an API for parallel programming on multicore CPUs. This course will introduce OpenMP Offloading for parallelizing code on CPUs and GPUs.
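As a rough illustration of what OpenMP Offloading looks like in C, the sketch below offloads a SAXPY loop (y = a*x + y) to the default device. The function name `saxpy` and the choice of kernel are assumptions made for this example, not part of the course material. A compiler without OpenMP support ignores the pragma, so the same code still runs serially on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
   With an offloading-capable compiler the loop runs on the default device;
   otherwise the pragma is ignored and the loop runs on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* map(to: ...)     copies x to the device before the loop.
       map(tofrom: ...) copies y to the device and back afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The only GPU-specific part is the single directive; the loop body is ordinary C, which is why a basic understanding of C is enough to get started.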

Below is an overview of the programming models and their target architectures:

Model	Implementation	Supported Languages	Target Architectures
OpenACC	Directives	Fortran, C, C++	CPUs, GPUs, OpenPOWER
OpenMP Offloading	Directives	Fortran, C, C++	CPUs, GPUs, Xeon Phi
HIP	Language extension	C, C++	AMD and NVIDIA GPUs
CUDA	Language extension	C, C++ (Fortran)	NVIDIA GPUs
OpenCL	Language extension	C, C++	GPUs, CPUs, FPGAs
C++ AMP	Language extension	C++	CPUs, GPUs
RAJA	C++ abstraction	C++	CPUs, GPUs
TBB	C++ abstraction	C++	CPUs
C++17	Language feature	C++	CPUs
Fortran 2008	Language feature	Fortran	CPUs
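To make the "Directives" entries in the table more concrete, here is a minimal sketch of a directive-based model, using OpenACC on a dot product. The function name `dot` is an assumption chosen for illustration. The point is that the serial C loop is left unchanged and a single pragma asks the compiler to parallelize it, whereas a language-extension model such as HIP or CUDA would require writing an explicit kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the dot product of x and y (length n).
   A compiler without OpenACC support ignores the pragma and runs the
   loop serially, producing the same result. */
double dot(int n, const double *x, const double *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    double sum = 0.0;

    /* copyin:    x and y are only read on the device.
       reduction: per-thread partial sums are combined into sum. */
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```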

Course Structure

Duration: 5 weeks
Topics: The course is divided into four main parts:
- OpenMP Offloading: Covers OpenMP for parallel programming across CPUs and accelerators (GPUs), from basics to advanced topics.
- OpenACC Programming: Provides an in-depth guide to the OpenACC model for portable GPU programming.
- HIP Programming: Teaches HIP for GPU programming on AMD and NVIDIA platforms, allowing a more versatile approach to GPGPU.
- Profiling and Performance Optimization: Focuses on profiling tools and techniques to optimize GPU code for improved performance.
Topics Breakdown:
- Each topic has between 6 and 8 sections, and each section includes one article, one quiz, and one discussion.
- Each section uses videos to highlight and summarize the key concept it covers.

What You Will Learn

GPU architecture (NVIDIA and AMD): covering memory hierarchy, streaming multiprocessors, TPUs, and more.
Parallel programming using OpenACC, OpenMP Offloading, and HIP, including thread organization, OpenACC directives, and OpenMP clauses.
Applying GPU programming to computational tasks in numerical linear algebra.
Advanced programming techniques in OpenACC, OpenMP Offloading, and HIP.
Code optimization: profiling methods and performance tuning for OpenACC, OpenMP Offloading, and HIP.

Accessing GPUs

You can access GPUs through a personal computer, a computing cluster or supercomputer, or cloud platforms (e.g., Google Cloud, Amazon Web Services). Instructions for each method are provided below.

Personal Computer:
- If you have a desktop or laptop with a compatible GPU, install the necessary compilers as described in the next section.
Cluster or Supercomputer:
- Login: Connect to the cluster or supercomputer using SSH. Cluster documentation typically provides the details.
- Load Modules: After logging in, load the required software modules (e.g., CUDA Toolkit, NVIDIA HPC SDK) using an environment module system such as Lmod, which manages software versions and dependencies.
Cloud Platform:
- Google Cloud: Google Cloud offers access to the latest GPU architectures with pricing based on usage. You can set up a virtual machine and follow the CUDA installation instructions.
- Amazon Web Services (AWS): AWS provides on-demand GPU instances. For more information, visit AWS customer support or refer to the AWS GPU instance documentation.
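Whichever access method you choose, it helps to confirm that a GPU is actually visible before running course exercises. The sketch below is one possible check, assuming an OpenMP-capable compiler; the helper names `count_offload_devices` and `report_offload_devices` are hypothetical. It uses the standard OpenMP routine `omp_get_num_devices()` and falls back to zero when OpenMP is not enabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of offload devices (e.g. GPUs) the OpenMP
   runtime can see, or 0 when OpenMP is disabled or no device is found. */
int count_offload_devices(void)
{
#ifdef _OPENMP
    int n = omp_get_num_devices();
#else
    int n = 0;
#endif
    assert(n >= 0);
    return n;
}

/* Hypothetical helper: prints a one-line summary of the result. */
void report_offload_devices(void)
{
    int n = count_offload_devices();
    if (n > 0) {
        printf("OpenMP runtime reports %d offload device(s)\n", n);
    } else {
        printf("No offload devices visible to the OpenMP runtime\n");
    }
}
```

A result of zero does not always mean the machine has no GPU; it can also mean the compiler was invoked without offloading enabled or, on a cluster, that the job was not allocated a GPU.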

Compiler Requirements

The following compilers and toolkits are recommended for this course:

NVIDIA HPC SDK (for OpenACC and OpenMP Offloading):
- The NVIDIA HPC SDK includes compilers that support both OpenACC and OpenMP Offloading for GPUs. It is available for Linux and Windows; see the NVIDIA HPC SDK installation guide for setup details.
CUDA Toolkit (for CUDA and HIP Programming on NVIDIA GPUs):
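- The CUDA Toolkit supports Linux and Windows; macOS support ended with CUDA 10.2, so macOS users are limited to legacy releases. Install it by following the instructions for each OS:
  - Linux
  - Windows
  - macOS (legacy releases only, up to CUDA 10.2)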
ROCm (for HIP Programming on AMD GPUs):
- ROCm is AMD’s open software platform supporting HIP for AMD GPUs. It is available for Linux; check the ROCm installation guide for setup details.

If you are unfamiliar with any of these compilers or toolkits, reviewing their documentation prior to the course is highly recommended.
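Once a toolchain is installed, a quick way to see which programming model a given compiler invocation has enabled is to inspect its predefined macros. The sketch below assumes the commonly documented macros `__HIPCC__` (HIP compiler driver), `__CUDACC__` (NVIDIA `nvcc`), `_OPENACC`, and `_OPENMP`; the function names are hypothetical. Note that `_OPENMP` only indicates that OpenMP is enabled, not that GPU offloading is.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports the model enabled at compile time,
   based on macros predefined by the compiler. */
const char *enabled_gpu_model(void)
{
#if defined(__HIPCC__)
    return "HIP";
#elif defined(__CUDACC__)
    return "CUDA";
#elif defined(_OPENACC)
    return "OpenACC";
#elif defined(_OPENMP)
    return "OpenMP";
#else
    return "none";
#endif
}

/* Hypothetical helper: prints the detected model. */
void print_enabled_gpu_model(void)
{
    const char *model = enabled_gpu_model();
    assert(model != NULL);
    printf("Programming model enabled at compile time: %s\n", model);
}
```

With the NVIDIA HPC SDK, for example, compiling with `nvc -acc` would typically report OpenACC and `nvc -mp=gpu` would report OpenMP, while a plain C compiler invoked without flags reports "none".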