Course Organization and GPU Access¶
To follow this course, learners are expected to have a basic understanding of C/C++ programming; familiarity with Fortran and OpenMP is helpful but optional. The course covers OpenACC, OpenMP Offloading, and HIP, focusing on GPU programming techniques for scientific and engineering applications.
- C/C++ (Required)
- C/C++ provides low-level control over memory and system resources, making it ideal for performance-critical applications. It is the foundational language for CUDA and HIP programming.
- Fortran (Optional)
- Fortran remains widely used in scientific computing, particularly in numerical simulations (e.g., climate modeling), due to its efficiency and legacy in scientific applications.
- OpenMP (Optional)
- OpenMP is an API for parallel programming on multicore CPUs. This course will introduce OpenMP Offloading for parallelizing code on CPUs and GPUs.
Below is an overview of the programming models and their target architectures:
| Model | Implementation | Supported Languages | Target Architectures |
| --- | --- | --- | --- |
| OpenACC | Directives | Fortran, C, C++ | CPUs, GPUs, OpenPOWER |
| OpenMP Offloading | Directives | Fortran, C, C++ | CPUs, GPUs, Xeon Phi |
| HIP | Language extension | C, C++ | AMD and NVIDIA GPUs |
| CUDA | Language extension | C, C++ (Fortran) | NVIDIA GPUs |
| OpenCL | Language extension | C, C++ | GPUs, CPUs, FPGAs |
| C++ AMP | Language extension | C++ | CPUs, GPUs |
| RAJA | C++ abstraction | C++ | CPUs, GPUs |
| TBB | C++ abstraction | C++ | CPUs |
| C++17 | Language feature | C++ | CPUs |
| Fortran 2008 | Language feature | Fortran | CPUs |
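To make the "Directives" entries in the table concrete, the sketch below (illustrative only, not course material; array names and sizes are made up) offloads the same loop first with an OpenACC directive and then with an OpenMP target directive. With the NVIDIA HPC SDK, such a file typically builds with `nvc++ -acc` or `nvc++ -mp=gpu`, depending on which directive set is enabled; a directive set that is not enabled is simply ignored and that loop runs on the CPU.

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
    const int n = 1 << 20;
    float *x = (float *)malloc(n * sizeof(float));
    float *y = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // OpenACC: a single directive asks the compiler to offload the loop
    // and move the listed arrays between host and device.
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    // OpenMP offloading: the same loop expressed with target directives.
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];

    printf("y[0] = %f\n", y[0]);  // 4.0 after the OpenACC loop, 6.0 after the OpenMP loop
    free(x);
    free(y);
    return 0;
}
```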
Course Structure¶
- Duration: 5 weeks
- Topics: The course is divided into four main parts:
- OpenMP Offloading: Covers OpenMP for parallel programming across CPUs and accelerators (GPUs), from basics to advanced topics.
- OpenACC Programming: Provides an in-depth guide to the OpenACC model for portable GPU programming.
- HIP Programming: Teaches HIP for GPU programming on both AMD and NVIDIA platforms, offering a portable approach to general-purpose GPU (GPGPU) computing.
- Profiling and Performance Optimization: Focuses on profiling tools and techniques to optimize GPU code for improved performance.
- Topics Breakdown:
- Each topic has 6 to 8 sections, and each section includes an article, a quiz, and a discussion.
- Each section also includes videos that highlight and summarize the importance of its key concept.
What You Will Learn¶
- GPU architecture (NVIDIA and AMD): covering the memory hierarchy, streaming multiprocessors (NVIDIA), compute units (AMD), and more.
- Parallel programming using OpenACC, OpenMP Offloading, and HIP, including thread organization, OpenACC directives, and OpenMP clauses (a minimal thread-organization sketch follows this list).
- Applying GPU programming to computational tasks in numerical linear algebra.
- Advanced programming techniques in OpenACC, OpenMP Offloading, and HIP.
- Code optimization: profiling methods and performance tuning for OpenACC, OpenMP Offloading, and HIP.
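As a preview of the thread-organization material, here is a minimal HIP sketch (illustrative only; the array size and the SAXPY-style kernel are made up for this example) in which each GPU thread computes one element, deriving its global index from the block and thread indices. It builds with `hipcc` on ROCm and, via the CUDA Toolkit, on NVIDIA platforms.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread handles one element: global index = block offset + thread index.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host arrays.
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device arrays and host-to-device copies.
    float *dx, *dy;
    hipMalloc((void **)&dx, bytes);
    hipMalloc((void **)&dy, bytes);
    hipMemcpy(dx, hx, bytes, hipMemcpyHostToDevice);
    hipMemcpy(dy, hy, bytes, hipMemcpyHostToDevice);

    int threads = 256;                            // threads per block
    int blocks = (n + threads - 1) / threads;     // enough blocks to cover all n elements
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);  // launch the kernel on the GPU

    // Copy the result back; this blocking copy also waits for the kernel to finish.
    hipMemcpy(hy, dy, bytes, hipMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);                 // expect 4.0

    hipFree(dx);
    hipFree(dy);
    delete[] hx;
    delete[] hy;
    return 0;
}
```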
Accessing GPUs¶
You can access GPUs through a personal computer, a computing cluster or supercomputer, or cloud platforms (e.g., Google Cloud, Amazon Web Services). Instructions for each method are provided below.
- Personal Computer:
  - If you have a desktop or laptop with a compatible GPU, install the necessary compilers as described in the next section.
- Cluster or Supercomputer:
  - Login: Connect to the cluster or supercomputer using SSH. Cluster documentation typically provides the details.
  - Load Modules: After logging in, load the required software environment modules (e.g., CUDA Toolkit, NVIDIA HPC Compiler) through Lmod, which helps manage software versions and dependencies.
- Cloud Platform:
  - Google Cloud: Google Cloud offers access to the latest GPU architectures with pricing based on usage. You can set up a virtual machine and follow the CUDA installation instructions.
  - Amazon Web Services (AWS): AWS provides on-demand GPU instances. For more information, visit AWS customer support or refer to the AWS GPU instance documentation.
Compiler Requirements¶
The following compilers and toolkits are recommended for this course:
- NVIDIA HPC SDK (for OpenACC and OpenMP Offloading):
  - The NVIDIA HPC SDK includes compilers that support both OpenACC and OpenMP Offloading for GPUs. It is available for Linux and Windows; see the NVIDIA HPC SDK installation guide for setup details.
- CUDA Toolkit (for CUDA and HIP Programming on NVIDIA GPUs):
  - The CUDA Toolkit provides the compiler, runtime libraries, and developer tools for NVIDIA GPUs; HIP code targeting NVIDIA GPUs is compiled on top of it. See the CUDA Toolkit installation guide for setup details.
- ROCm (for HIP Programming on AMD GPUs):
  - ROCm is AMD’s open software platform supporting HIP for AMD GPUs. It is available for Linux; check the ROCm installation guide for setup details.
If you are unfamiliar with any of these compilers or toolkits, reviewing their documentation prior to the course is highly recommended.
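Once a toolkit is installed, a quick sanity check is a small device-query program. The sketch below is one possible check, not part of the course material; it uses the HIP runtime and builds with `hipcc`, and analogous queries exist in CUDA and in the OpenACC and OpenMP runtimes.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    // Ask the runtime how many GPUs are visible to this process.
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        printf("No GPU devices found.\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, d);  // fills in name, memory size, etc.
        printf("Device %d: %s, %.1f GB global memory\n",
               d, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```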