Matrix Multiplication with OpenMP Offloading Quiz
1. Which OpenMP directive should be used to offload matrix multiplication to the GPU while collapsing both outer loops (row and col)?
- A. `#pragma omp target teams collapse(2)`
- B. `#pragma omp target parallel for collapse(2)`
- C. `#pragma omp target loop collapse(2)`
- D. `#pragma omp target parallel collapse(2)`
Click to reveal the answer
Answer: B. `#pragma omp target parallel for collapse(2)`

2. In matrix multiplication with `#pragma omp target teams distribute parallel for`, what is the main advantage of using `teams`?
- A. It specifies that the code should run on the CPU.
- B. It creates multiple teams on the GPU, each responsible for a portion of the work, improving workload distribution.
- C. It forces all computations to be sequential.
- D. It prevents data from being transferred to the GPU.
Click to reveal the answer
Answer: B. It creates multiple teams on the GPU, each responsible for a portion of the work, improving workload distribution.

3. True or False: The `collapse(2)` clause is used to merge two loops (e.g., the row and col loops) so they can be executed in parallel as a single loop.
Click to reveal the answer
Answer: True

4. Which clause would you use to ensure each thread has a private copy of `row`, `col`, and `i` within a parallelized matrix multiplication loop?
- A. `collapse`
- B. `num_teams`
- C. `private(row, col, i)`
- D. `map(tofrom: row, col, i)`
Click to reveal the answer
Answer: C. `private(row, col, i)`

5. In the option `#pragma omp target teams distribute parallel for num_teams(5) collapse(2)`, what does `num_teams(5)` do?
- A. Specifies that each team should have 5 threads.
- B. Creates exactly 5 teams to distribute the work on the GPU.
- C. Allocates 5 memory spaces on the GPU.
- D. Limits the number of iterations each thread can execute to 5.
Click to reveal the answer
Answer: B. Creates exactly 5 teams to distribute the work on the GPU.

6. In the C/C++ example, what does the `map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])` clause do?
- A. Copies `a` and `b` from the device to the host and `c` from the host to the device.
- B. Allocates memory for `a`, `b`, and `c` on the device but does not initialize them.
- C. Copies `a` and `b` to the device and brings `c` back from the device to the host after computation.
- D. Allocates memory for `c` on the host but initializes `a` and `b` on the device.
Click to reveal the answer
Answer: C. Copies `a` and `b` to the device and brings `c` back from the device to the host after computation.

7. Which of the following is a benefit of using `teams distribute parallel for` over `parallel for` alone in GPU offloading?
- A. It prevents memory access conflicts by limiting memory access.
- B. It supports multi-level parallelism by creating teams and allowing parallel execution within each team.
- C. It enables execution on the CPU instead of the GPU.
- D. It disables the need for memory mapping.
Click to reveal the answer
Answer: B. It supports multi-level parallelism by creating teams and allowing parallel execution within each team.

8. What is the purpose of `collapse(2)` in `#pragma omp target teams distribute parallel for collapse(2)` when performing matrix multiplication?
- A. It divides the workload into two halves.
- B. It forces two copies of each variable to be created.
- C. It merges the row and col loops to treat them as a single loop, improving parallelism.
- D. It prevents any data races by creating private copies of variables.
Click to reveal the answer
Answer: C. It merges the row and col loops to treat them as a single loop, improving parallelism.

9. True or False: `#pragma omp target teams distribute parallel for` can create both teams and threads to handle a distributed workload on the GPU.
Click to reveal the answer
Answer: True

10. Which construct would you use if you want to control the number of teams created on the GPU during matrix multiplication?
- A. `collapse(2)`
- B. `private`
- C. `num_teams(5)`
- D. `map(to: a[0:n*n], b[0:n*n])`
Click to reveal the answer
Answer: C. `num_teams(5)`