Matrix Multiplication: CPU vs. GPU Implementation
1. In CPU matrix multiplication, what does each iteration of the nested output loops compute?
- A. A single row
- B. A single column
- C. A single element (dot product result)
- D. The entire matrix
Click to reveal the answer
Answer: C. A single element (dot product result)
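For reference, here is a minimal C sketch of the CPU version (the array names and the fixed `N` are illustrative assumptions, not necessarily the course's exact code). Each iteration of the two outer loops fills exactly one output element with a dot product.

```cuda
#include <stdio.h>

#define N 4  /* small size so the result is easy to check by hand */

/* Each (i, j) iteration computes exactly one element of c:
   the dot product of row i of a with column j of b. */
void matrix_mul_cpu(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++) {
                sum += a[i * N + k] * b[k * N + j];
            }
            c[i * N + j] = sum;  /* one element per (i, j) iteration */
        }
    }
}

int main(void) {
    float a[N * N], b[N * N], c[N * N];
    for (int i = 0; i < N * N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    matrix_mul_cpu(a, b, c);
    printf("c[0] = %f (expected %f)\n", c[0], 2.0f * N);
    return 0;
}
```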
GPU Matrix Multiplication with CUDA

2. Which CUDA function is used to allocate memory for matrices on the GPU?
- A. `malloc`
- B. `cudaMalloc`
- C. `free`
- D. `memcpy`
Click to reveal the answer
Answer: B. `cudaMalloc`
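A minimal allocation sketch, assuming the quiz's `d_`-prefixed pointer names and an illustrative `N`; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#define N 1024

int main(void) {
    // Allocate three N x N float matrices in GPU global memory.
    // cudaMalloc takes the address of the pointer and a size in bytes.
    float *d_a, *d_b, *d_c;
    size_t bytes = sizeof(float) * N * N;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // ... copies and kernel launch would go here ...

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```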
3. Why do we use `cudaMemcpy(d_a, a, sizeof(float) * (N * N), cudaMemcpyHostToDevice);` in GPU programs?
- A. To transfer data from GPU to CPU
- B. To allocate memory on the GPU
- C. To transfer data from CPU to GPU
- D. To initialize device memory
Click to reveal the answer
Answer: C. To transfer data from CPU to GPU
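A sketch of the host-to-device copy in context (again with an assumed `N` and the quiz's `a`/`d_a` naming). Note the argument order: destination first, then source.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define N 1024

int main(void) {
    size_t bytes = sizeof(float) * N * N;

    // Host (CPU) buffer with input data.
    float *a = (float *)malloc(bytes);
    for (int i = 0; i < N * N; i++) a[i] = 1.0f;

    // Device (GPU) buffer.
    float *d_a;
    cudaMalloc((void **)&d_a, bytes);

    // cudaMemcpyHostToDevice: destination (device) first, source (host)
    // second -- data flows CPU -> GPU.
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_a);
    free(a);
    return 0;
}
```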
Thread and Block Configuration

4. What is the purpose of defining `dim3 dimBlock(32, 32, 1);` for GPU matrix multiplication?
- A. It sets the matrix dimensions
- B. It defines the number of threads per block
- C. It specifies the memory allocation size
- D. It configures CPU threads
Click to reveal the answer
Answer: B. It defines the number of threads per block
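A quick host-side illustration of what the declaration means (the `printf` is just for demonstration):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // 32 x 32 x 1 = 1024 threads per block, the per-block maximum on
    // most NVIDIA GPUs. This sets GPU threads per block only; it says
    // nothing about the matrix size or about CPU threads.
    dim3 dimBlock(32, 32, 1);
    printf("threads per block: %u\n", dimBlock.x * dimBlock.y * dimBlock.z);
    return 0;
}
```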
5. True or False: The calculation `dim3 dimGrid((N + blockSize - 1) / blockSize, (N + blockSize - 1) / blockSize, 1);` ensures complete coverage of the matrix, even if `N` is not a multiple of `blockSize`.
Click to reveal the answer
Answer: True
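A worked example of the ceiling-division idiom, using an `N` that is deliberately not a multiple of the block size (the specific values are illustrative):

```cuda
#include <stdio.h>

int main(void) {
    int N = 1000;        // matrix dimension, not a multiple of 32
    int blockSize = 32;  // threads per block along one axis

    // Integer ceiling division: (1000 + 31) / 32 = 32 blocks.
    // 32 blocks * 32 threads = 1024 >= 1000, so every row/column is
    // covered; the 24 surplus threads are masked off by the bounds
    // check inside the kernel.
    int blocksPerAxis = (N + blockSize - 1) / blockSize;
    printf("blocks per axis: %d\n", blocksPerAxis);  // prints 32
    return 0;
}
```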
Kernel Execution and Indexing

6. In the CUDA kernel `matrix_mul`, what does `int row = blockIdx.x * blockDim.x + threadIdx.x;` compute?
- A. The matrix size
- B. The global row index for each thread
- C. The column size of the matrix
- D. The shared memory size
Click to reveal the answer
Answer: B. The global row index for each thread
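The arithmetic can be traced on the host for one example thread (the block and thread numbers here are made up for illustration):

```cuda
#include <stdio.h>

int main(void) {
    // Simulate the kernel's index computation for one thread:
    // block 2, thread 5, with 32 threads per block along x.
    int blockIdx_x = 2, blockDim_x = 32, threadIdx_x = 5;
    int row = blockIdx_x * blockDim_x + threadIdx_x;  // 2 * 32 + 5 = 69
    // Blocks 0, 1, 2, ... cover rows 0-31, 32-63, 64-95, ... so every
    // thread owns a unique global row.
    printf("global row index: %d\n", row);  // prints 69
    return 0;
}
```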
7. Why do we check `if (row < width && col < width)` in the CUDA kernel?
- A. To initialize the matrix
- B. To avoid accessing out-of-bounds memory
- C. To synchronize threads
- D. To allocate memory
Click to reveal the answer
Answer: B. To avoid accessing out-of-bounds memory
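Putting questions 6 and 7 together, here is a minimal kernel sketch. The body is an assumption based on the quiz's `matrix_mul` name and `width` parameter, not necessarily the course's exact code.

```cuda
__global__ void matrix_mul(const float *a, const float *b, float *c, int width) {
    // Global row/column owned by this thread (question 6).
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;

    // Guard (question 7): the grid may launch more threads than there
    // are matrix elements when width is not a multiple of the block
    // size; without this check the surplus threads would read and
    // write out-of-bounds memory.
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; k++) {
            sum += a[row * width + k] * b[k * width + col];
        }
        c[row * width + col] = sum;
    }
}
```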
8. After executing a CUDA kernel, which function do we use to transfer results from the GPU back to the CPU?

- A. `cudaMemcpy(d_a, a, ..., cudaMemcpyDeviceToHost);`
- B. `cudaFree`
- C. `cudaMemcpy(d_c, c, ..., cudaMemcpyHostToDevice);`
- D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);`

Click to reveal the answer

Answer: D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);`
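A closing sketch of the copy-back and cleanup steps, assuming the quiz's `c`/`d_c` naming and an illustrative `N`. As with the host-to-device copy, the destination comes first, so the host buffer is the first argument here.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define N 1024

int main(void) {
    size_t bytes = sizeof(float) * N * N;
    float *c = (float *)malloc(bytes);  // host result buffer
    float *d_c;
    cudaMalloc((void **)&d_c, bytes);   // device result buffer

    // ... kernel launch would fill d_c here ...

    // Destination first (host), source second (device):
    // cudaMemcpyDeviceToHost moves the results GPU -> CPU.
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_c);  // release GPU memory once results are back
    free(c);
    return 0;
}
```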