Matrix Multiplication: CPU vs. GPU Implementation
1. In CPU matrix multiplication, what does each iteration of the nested output loops compute?
- A. A single row
- B. A single column
- C. A single element (dot product result)
- D. The entire matrix
Click to reveal the answer
Answer: C. A single element (dot product result)
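For reference, here is a minimal C sketch of the CPU version (the array names and the fixed `N` are illustrative assumptions, not necessarily the course's exact code). Each iteration of the two outer loops fills exactly one output element with a dot product.

```cuda
#include <stdio.h>

#define N 4  /* small size so the result is easy to check by hand */

/* Each (i, j) iteration computes exactly one element of c:
   the dot product of row i of a with column j of b. */
void matrix_mul_cpu(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++) {
                sum += a[i * N + k] * b[k * N + j];
            }
            c[i * N + j] = sum;  /* one element per (i, j) iteration */
        }
    }
}

int main(void) {
    float a[N * N], b[N * N], c[N * N];
    for (int i = 0; i < N * N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    matrix_mul_cpu(a, b, c);
    printf("c[0] = %f (expected %f)\n", c[0], 2.0f * N);
    return 0;
}
```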
GPU Matrix Multiplication with CUDA

2. Which CUDA function is used to allocate memory for matrices on the GPU?
- A. `malloc`
- B. `cudaMalloc`
- C. `free`
- D. `memcpy`
Click to reveal the answer
Answer: B. `cudaMalloc`
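A minimal allocation sketch, assuming the quiz's `d_`-prefixed pointer names and an illustrative `N`; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#define N 1024

int main(void) {
    // Allocate three N x N float matrices in GPU global memory.
    // cudaMalloc takes the address of the pointer and a size in bytes.
    float *d_a, *d_b, *d_c;
    size_t bytes = sizeof(float) * N * N;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // ... copies and kernel launch would go here ...

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```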
3. Why do we use `cudaMemcpy(d_a, a, sizeof(float) * (N * N), cudaMemcpyHostToDevice);` in GPU programs?
- A. To transfer data from GPU to CPU
- B. To allocate memory on the GPU
- C. To transfer data from CPU to GPU
- D. To initialize device memory
Click to reveal the answer
Answer: C. To transfer data from CPU to GPU
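A sketch of the host-to-device copy in context (again with an assumed `N` and the quiz's `a`/`d_a` naming). Note the argument order: destination first, then source.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define N 1024

int main(void) {
    size_t bytes = sizeof(float) * N * N;

    // Host (CPU) buffer with input data.
    float *a = (float *)malloc(bytes);
    for (int i = 0; i < N * N; i++) a[i] = 1.0f;

    // Device (GPU) buffer.
    float *d_a;
    cudaMalloc((void **)&d_a, bytes);

    // cudaMemcpyHostToDevice: destination (device) first, source (host)
    // second -- data flows CPU -> GPU.
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_a);
    free(a);
    return 0;
}
```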
Thread and Block Configuration

4. What is the purpose of defining `dim3 dimBlock(32, 32, 1);` for GPU matrix multiplication?
- A. It sets the matrix dimensions
- B. It defines the number of threads per block
- C. It specifies the memory allocation size
- D. It configures CPU threads
Click to reveal the answer
Answer: B. It defines the number of threads per block
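A quick host-side illustration of what the declaration means (the `printf` is just for demonstration):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // 32 x 32 x 1 = 1024 threads per block, the per-block maximum on
    // most NVIDIA GPUs. This sets GPU threads per block only; it says
    // nothing about the matrix size or about CPU threads.
    dim3 dimBlock(32, 32, 1);
    printf("threads per block: %u\n", dimBlock.x * dimBlock.y * dimBlock.z);
    return 0;
}
```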
5. True or False: The calculation `dim3 dimGrid((N + blockSize - 1) / blockSize, (N + blockSize - 1) / blockSize, 1);` ensures complete coverage of the matrix, even if `N` is not a multiple of `blockSize`.
Click to reveal the answer
Answer: True
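A worked example of the ceiling-division idiom, using an `N` that is deliberately not a multiple of the block size (the specific values are illustrative):

```cuda
#include <stdio.h>

int main(void) {
    int N = 1000;        // matrix dimension, not a multiple of 32
    int blockSize = 32;  // threads per block along one axis

    // Integer ceiling division: (1000 + 31) / 32 = 32 blocks.
    // 32 blocks * 32 threads = 1024 >= 1000, so every row/column is
    // covered; the 24 surplus threads are masked off by the bounds
    // check inside the kernel.
    int blocksPerAxis = (N + blockSize - 1) / blockSize;
    printf("blocks per axis: %d\n", blocksPerAxis);  // prints 32
    return 0;
}
```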
Kernel Execution and Indexing

6. In the CUDA kernel `matrix_mul`, what does `int row = blockIdx.x * blockDim.x + threadIdx.x;` compute?
- A. The matrix size
- B. The global row index for each thread
- C. The column size of the matrix
- D. The shared memory size
Click to reveal the answer
Answer: B. The global row index for each thread
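The arithmetic can be traced on the host for one example thread (the block and thread numbers here are made up for illustration):

```cuda
#include <stdio.h>

int main(void) {
    // Simulate the kernel's index computation for one thread:
    // block 2, thread 5, with 32 threads per block along x.
    int blockIdx_x = 2, blockDim_x = 32, threadIdx_x = 5;
    int row = blockIdx_x * blockDim_x + threadIdx_x;  // 2 * 32 + 5 = 69
    // Blocks 0, 1, 2, ... cover rows 0-31, 32-63, 64-95, ... so every
    // thread owns a unique global row.
    printf("global row index: %d\n", row);  // prints 69
    return 0;
}
```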
7. Why do we check `if (row < width && col < width)` in the CUDA kernel?
- A. To initialize the matrix
- B. To avoid accessing out-of-bounds memory
- C. To synchronize threads
- D. To allocate memory
Click to reveal the answer
Answer: B. To avoid accessing out-of-bounds memory
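Putting questions 6 and 7 together, here is a minimal kernel sketch. The body is an assumption based on the quiz's `matrix_mul` name and `width` parameter, not necessarily the course's exact code.

```cuda
__global__ void matrix_mul(const float *a, const float *b, float *c, int width) {
    // Global row/column owned by this thread (question 6).
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;

    // Guard (question 7): the grid may launch more threads than there
    // are matrix elements when width is not a multiple of the block
    // size; without this check the surplus threads would read and
    // write out-of-bounds memory.
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; k++) {
            sum += a[row * width + k] * b[k * width + col];
        }
        c[row * width + col] = sum;
    }
}
```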
8. After executing a CUDA kernel, which function do we use to transfer results from the GPU back to the CPU?

- A. `cudaMemcpy(d_a, a, ..., cudaMemcpyDeviceToHost);`
- B. `cudaFree`
- C. `cudaMemcpy(d_c, c, ..., cudaMemcpyHostToDevice);`
- D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);`

Click to reveal the answer

Answer: D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);`
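A closing sketch of the copy-back and cleanup steps, assuming the quiz's `c`/`d_c` naming and an illustrative `N`. As with the host-to-device copy, the destination comes first, so the host buffer is the first argument here.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define N 1024

int main(void) {
    size_t bytes = sizeof(float) * N * N;
    float *c = (float *)malloc(bytes);  // host result buffer
    float *d_c;
    cudaMalloc((void **)&d_c, bytes);   // device result buffer

    // ... kernel launch would fill d_c here ...

    // Destination first (host), source second (device):
    // cudaMemcpyDeviceToHost moves the results GPU -> CPU.
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_c);  // release GPU memory once results are back
    free(c);
    return 0;
}
```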