Matrix Multiplication: CPU vs. GPU Implementation

1. In CPU matrix multiplication, what does each iteration of the two outer loops compute?

  • A. A single row
  • B. A single column
  • C. A single element (dot product result)
  • D. The entire matrix
Answer: C. A single element (dot product result)
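
For reference, a triple-loop CPU implementation might look like the sketch below; each iteration of the two outer loops produces exactly one output element via a dot product (the names `a`, `b`, `c`, and `N` are illustrative, not from a specific codebase).

```cuda
// Sketch: naive CPU matrix multiplication for N x N row-major matrices.
// Each (i, j) pair of the outer loops computes one element of c.
void matrix_mul_cpu(const float *a, const float *b, float *c, int N) {
    for (int i = 0; i < N; i++) {          // row of the result
        for (int j = 0; j < N; j++) {      // column of the result
            float sum = 0.0f;
            for (int k = 0; k < N; k++) {  // dot product of row i of a and column j of b
                sum += a[i * N + k] * b[k * N + j];
            }
            c[i * N + j] = sum;            // one element per outer-loop iteration
        }
    }
}
```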

GPU Matrix Multiplication with CUDA

2. Which CUDA function is used to allocate memory for matrices on the GPU?

  • A. malloc
  • B. cudaMalloc
  • C. free
  • D. memcpy
Answer: B. `cudaMalloc`
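
As a minimal sketch, allocating three N x N matrices on the device might look like this (the `d_` prefix for device pointers is a common convention; assume this runs inside a host function with `N` defined):

```cuda
#include <cuda_runtime.h>

// Sketch: allocate device buffers for the two inputs and the result.
float *d_a, *d_b, *d_c;
size_t bytes = sizeof(float) * N * N;
cudaMalloc(&d_a, bytes);  // takes the address of the pointer plus a byte count
cudaMalloc(&d_b, bytes);
cudaMalloc(&d_c, bytes);
```

Unlike `malloc`, `cudaMalloc` returns an error code and writes the new device pointer through its first argument.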

3. Why do we use `cudaMemcpy(d_a, a, sizeof(float) * (N * N), cudaMemcpyHostToDevice);` in GPU programs?

  • A. To transfer data from GPU to CPU
  • B. To allocate memory on the GPU
  • C. To transfer data from CPU to GPU
  • D. To initialize device memory
Answer: C. To transfer data from CPU to GPU
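
Continuing the sketch, the upload step copies the host inputs `a` and `b` into the device buffers allocated above; the last argument tells the runtime which direction the copy goes:

```cuda
// Sketch: copy input matrices from host (CPU) memory to device (GPU) memory.
// cudaMemcpy(destination, source, bytes, direction)
cudaMemcpy(d_a, a, sizeof(float) * N * N, cudaMemcpyHostToDevice);
cudaMemcpy(d_b, b, sizeof(float) * N * N, cudaMemcpyHostToDevice);
```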

Thread and Block Configuration

4. What is the purpose of defining `dim3 dimBlock(32, 32, 1);` for GPU matrix multiplication?

  • A. It sets the matrix dimensions
  • B. It defines the number of threads per block
  • C. It specifies the memory allocation size
  • D. It configures CPU threads
Answer: B. It defines the number of threads per block
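
A 32 x 32 x 1 block yields 1024 threads per block, the per-block maximum on current NVIDIA GPUs, with each thread typically assigned one output element:

```cuda
// Sketch: 32 x 32 = 1024 threads per block, one thread per output element.
int blockSize = 32;
dim3 dimBlock(blockSize, blockSize, 1);
```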

5. True or False: The calculation `dim3 dimGrid((N + blockSize - 1) / blockSize, (N + blockSize - 1) / blockSize, 1);` ensures complete coverage of the matrix, even if `N` is not a multiple of `blockSize`.

Answer: True. `(N + blockSize - 1) / blockSize` is integer ceiling division: it rounds up, so a partially filled block is still launched for any leftover rows or columns.
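
A quick worked sketch of why the rounding matters, using illustrative numbers:

```cuda
// Sketch: ceiling division guarantees coverage when N is not a multiple of blockSize.
// e.g. N = 100, blockSize = 32:
//   plain division:   100 / 32       = 3 blocks -> 3 * 32 = 96  threads per axis (4 short)
//   rounded-up form: (100 + 31) / 32 = 4 blocks -> 4 * 32 = 128 threads per axis (full coverage)
dim3 dimGrid((N + blockSize - 1) / blockSize,
             (N + blockSize - 1) / blockSize, 1);
matrix_mul<<<dimGrid, dimBlock>>>(d_a, d_b, d_c, N);  // launch with this configuration
```

The extra threads in the partially filled blocks are exactly why the kernel needs the bounds check discussed in question 7.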

Kernel Execution and Indexing

6. In the CUDA kernel `matrix_mul`, what does `int row = blockIdx.x * blockDim.x + threadIdx.x;` compute?

  • A. The matrix size
  • B. The global row index for each thread
  • C. The column size of the matrix
  • D. The shared memory size
Answer: B. The global row index for each thread
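
Concretely, each thread combines its block's offset in the grid with its own position inside the block; in this quiz's convention the x dimension maps to rows, and by symmetry y maps to columns:

```cuda
// Sketch: derive a unique global (row, col) for each thread.
int row = blockIdx.x * blockDim.x + threadIdx.x;  // block offset + position within block
int col = blockIdx.y * blockDim.y + threadIdx.y;  // same pattern along the other axis
```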

7. Why do we check `row < width && col < width` in the CUDA kernel?

  • A. To initialize the matrix
  • B. To avoid accessing out-of-bounds memory
  • C. To synchronize threads
  • D. To allocate memory
Answer: B. To avoid accessing out-of-bounds memory
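
Putting the indexing and the guard together, a minimal version of the `matrix_mul` kernel might look like the following (parameter names are illustrative; `width` is the matrix dimension N):

```cuda
// Sketch: minimal CUDA kernel for multiplying width x width row-major matrices.
__global__ void matrix_mul(const float *a, const float *b, float *c, int width) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid is rounded up, so some threads land outside the matrix;
    // without this guard they would read and write out-of-bounds memory.
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; k++) {
            sum += a[row * width + k] * b[k * width + col];
        }
        c[row * width + col] = sum;
    }
}
```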

8. After executing a CUDA kernel, which call do we use to transfer results from the GPU back to the CPU?

  • A. `cudaMemcpy(a, d_a, ..., cudaMemcpyDeviceToHost);`
  • B. `cudaFree`
  • C. `cudaMemcpy(d_c, c, ..., cudaMemcpyHostToDevice);`
  • D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);`
Answer: D. `cudaMemcpy(c, d_c, ..., cudaMemcpyDeviceToHost);` (the host destination `c` comes first and the device source `d_c` second, matching the `cudaMemcpyDeviceToHost` direction)
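
A sketch of the final step: copy the result back to the host buffer `c`, then release the device memory:

```cuda
// Sketch: download the result and clean up.
// Argument order matters: destination (host c) first, source (device d_c) second.
cudaMemcpy(c, d_c, sizeof(float) * N * N, cudaMemcpyDeviceToHost);
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
```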