Vector Addition: CPU vs. GPU Implementation¶
1. In CPU vector addition, how are elements of the vectors processed?
- A. Sequentially, one by one
- B. In parallel, using threads
- C. By blocks of elements
- D. Randomly
Click to reveal the answer
Answer: A. Sequentially, one by one
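To illustrate the answer, a minimal sketch of the sequential version (the function and array names here are illustrative, not from the quiz):

```cpp
// CPU vector addition: elements are processed one at a time, in order.
void vector_add_cpu(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];  // iteration i runs only after iteration i - 1
    }
}
```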
GPU Vector Addition with CUDA¶

2. Which CUDA function is used to allocate memory on the GPU?
- A. `malloc`
- B. `free`
- C. `cudaMalloc`
- D. `cudaMemcpy`
Click to reveal the answer
Answer: C. `cudaMalloc`
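A minimal sketch of device allocation with error checking (`N` and the `d_` prefix are assumptions following the naming used in question 3):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 1024;  // assumed element count
    float *d_a = nullptr;

    // cudaMalloc takes the address of the pointer and a size in bytes;
    // on success, d_a points to N floats in GPU global memory.
    cudaError_t err = cudaMalloc((void **)&d_a, sizeof(float) * N);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_a);
    return 0;
}
```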
3. What does `cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);` do?
- A. Copies data from GPU to CPU
- B. Allocates memory on the GPU
- C. Copies data from CPU to GPU
- D. Frees memory on the GPU
Click to reveal the answer
Answer: C. Copies data from CPU to GPU
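A minimal sketch of the copy in context, assuming a host buffer `h_a` and a device buffer `d_a` of `N` floats each (the `h_`/`d_` prefixes conventionally mark host and device pointers):

```cpp
#include <cuda_runtime.h>

int main() {
    const int N = 1024;    // assumed element count
    float h_a[N] = {0};    // h_ prefix: host (CPU) buffer
    float *d_a = nullptr;  // d_ prefix: device (GPU) buffer
    cudaMalloc((void **)&d_a, sizeof(float) * N);

    // The last argument selects the direction of the copy: host -> device here.
    cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);

    // Copying results back would use the opposite direction:
    // cudaMemcpy(h_a, d_a, sizeof(float) * N, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    return 0;
}
```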
Thread and Block Configuration¶

4. Which formula calculates the required number of blocks per grid?
- A. `N / threadsPerBlock`
- B. `(N + threadsPerBlock - 1) / threadsPerBlock`
- C. `N * threadsPerBlock`
- D. `(N - threadsPerBlock) / threadsPerBlock`
Click to reveal the answer
Answer: B. `(N + threadsPerBlock - 1) / threadsPerBlock`
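The `+ threadsPerBlock - 1` term rounds the integer division up, so a partial block is still launched when `N` is not a multiple of the block size. A quick check with assumed values:

```cpp
const int N = 1000;               // assumed element count
const int threadsPerBlock = 256;  // assumed block size

// Plain integer division would give 3 blocks (768 threads) and miss the
// last 232 elements; rounding up gives 4 blocks (1024 threads).
const int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;  // = 4
```

The surplus threads in the last block are typically masked off inside the kernel with a bounds check such as `if (i < N)`.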
5. True or False: Using `dim3 threadsPerBlock(16, 16)` configures a 2D block with 256 threads.
Click to reveal the answer
Answer: True
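A sketch of the 2D configuration: the two dimensions multiply, so 16 × 16 = 256 threads per block (`width`, `height`, and the kernel name are assumptions for illustration):

```cpp
const int width = 512, height = 512;  // assumed problem dimensions

dim3 threadsPerBlock(16, 16);         // 16 * 16 = 256 threads, laid out in 2D
dim3 blocksPerGrid((width  + threadsPerBlock.x - 1) / threadsPerBlock.x,
                   (height + threadsPerBlock.y - 1) / threadsPerBlock.y);

// Hypothetical launch over a width x height domain:
// myKernel<<<blocksPerGrid, threadsPerBlock>>>(...);
```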
Memory Management¶

6. Why is `cudaFree()` used in GPU programs?
- A. To allocate device memory
- B. To transfer data from host to device
- C. To deallocate device memory
- D. To initialize device memory
Click to reveal the answer
Answer: C. To deallocate device memory
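Device allocations are not reclaimed by ordinary C/C++ cleanup, so each `cudaMalloc` is paired with a `cudaFree` once the buffer is no longer needed; a minimal sketch of the lifecycle (names assumed):

```cpp
#include <cuda_runtime.h>

void device_buffer_lifecycle() {
    const int N = 1024;  // assumed element count
    float *d_a = nullptr;
    cudaMalloc((void **)&d_a, sizeof(float) * N);

    // ... kernel launches and cudaMemcpy calls using d_a ...

    // Pair every cudaMalloc with a cudaFree; otherwise the allocation
    // stays resident until the CUDA context is destroyed at process exit.
    cudaFree(d_a);
    d_a = nullptr;  // avoid reusing a dangling device pointer
}
```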
Kernel Execution and Indexing¶

7. What does `blockIdx.x * blockDim.x + threadIdx.x` calculate in CUDA?
- A. The global thread index
- B. The block size
- C. The grid size
- D. The shared memory size