
Vector Addition: CPU vs. GPU Implementation

1. In CPU vector addition, how are elements of the vectors processed?

  • A. Sequentially, one by one
  • B. In parallel, using threads
  • C. By blocks of elements
  • D. Randomly
Answer: A. Sequentially, one by one
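
For contrast with the CUDA version in the next section, here is a minimal sketch of the sequential CPU loop; the names `vecAddCPU`, `h_a`, `h_b`, `h_c`, and `N` are illustrative, not from the quiz:

```cuda
// Sequential CPU vector addition: a single loop walks the arrays
// and processes exactly one element per iteration, in order.
void vecAddCPU(const float *h_a, const float *h_b, float *h_c, int N) {
    for (int i = 0; i < N; ++i) {
        h_c[i] = h_a[i] + h_b[i];
    }
}
```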

GPU Vector Addition with CUDA

2. Which CUDA function is used to allocate memory on the GPU?

  • A. `malloc`
  • B. `free`
  • C. `cudaMalloc`
  • D. `cudaMemcpy`
Answer: C. `cudaMalloc`
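
A minimal allocation sketch, assuming a vector of `N` floats (`d_a` and `N` are illustrative names):

```cuda
#include <cuda_runtime.h>

int main() {
    const int N = 1024;                   // illustrative vector length
    float *d_a = nullptr;
    cudaMalloc(&d_a, sizeof(float) * N);  // allocate N floats on the device
    // ... launch kernels that use d_a ...
    cudaFree(d_a);                        // release it when done (see question 6)
    return 0;
}
```

Note that `cudaMalloc` takes the address of the pointer (`&d_a`) because it writes the device address back through that pointer.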

3. What does `cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);` do?

  • A. Copies data from GPU to CPU
  • B. Allocates memory on the GPU
  • C. Copies data from CPU to GPU
  • D. Frees memory on the GPU
Answer: C. Copies data from CPU to GPU
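
The same call with the direction flag reversed copies results back. A minimal sketch, assuming `h_a` is a host array and `d_a` a device allocation of `N` floats (illustrative names):

```cuda
// Host -> device: upload input data before launching the kernel.
cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);

// Device -> host: download results after the kernel finishes.
cudaMemcpy(h_a, d_a, sizeof(float) * N, cudaMemcpyDeviceToHost);
```

The destination always comes first, mirroring the argument order of the standard `memcpy`.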

Thread and Block Configuration

4. Which formula calculates the required number of blocks per grid?

  • A. `N / threadsPerBlock`
  • B. `(N + threadsPerBlock - 1) / threadsPerBlock`
  • C. `N * threadsPerBlock`
  • D. `(N - threadsPerBlock) / threadsPerBlock`
Answer: B. `(N + threadsPerBlock - 1) / threadsPerBlock`
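
The `+ threadsPerBlock - 1` term rounds the integer division up, so a final, partially filled block is still launched when `N` is not a multiple of the block size. A quick sketch with illustrative numbers:

```cuda
const int N = 1000;                // illustrative element count
const int threadsPerBlock = 256;   // common block size choice
const int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
// (1000 + 255) / 256 = 4 blocks -> 1024 threads, enough to cover 1000 elements.
// vecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_a, d_b, d_c, N);  // hypothetical kernel
```

The 24 surplus threads in the last block are masked off by the bounds check inside the kernel (see question 7).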

5. True or False: Using `dim3 threadsPerBlock(16, 16)` configures a 2D block with 256 threads.

Answer: True. A 16 × 16 block contains 256 threads (the z dimension defaults to 1).
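
A sketch of such a 2D configuration (the grid dimensions and kernel name are illustrative):

```cuda
dim3 threadsPerBlock(16, 16);   // 16 x 16 = 256 threads per block; z defaults to 1
dim3 blocksPerGrid(8, 8);       // illustrative 8 x 8 grid of blocks
// kernel2D<<<blocksPerGrid, threadsPerBlock>>>(...);  // hypothetical 2D kernel
```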

Memory Management

6. Why is `cudaFree()` used in GPU programs?

  • A. To allocate device memory
  • B. To transfer data from host to device
  • C. To deallocate device memory
  • D. To initialize device memory
Answer: C. To deallocate device memory
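
Every successful `cudaMalloc` should be paired with a `cudaFree`; otherwise device memory stays allocated for the lifetime of the process. A minimal sketch (`d_a`, `d_b`, `d_c` are illustrative device pointers):

```cuda
// Deallocate the device buffers once the results have been
// copied back to the host.
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
```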

Kernel Execution and Indexing

7. What does `blockIdx.x * blockDim.x + threadIdx.x` calculate in CUDA?

  • A. The global thread index
  • B. The block size
  • C. The grid size
  • D. The shared memory size
Answer: A. The global thread index
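
A minimal kernel sketch tying the pieces together (`vecAdd` and its parameter names are illustrative):

```cuda
// Each thread computes its global index from its block's offset
// (blockIdx.x * blockDim.x) plus its position within the block
// (threadIdx.x), then handles exactly one element. The bounds
// check guards the extra threads in the last block when N is not
// a multiple of blockDim.x.
__global__ void vecAdd(const float *a, const float *b, float *c, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        c[i] = a[i] + b[i];
    }
}
```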