Vector Addition: CPU vs. GPU Implementation¶
1. In CPU vector addition, how are elements of the vectors processed?
- A. Sequentially, one by one
- B. In parallel, using threads
- C. By blocks of elements
- D. Randomly
Click to reveal the answer
Answer: A. Sequentially, one by one
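To illustrate the answer, a minimal sketch of the sequential version (the function and array names here are illustrative, not from the quiz):

```cpp
// CPU vector addition: elements are processed one at a time, in order.
void vector_add_cpu(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];  // iteration i runs only after iteration i - 1
    }
}
```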
GPU Vector Addition with CUDA¶

2. Which CUDA function is used to allocate memory on the GPU?
- A. `malloc`
- B. `free`
- C. `cudaMalloc`
- D. `cudaMemcpy`
Click to reveal the answer
Answer: C. `cudaMalloc`
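A minimal sketch of device allocation with error checking (`N` and the `d_` prefix are assumptions following the naming used in question 3):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 1024;  // assumed element count
    float *d_a = nullptr;

    // cudaMalloc takes the address of the pointer and a size in bytes;
    // on success, d_a points to N floats in GPU global memory.
    cudaError_t err = cudaMalloc((void **)&d_a, sizeof(float) * N);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(d_a);
    return 0;
}
```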
3. What does `cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);` do?
- A. Copies data from GPU to CPU
- B. Allocates memory on the GPU
- C. Copies data from CPU to GPU
- D. Frees memory on the GPU
Click to reveal the answer
Answer: C. Copies data from CPU to GPU
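A minimal sketch of the copy in context, assuming a host buffer `h_a` and a device buffer `d_a` of `N` floats each (the `h_`/`d_` prefixes conventionally mark host and device pointers):

```cpp
#include <cuda_runtime.h>

int main() {
    const int N = 1024;    // assumed element count
    float h_a[N] = {0};    // h_ prefix: host (CPU) buffer
    float *d_a = nullptr;  // d_ prefix: device (GPU) buffer
    cudaMalloc((void **)&d_a, sizeof(float) * N);

    // The last argument selects the direction of the copy: host -> device here.
    cudaMemcpy(d_a, h_a, sizeof(float) * N, cudaMemcpyHostToDevice);

    // Copying results back would use the opposite direction:
    // cudaMemcpy(h_a, d_a, sizeof(float) * N, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    return 0;
}
```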
Thread and Block Configuration¶

4. Which formula calculates the required number of blocks per grid?
- A. `N / threadsPerBlock`
- B. `(N + threadsPerBlock - 1) / threadsPerBlock`
- C. `N * threadsPerBlock`
- D. `(N - threadsPerBlock) / threadsPerBlock`
Click to reveal the answer
Answer: B. `(N + threadsPerBlock - 1) / threadsPerBlock`
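The `+ threadsPerBlock - 1` term rounds the integer division up, so a partial block is still launched when `N` is not a multiple of the block size. A quick check with assumed values:

```cpp
const int N = 1000;               // assumed element count
const int threadsPerBlock = 256;  // assumed block size

// Plain integer division would give 3 blocks (768 threads) and miss the
// last 232 elements; rounding up gives 4 blocks (1024 threads).
const int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;  // = 4
```

The surplus threads in the last block are typically masked off inside the kernel with a bounds check such as `if (i < N)`.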
5. True or False: Using `dim3 threadsPerBlock(16, 16)` configures a 2D block with 256 threads.
Click to reveal the answer
Answer: True
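A sketch of the 2D configuration: the two dimensions multiply, so 16 × 16 = 256 threads per block (`width`, `height`, and the kernel name are assumptions for illustration):

```cpp
const int width = 512, height = 512;  // assumed problem dimensions

dim3 threadsPerBlock(16, 16);         // 16 * 16 = 256 threads, laid out in 2D
dim3 blocksPerGrid((width  + threadsPerBlock.x - 1) / threadsPerBlock.x,
                   (height + threadsPerBlock.y - 1) / threadsPerBlock.y);

// Hypothetical launch over a width x height domain:
// myKernel<<<blocksPerGrid, threadsPerBlock>>>(...);
```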
Memory Management¶

6. Why is `cudaFree()` used in GPU programs?
- A. To allocate device memory
- B. To transfer data from host to device
- C. To deallocate device memory
- D. To initialize device memory
Click to reveal the answer
Answer: C. To deallocate device memory
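Device allocations are not reclaimed by ordinary C/C++ cleanup, so each `cudaMalloc` is paired with a `cudaFree` once the buffer is no longer needed; a minimal sketch of the lifecycle (names assumed):

```cpp
#include <cuda_runtime.h>

void device_buffer_lifecycle() {
    const int N = 1024;  // assumed element count
    float *d_a = nullptr;
    cudaMalloc((void **)&d_a, sizeof(float) * N);

    // ... kernel launches and cudaMemcpy calls using d_a ...

    // Pair every cudaMalloc with a cudaFree; otherwise the allocation
    // stays resident until the CUDA context is destroyed at process exit.
    cudaFree(d_a);
    d_a = nullptr;  // avoid reusing a dangling device pointer
}
```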
Kernel Execution and Indexing¶

7. What does `blockIdx.x * blockDim.x + threadIdx.x` calculate in CUDA?
- A. The global thread index
- B. The block size
- C. The grid size
- D. The shared memory size