Hello World
Our first exercise is to print "Hello World" from the GPU. To do that, we need to do the following:
- Run part of the application, or all of it, on the GPU
- Call a CUDA function (a kernel) on the device
- Declare the kernel with the function qualifier __global__
- Call the device function from the main program:
  - C/C++ example: c_function()
  - CUDA example: cuda_function<<<1,1>>>() (using just 1 thread)
  - <<< >>> specifies the number of thread blocks and the number of threads per block
- Make sure to synchronize the threads
  - __syncthreads() synchronizes all the threads within a thread block
  - cudaDeviceSynchronize() makes the host wait until the kernel call has finished
  - Many CUDA API calls are synchronous by default, but a kernel launch returns control to the host immediately, so it is good practice to synchronize explicitly before relying on a kernel's results or printed output
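The steps listed above can be sketched as a small program that contrasts an ordinary host call with a kernel launch. This is only an illustrative sketch: the names c_function and cuda_function follow the list above, while the check of the value returned by cudaDeviceSynchronize() is an addition that is not part of the original exercise.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Ordinary host function: called like any other C/C++ function.
void c_function()
{
    printf("Hello from the host\n");
}

// Kernel: __global__ marks a function that is launched from the host
// and executed on the device.
__global__ void cuda_function()
{
    printf("Hello from the device\n");
}

int main()
{
    // Plain function call, runs on the CPU.
    c_function();

    // Kernel launch: 1 thread block with 1 thread per block.
    cuda_function<<<1, 1>>>();

    // The launch returns immediately, so wait for the kernel to finish.
    // (Assumption: checking the returned status is added for illustration.)
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Without the call to cudaDeviceSynchronize(), the program could exit before the kernel's output is flushed to the terminal.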
Questions and Solutions
Examples: Hello World
//-*-C++-*-
// Hello-world.cu
#include <stdio.h>
#include<cuda.h>
// device function will be executed on device (GPU)
__global__ void cuda_function()
{
    printf("Hello World from GPU!\n");

    // synchronize all the threads
    __syncthreads();
}

int main()
{
    // call the kernel function
    cuda_function<<<1,1>>>();

    // synchronize the device kernel call
    cudaDeviceSynchronize();

    return 0;
}
Compilation and Output
Question
Right now, you are printing just one "Hello World" from the GPU. What if you would like to print more than one? How can you do that?
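One possible answer, offered as a sketch rather than the official solution, is to launch the kernel with more threads or more thread blocks, since every launched thread executes the printf once. The variable names blocks and threads_per_block and the values 2 and 4 are arbitrary choices for illustration; blockIdx.x and threadIdx.x are CUDA's built-in index variables. The compile command in the comment assumes the nvcc compiler is available and uses a hypothetical file name.

```cuda
// Possible build command (assumption): nvcc Hello-world-many.cu -o Hello-world-many
#include <stdio.h>
#include <cuda_runtime.h>

// Every thread that is launched runs this kernel body once.
__global__ void cuda_function()
{
    // blockIdx.x: index of this thread's block within the grid
    // threadIdx.x: index of this thread within its block
    printf("Hello World from GPU! (block %u, thread %u)\n",
           blockIdx.x, threadIdx.x);
}

int main()
{
    const int blocks = 2;             // number of thread blocks (illustrative)
    const int threads_per_block = 4;  // threads in each block (illustrative)

    // Launch blocks * threads_per_block = 8 threads in total.
    cuda_function<<<blocks, threads_per_block>>>();

    // Wait for the kernel to finish so that all output is printed.
    cudaDeviceSynchronize();

    return 0;
}
```

With these values the message is printed eight times. The order in which the blocks print is not guaranteed, so the lines may appear in a different order from run to run.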