Let's launch a single CUDA thread to say hello

Created: November-22, 2018

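This simple CUDA program shows how to write a function that runs on the GPU (also called the device). The CPU, or host, creates CUDA threads by calling special functions called kernels. A CUDA program is a C++ program with some additional syntax.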

To see how this works, put the following code in a file named hello.cu:

#include <stdio.h>

// __global__ functions, or "kernels", execute on the device
__global__ void hello_kernel(void)
{
  printf("Hello, world from the device!\n");
}

int main(void)
{
  // greet from the host
  printf("Hello, world from the host!\n");

  // launch a kernel with a single thread to greet from the device
  hello_kernel<<<1,1>>>();

  // wait for the device to finish so that we see the message
  cudaDeviceSynchronize();

  return 0;
}

(Note that calling printf on the device requires a GPU with compute capability 2.0 or higher. See the versions overview for details.)
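If you are unsure whether your GPU meets this requirement, you can query its compute capability at runtime with cudaGetDeviceProperties. The sketch below assumes device 0 is the GPU the program will use; the helper name supports_device_printf is hypothetical and only encodes the "2.0 or higher" rule stated above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: device-side printf needs compute capability >= 2.0,
// so checking the major version number is enough.
static bool supports_device_printf(int major)
{
  return major >= 2;
}

int main(void)
{
  cudaDeviceProp prop;

  // Assumption: device 0 is the GPU we intend to run on.
  cudaError_t err = cudaGetDeviceProperties(&prop, 0);
  if (err != cudaSuccess) {
    printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
  printf("device printf is %s\n",
         supports_device_printf(prop.major) ? "supported" : "not supported");
  return 0;
}
```

Current CUDA toolkits no longer support GPUs below compute capability 2.0, so on modern hardware this check should always pass.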

Now compile the program with the NVIDIA compiler and run it:

$ nvcc hello.cu -o hello
$ ./hello
Hello, world from the host!
Hello, world from the device!
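A kernel launch does not return an error code, so if it fails (for example, because no CUDA-capable GPU is present), the program above quietly prints only the host greeting. The following is a minimal sketch of the same program with error checking, assuming you want to stop at the first failure; cuda_ok is a hypothetical helper, while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard CUDA runtime calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello_kernel(void)
{
  printf("Hello, world from the device!\n");
}

// Hypothetical helper: report a failed CUDA runtime call and return false.
static bool cuda_ok(cudaError_t err, const char *step)
{
  if (err != cudaSuccess) {
    fprintf(stderr, "%s failed: %s\n", step, cudaGetErrorString(err));
    return false;
  }
  return true;
}

int main(void)
{
  printf("Hello, world from the host!\n");

  hello_kernel<<<1,1>>>();

  // The launch itself returns nothing; launch errors are fetched here.
  if (!cuda_ok(cudaGetLastError(), "kernel launch"))
    return 1;

  // Errors that occur while the kernel runs are reported by the sync call.
  if (!cuda_ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize"))
    return 1;

  return 0;
}
```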

A few more notes on the example above:

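nvcc stands for "NVIDIA CUDA Compiler". It splits the source code into host and device parts, compiles the device code itself, and passes the host code to the system's C++ compiler.
__global__ is a CUDA keyword used in function declarations to indicate that the function runs on the GPU device and is called from the host.
The triple angle brackets (<<< and >>>) mark a call from host code into device code, known as a kernel launch. The two numbers inside them give the number of thread blocks and the number of threads per block, so <<<1,1>>> launches one block containing a single thread.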
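To see what the two launch parameters do, the sketch below launches 2 blocks of 4 threads each, so the kernel runs in 8 threads. Each thread reads the built-in variables blockIdx.x, blockDim.x and threadIdx.x and prints its position. The block and thread counts are arbitrary values chosen for illustration, total_threads is a hypothetical helper, and the order in which the lines are printed is not guaranteed.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread prints its block index, its thread index within the block,
// and a global index that is unique across the whole launch.
__global__ void hello_kernel(void)
{
  unsigned int global_id = blockIdx.x * blockDim.x + threadIdx.x;
  printf("Hello from block %u, thread %u (global thread %u)\n",
         blockIdx.x, threadIdx.x, global_id);
}

// Hypothetical helper: a 1-D launch <<<blocks, threads_per_block>>>
// runs blocks * threads_per_block threads in total.
static int total_threads(int blocks, int threads_per_block)
{
  return blocks * threads_per_block;
}

int main(void)
{
  const int blocks = 2;             // first launch parameter: number of blocks
  const int threads_per_block = 4;  // second launch parameter: threads per block

  printf("Launching %d threads\n", total_threads(blocks, threads_per_block));
  hello_kernel<<<blocks, threads_per_block>>>();

  // Wait for the kernel so all device output appears before the program exits.
  cudaDeviceSynchronize();
  return 0;
}
```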