Launching a single CUDA thread to say hello

Created: November-22, 2018

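This simple CUDA program demonstrates how to write a function that executes on the GPU (also called the "device"). The CPU, or "host", creates CUDA threads by calling special functions called "kernels". CUDA programs are C++ programs with some additional syntax.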

To see how it works, put the following code in a file named hello.cu:

#include <stdio.h>

// __global__ functions, or "kernels", execute on the device
__global__ void hello_kernel(void)
{
  printf("Hello, world from the device!\n");
}

int main(void)
{
  // greet from the host
  printf("Hello, world from the host!\n");

  // launch a kernel with a single thread to greet from the device
  hello_kernel<<<1,1>>>();

  // wait for the device to finish so that we see the message
  cudaDeviceSynchronize();

  return 0;
}

(Note that using printf on the device requires a device with compute capability 2.0 or higher. See the versions overview for details.)
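To find out which compute capability your own GPU reports, you can query it through the CUDA runtime. The following is a minimal sketch, not part of the original example: it assumes the GPU of interest is device 0, and `supports_device_printf` is a hypothetical helper introduced here only to express the 2.0 requirement mentioned above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): device-side printf needs
// compute capability 2.0 or higher, so only the major number matters.
static bool supports_device_printf(int major, int minor)
{
  (void)minor;
  return major >= 2;
}

int main(void)
{
  // Assumption: we inspect device 0, the default device.
  cudaDeviceProp prop;
  cudaError_t err = cudaGetDeviceProperties(&prop, 0);
  if (err != cudaSuccess)
  {
    fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
    return 1;
  }

  printf("Device 0: %s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);
  printf("Device-side printf supported: %s\n",
         supports_device_printf(prop.major, prop.minor) ? "yes" : "no");
  return 0;
}
```

Compile it with nvcc in the same way as hello.cu. The printed device name and version depend on the hardware installed.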

Now let's compile hello.cu with the NVIDIA compiler and run it:

$ nvcc hello.cu -o hello
$ ./hello
Hello, world from the host!
Hello, world from the device!
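If the second line does not appear, the kernel launch may have failed without any visible message, because hello.cu ignores the status codes returned by the runtime. One way to surface such failures is sketched below; `check` is a hypothetical helper written for this illustration, while `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are CUDA runtime calls.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello_kernel(void)
{
  printf("Hello, world from the device!\n");
}

// Hypothetical helper: print a message and return 1 if err signals a
// failure, otherwise return 0.
static int check(cudaError_t err, const char *what)
{
  if (err != cudaSuccess)
  {
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return 1;
  }
  return 0;
}

int main(void)
{
  hello_kernel<<<1,1>>>();

  // errors detected at launch time (for example, an invalid
  // configuration) are reported by cudaGetLastError
  if (check(cudaGetLastError(), "kernel launch")) return 1;

  // errors that occur while the kernel runs are reported when the
  // host synchronizes with the device
  if (check(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

  return 0;
}
```

On a working setup this prints only the device greeting; on failure it prints the runtime's error description to stderr and exits with a nonzero status.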

Some additional notes on the hello.cu example:

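nvcc stands for "NVIDIA CUDA Compiler". It separates the source code into host and device components.
__global__ is a CUDA keyword used in function declarations. It indicates that the function runs on the GPU device and is called from the host.
Triple angle brackets (<<< and >>>) mark a call from host code to device code, also known as a "kernel launch". The two numbers inside the brackets are the number of thread blocks to launch and the number of threads in each block, so <<<1,1>>> launches one block containing a single thread.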
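To illustrate how the two numbers in the triple angle brackets affect execution, here is a small variation that launches more than one thread. It is a sketch rather than part of the original example: the kernel name `hello_many`, the helper `total_threads`, and the choice of 2 blocks with 4 threads each are assumptions made for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread prints its block index, its index within the block,
// and a global index derived from the two.
__global__ void hello_many(void)
{
  unsigned int global_id = blockIdx.x * blockDim.x + threadIdx.x;
  printf("Hello from block %u, thread %u (global id %u)\n",
         blockIdx.x, threadIdx.x, global_id);
}

// Hypothetical helper: total number of threads created by a
// one-dimensional launch.
static int total_threads(int blocks, int threads_per_block)
{
  return blocks * threads_per_block;
}

int main(void)
{
  const int blocks = 2;
  const int threads_per_block = 4;

  printf("Launching %d threads in total\n", total_threads(blocks, threads_per_block));

  // first number: blocks in the grid; second number: threads per block
  hello_many<<<blocks, threads_per_block>>>();

  // wait for the device to finish so that we see the messages
  cudaDeviceSynchronize();

  return 0;
}
```

This launch creates eight threads, so eight greetings are printed. The order in which they appear is not guaranteed, because the threads run in parallel.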