Declaring variables in a CUDA kernel

Question

Say you declare a new variable in a CUDA kernel and then use it in multiple threads, like this:

__global__ void kernel(float* delt, float* deltb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;
    a = delt[i] + deltb[i];
    a += 1;
}

The kernel call looks something like the following, with multiple threads and blocks:

int threads = 200;
uint3 blocks = make_uint3(200,1,1);
kernel<<<blocks,threads>>>(d_delt, d_deltb);

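Is a stored on the stack?

Is a new a created for each thread when the thread is initialized?

Or will every thread access the same a independently at some unknown time, possibly interfering with the other threads?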

Recommended answer

None of the above. The CUDA compiler is smart enough, and aggressive enough with optimisations, to detect that a is never used, so the complete body of the kernel can be optimised away. You can confirm this by compiling the kernel with the -Xptxas=-v option and looking at the reported resource counts, which should show basically no registers and no local memory or heap.
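As a complement to reading the -Xptxas=-v output, the same resource counts can be queried at run time. The following is a minimal sketch, assuming a CUDA-capable device is present and that the file is saved under the hypothetical name kernel_attrs.cu. It uses cudaFuncGetAttributes from the CUDA runtime API to print the register and local memory usage of the kernel from the question. The exact numbers depend on the compiler version, target architecture, and optimisation flags, so none are shown here.

```cuda
// Hypothetical file name: kernel_attrs.cu
// Build with verbose ptxas output:  nvcc -Xptxas=-v kernel_attrs.cu -o kernel_attrs
#include <cstdio>
#include <cuda_runtime.h>

// The kernel from the question: `a` is computed but never stored anywhere.
__global__ void kernel(float* delt, float* deltb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;
    a = delt[i] + deltb[i];
    a += 1;
}

int main() {
    // Ask the runtime for the compiled kernel's per-thread resource usage.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, kernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("registers per thread:    %d\n", attr.numRegs);
    std::printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

Comparing these values with those of a kernel that actually stores its result gives a rough view of what the optimiser removed.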

In a less trivial example, a would probably be stored in a per-thread register, or in per-thread local memory, which resides in off-die DRAM. Either way, every thread gets its own private copy of a, so threads cannot interfere with each other through it.
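To make the less trivial case concrete, here is a minimal sketch in which the value of a is written back to global memory, so the compiler has to keep it. The output parameter out, the bounds parameter n, and the host helper add_plus_one are hypothetical additions for illustration and are not part of the original question. Error checking is omitted for brevity, and a CUDA-capable device is assumed.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same arithmetic as the kernel in the question, but the result is stored in
// `out`, so `a` is live and cannot be optimised away.
__global__ void kernel_with_output(const float* delt, const float* deltb,
                                   float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a;               // private to this thread
        a = delt[i] + deltb[i];
        a += 1;
        out[i] = a;            // each thread writes only its own element
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and returns the results.
std::vector<float> add_plus_one(const std::vector<float>& x,
                                const std::vector<float>& y) {
    int n = static_cast<int>(x.size());
    if (n == 0) return {};
    size_t bytes = n * sizeof(float);

    float *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 200;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    kernel_with_output<<<blocks, threads>>>(d_x, d_y, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return out;
}

int main() {
    const int n = 200 * 200;  // same total thread count as in the question
    std::vector<float> x(n), y(n);
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
        y[i] = static_cast<float>(2 * i);
    }
    std::vector<float> out = add_plus_one(x, y);

    // If threads shared a single `a`, results would be corrupted by races.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != x[i] + y[i] + 1.0f) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```

Because a is an ordinary automatic variable, no synchronisation is needed around it. Only data in shared or global memory is visible to more than one thread.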
