Problem description
Say you declare a new variable in a CUDA kernel and then use it in multiple threads, like this:
__global__ void kernel(float* delt, float* deltb) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a;
    a = delt[i] + deltb[i];
    a += 1;
}
The kernel call looks something like the following, with multiple threads and blocks:
int threads = 200;
uint3 blocks = make_uint3(200,1,1);
kernel<<<blocks,threads>>>(d_delt, d_deltb);
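- Is a stored on the stack?
- Is a new a created for each thread when it is initialized?
- Or will each thread independently access a at some unknown time, potentially interfering with the other threads?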
Recommended answer
None of the above. The CUDA compiler is smart enough, and aggressive enough with optimisations, to detect that a is unused, so the complete code can be optimised away. You can confirm this by compiling the kernel with -Xptxas=-v as an option and looking at the resource count, which should show essentially no registers and no local memory or heap.
In a less trivial example, a would probably be stored in a per-thread register, or in per-thread local memory, which is off-die DRAM. In either case each thread has its own private copy of a, so threads do not interfere with one another through it.
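As a rough illustration of such a less trivial case, here is a minimal sketch in which the value of a is written back to global memory, so the compiler cannot discard it. The kernel name kernel_out, the extra out and n parameters, the run_check helper, and the input values are all assumptions made for this example and are not part of the original question. The host code checks that every thread produced its own independent result.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant of the question's kernel: the result is stored in
// `out` (an added parameter), so `a` is live and cannot be optimised away.
__global__ void kernel_out(const float* delt, const float* deltb,
                           float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;      // guard in case the grid is larger than n
    float a;                 // automatic variable: one private copy per thread
    a = delt[i] + deltb[i];
    a += 1.0f;
    out[i] = a;              // observable side effect keeps `a` alive
}

// Launches the kernel with the question's configuration (200 blocks of
// 200 threads) and returns true if every element matches the host result.
bool run_check() {
    const int threads = 200, blocks = 200;
    const int n = threads * blocks;
    const size_t bytes = n * sizeof(float);

    // Small integer-valued inputs, so the float arithmetic is exact.
    std::vector<float> h_delt(n), h_deltb(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        h_delt[i] = static_cast<float>(i);
        h_deltb[i] = 2.0f * static_cast<float>(i);
    }

    float *d_delt = nullptr, *d_deltb = nullptr, *d_out = nullptr;
    bool ok =
        cudaMalloc(&d_delt, bytes) == cudaSuccess &&
        cudaMalloc(&d_deltb, bytes) == cudaSuccess &&
        cudaMalloc(&d_out, bytes) == cudaSuccess &&
        cudaMemcpy(d_delt, h_delt.data(), bytes,
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_deltb, h_deltb.data(), bytes,
                   cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        kernel_out<<<blocks, threads>>>(d_delt, d_deltb, d_out, n);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe on early failure.
    cudaFree(d_delt);
    cudaFree(d_deltb);
    cudaFree(d_out);

    // Each thread's private `a` should equal delt[i] + deltb[i] + 1.
    for (int i = 0; ok && i < n; ++i) {
        ok = (h_out[i] == h_delt[i] + h_deltb[i] + 1.0f);
    }
    return ok;
}

int main() {
    bool ok = run_check();
    std::printf("%s\n", ok ? "all threads produced independent results"
                           : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Compiling both this file and the original kernel with -Xptxas=-v should let you compare the reported resource usage; the exact register counts depend on the compiler version and target architecture, so none are quoted here.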