Problem description
I am having problems using cudaMemcpyToSymbol. I have code that works just fine. A cut-down version of it is this:
mykernel.h file:
__global__
void foo(float* out);
mykernel.cu file:
#include <stdint.h>
#include "mykernel.h"
__global__
void foo(float* out)
{
    uint32_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = 10;
}
main.cu file:
#include "mykernel.h"
int main()
{
    // initialization and declaration stuff here
    foo<<<1,1,1>>>(my_global_memory);
    // read back global memory and investigate values
}
The above code works perfectly. Now I want to replace this "10" value with a value coming from constant memory. So what I did was:
- add
__constant__ float my_const_var;
in the mykernel.h file.
- replace the last line of my kernel with
out[idx] = my_const_var;
in mykernel.cu
- add
float value = 10.0f; cudaMemcpyToSymbol(my_const_var, &value, sizeof(value));
before my invocation in main.cu
After doing all that, it looks like cudaMemcpyToSymbol doesn't copy the actual value, because I get a result of '0' instead of '10'. In addition, I always check for CUDA errors and there are none. Can someone tell me what I am doing wrong, and why cudaMemcpyToSymbol does not copy the value to the symbol? I am using a GeForce 9600M (compute capability 1.1) with the latest drivers on Debian Linux and CUDA SDK 5.0. I also tried running cuda-memcheck and I get no errors.
Recommended answer
Since you are attempting to access a variable in one compilation unit that is defined in another compilation unit (main.cu and mykernel.cu), this will require separate device compilation.
Unfortunately, separate compilation is only available for devices of compute capability 2.0 or greater.
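For readers who do have a cc 2.0 or newer device, the separate-compilation route might look roughly like the sketch below. It is an illustration rather than the asker's actual code: it assumes the header declares the variable with extern, that exactly one .cu file defines it, and that relocatable device code is enabled with nvcc's -rdc=true option. The d_out pointer, the output name app, and the sm_20 architecture are placeholders chosen for this example; substitute an architecture your toolkit supports.

```cuda
// ---- mykernel.h ----
// Declaration only: extern says the symbol is defined in another translation unit.
extern __constant__ float my_const_var;
__global__ void foo(float* out);

// ---- mykernel.cu ----
#include "mykernel.h"

__constant__ float my_const_var;   // the single definition of the symbol

__global__ void foo(float* out)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = my_const_var;
}

// ---- main.cu ----
#include <cstdio>
#include <cuda_runtime.h>
#include "mykernel.h"

int main()
{
    float value = 10.0f;
    // Copy the host value into the constant symbol defined in mykernel.cu.
    cudaMemcpyToSymbol(my_const_var, &value, sizeof(value));

    float* d_out = NULL;               // placeholder name, not from the original post
    cudaMalloc(&d_out, sizeof(float));
    foo<<<1, 1>>>(d_out);

    float h_out = 0.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    std::printf("out[0] = %f\n", h_out);
    return 0;
}

// ---- build (assumed command line) ----
// nvcc -arch=sm_20 -rdc=true main.cu mykernel.cu -o app
```

Error checking is left out here to keep the three files short; in real code each runtime call's return value should be checked.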
You can work around this for pre-cc2.0 by putting all your CUDA code that must reference a given variable in the same file (the same file where the variable is declared).
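A minimal single-file sketch of that workaround follows. It assumes the kernel, the __constant__ variable, and the host code can all live in one .cu file; the CUDA_CHECK macro and the d_out and h_out names are hypothetical helpers added for this example and are not part of the original post.

```cuda
// single_file.cu: kernel, __constant__ variable, and host code in ONE translation unit.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Defined in the same file as every piece of code that references it.
__constant__ float my_const_var;

__global__ void foo(float* out)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = my_const_var;   // read the value from constant memory
}

int main()
{
    float value = 10.0f;
    // Host-to-device copy into the constant symbol; the size argument is required.
    CUDA_CHECK(cudaMemcpyToSymbol(my_const_var, &value, sizeof(value)));

    float* d_out = NULL;
    CUDA_CHECK(cudaMalloc(&d_out, sizeof(float)));

    foo<<<1, 1>>>(d_out);                    // one block, one thread
    CUDA_CHECK(cudaGetLastError());          // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());     // catches execution errors

    float h_out = 0.0f;
    CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));

    std::printf("out[0] = %f\n", h_out);
    // Exit status reports whether the kernel saw the value copied to the symbol.
    return (h_out == value) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Because everything sits in one translation unit, the symbol passed to cudaMemcpyToSymbol is the same one the kernel reads, so no device linking is involved.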