Problem description
Can anyone help me to understand why the following code causes a segmentation fault?
Likewise, can anyone help me understand why swapping out the two lines labelled "bad" for the two lines labelled "good" does not result in a segmentation fault?
Note that the seg fault seems to occur at the cudaMalloc line; if I comment that out I also do not see a segmentation fault. These allocations seem to be stepping on each other, but I don't understand how.
The intent of the code is to set up three structures:
- h_P on the host, which will be populated by a CPU routine
- d_P on the device, which will be populated by a GPU routine
- h_P_copy on the host, which will be populated by copying the GPU data structure back in.
That way I can verify correct behavior and benchmark one vs the other.
All of those are, indeed, four-dimensional arrays.
(If it matters, the card in question is a GTX 580, using nvcc 4.2 under SUSE Linux)
#define NUM_STATES 32
#define NUM_MEMORY 16
int main( int argc, char** argv) {
// allocate and create P matrix
int P_size = sizeof(float) * NUM_STATES * NUM_STATES * NUM_MEMORY * NUM_MEMORY;
// float *h_P = (float*) malloc (P_size); **good**
// float *h_P_copy = (float*) malloc (P_size); **good**
float h_P[P_size]; // **bad**
float h_P_copy[P_size]; // **bad**
float *d_P;
cudaMalloc( (void**) &d_P, P_size);
cudaMemset( d_P, 0.0, P_size);
}
This is likely due to stack corruption of some sort.
Notes:
- The "good" lines allocate out of the system heap; the "bad" lines allocate stack storage.
- Normally the amount you can allocate from the stack is quite a bit smaller than what you can allocate from the heap.
- The "good" and "bad" declarations are not reserving the same amount of float storage. The "bad" lines allocate 4x as much float storage, because P_size is a byte count but the array declarations treat it as an element (float) count.
- Finally, cudaMemset, just like memset, sets bytes and expects an unsigned char quantity, not a float (0.0) quantity.
Since the cudaMalloc line is the first one that actually "uses" (attempts to set) any of the allocated stack storage in the "bad" case, it is where the seg fault occurs. If you added an additional declaration like so:
float *d_P;
float myval; //add
myval = 0.0f; //add2
cudaMalloc( (void**) &d_P, P_size);
I suspect you might see the seg fault occur on the "add2" line, as it would then be the first to make use of the corrupted stack storage.