Problem description
The following code was written with the goal of incrementing a 100-element array of floats by 1, ten times. In the output, I was expecting a 100-element array with a value of 10.0f in each element. Instead, I get random values. Can you please point out my error here?
__global__ void testAdd(float *a)
{
    float temp;
    for (int i = 0; i < 100 ; i++)
    {
        a[i] = atomicAdd(&a[i], 1.0f);
    }
}
void cuTestAtomicAdd(float *a)
{
    testAdd<<<1, 10>>>(a);
}
My goal is to understand the workings of atomic operations, so as to apply them elsewhere.
Recommended answer
That's not how we do an atomicAdd operation.
Just do it like this:
atomicAdd(&a[i], 1.0f);
and the variable in question (a[i]) will be updated.
The return value from an atomic function is generally the old value that was in the variable, before the atomic update.
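To make the "old value" behaviour concrete, here is a minimal sketch rather than a definitive implementation. The names recordOld, runRecordOld, counter and seen are hypothetical and chosen only for this illustration. Ten threads each add 1.0f to a single zero-initialized counter and store whatever atomicAdd handed back.

```cuda
#include <cstdio>
#include <algorithm>

// Hypothetical kernel: each thread adds 1.0f to *counter and records
// the value atomicAdd returns, i.e. the value stored before its own add.
__global__ void recordOld(float *counter, float *seen)
{
    seen[threadIdx.x] = atomicAdd(counter, 1.0f);
}

// Hypothetical helper: launches n threads on a zeroed counter, fills seen
// with the returned values (sorted ascending) and returns the final counter.
float runRecordOld(float *seen, int n)
{
    float *d_counter, *d_seen, h_counter = -1.0f;
    cudaMalloc((void **)&d_counter, sizeof(float));
    cudaMalloc((void **)&d_seen, n * sizeof(float));
    cudaMemset(d_counter, 0, sizeof(float));
    recordOld<<<1, n>>>(d_counter, d_seen);
    cudaMemcpy(&h_counter, d_counter, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(seen, d_seen, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_seen);
    // Which thread sees which old value is unspecified, so sort before printing.
    std::sort(seen, seen + n);
    return h_counter;
}

int main()
{
    float seen[10];
    float total = runRecordOld(seen, 10);
    printf("final counter = %.1f\n", total);
    for (int i = 0; i < 10; i++) printf("%.1f ", seen[i]);
    printf("\n");
    return 0;
}
```

On a working device this prints a final counter of 10.0, and the sorted returned values are 0.0 through 9.0: every thread saw a different "old" value, and none of them saw the final one.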
Doing this:
a[i] = atomicAdd(&a[i], 1.0f);
will update the variable a[i], and then (non-atomically) assign the old value to the variable a[i]. That's almost certainly not what you want.
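The effect is visible even without any race. The sketch below is an illustration under stated assumptions: addThenOverwrite and runSingleThread are hypothetical names, and the kernel is deliberately launched with a single thread. It splits the statement into its two steps, so the plain store simply puts the pre-increment value back.

```cuda
#include <cstdio>

// Hypothetical kernel: the same statement as a[i] = atomicAdd(&a[i], 1.0f),
// written as two explicit steps.
__global__ void addThenOverwrite(float *a, int n)
{
    for (int i = 0; i < n; i++)
    {
        float old = atomicAdd(&a[i], 1.0f); // atomic step: a[i] becomes old + 1
        a[i] = old;                         // plain store: a[i] goes back to old
    }
}

// Hypothetical helper: runs the kernel with ONE thread on a zeroed array
// of n floats and copies the result into out.
void runSingleThread(float *out, int n)
{
    float *d_a;
    cudaMalloc((void **)&d_a, n * sizeof(float));
    cudaMemset(d_a, 0, n * sizeof(float));
    addThenOverwrite<<<1, 1>>>(d_a, n);
    cudaMemcpy(out, d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}

int main()
{
    float h_a[4];
    runSingleThread(h_a, 4);
    for (int i = 0; i < 4; i++) printf("a[%d] = %.1f\n", i, h_a[i]);
    return 0;
}
```

With one thread, every element is still 0.0 afterwards, because each increment is immediately undone. With several threads the plain stores also race with the other threads' atomic adds, so the final values depend on timing.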
Read the documentation for the atomic functions.
The following complete code demonstrates correct usage:
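#include <cstdio>   // printf
#include <cstdlib>  // malloc, free

// Every thread adds 1.0f to each of the 100 elements.
__global__ void testAdd(float *a)
{
    for (int i = 0; i < 100 ; i++)
    {
        // The return value (the old a[i]) is deliberately ignored.
        atomicAdd(&a[i], 1.0f);
    }
}

void cuTestAtomicAdd(float *a)
{
    // 1 block of 10 threads, so each element receives 10 increments.
    testAdd<<<1, 10>>>(a);
}

int main(){
    float *d_data, *h_data;
    h_data = (float *) malloc(100*sizeof(float));
    cudaMalloc((void **)&d_data, 100*sizeof(float));
    cudaMemset(d_data, 0, 100*sizeof(float));  // start from all zeros
    cuTestAtomicAdd(d_data);
    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h_data, d_data, 100*sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 100; i++)
        if (h_data[i] != 10.0f) {
            printf("mismatch at %d, was %f, should be %f\n", i, h_data[i], 10.0f);
            return 1;
        }
    printf("Success\n");
    cudaFree(d_data);
    free(h_data);
    return 0;
}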