atomicAdd example fails to produce the correct output

Problem description

The following code was written with the goal of incrementing each element of a 100-element float array by 1, ten times. I expected the output to be a 100-element array in which every element is 10.0f. Instead, I get random values. Can you please point out my error?

__global__  void testAdd(float *a)
{
    float temp;
    for (int i = 0; i < 100 ; i++)
    {
        a[i] = atomicAdd(&a[i], 1.0f);
    }
}
void cuTestAtomicAdd(float *a)
{
    testAdd<<<1, 10>>>(a);
}

My goal is to understand how atomic operations work, so that I can apply them elsewhere.

Recommended answer

That's not how we do an atomicAdd operation.

Just do this:

atomicAdd(&a[i], 1.0f);

and the variable in question (a[i]) will be updated.

The return value from an atomic function is generally the old value that was in the variable before the atomic update.
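As an illustration of when the returned old value is actually useful, here is a minimal sketch in which each thread keeps the value returned by atomicAdd as its own unique slot index. The names `claimSlots` and `runClaimSlots` are hypothetical and not part of the original answer, and the sketch assumes a single block of at most 1024 threads.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread atomically bumps a shared counter and
// uses the returned old value as an index that no other thread receives.
__global__ void claimSlots(int *counter, int *slots)
{
    int old = atomicAdd(counter, 1);  // value of *counter before this thread's add
    slots[old] = threadIdx.x;         // each old value is handed out exactly once
}

// Hypothetical helper: launches n threads (assumes n <= 1024) and returns true
// if the counter ended at n and every slot holds a distinct thread index.
bool runClaimSlots(int n)
{
    int *d_counter = nullptr, *d_slots = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMalloc(&d_slots, n * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    cudaMemset(d_slots, 0xFF, n * sizeof(int));  // every slot starts as -1

    claimSlots<<<1, n>>>(d_counter, d_slots);

    int h_counter = 0;
    std::vector<int> h_slots(n);
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_slots.data(), d_slots, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_slots);

    bool ok = (h_counter == n);
    std::vector<bool> seen(n, false);
    for (int i = 0; i < n; i++) {
        int t = h_slots[i];
        if (t < 0 || t >= n || seen[t]) ok = false;  // unclaimed or duplicated slot
        else seen[t] = true;
    }
    return ok;
}

int main()
{
    printf("%s\n", runClaimSlots(256) ? "every thread got a unique slot" : "slot check failed");
    return 0;
}
```

The old value is stored in a local variable here and is never written back to the counter itself, which is what separates this pattern from the assignment in the question.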

Doing this:

a[i] = atomicAdd(&a[i], 1.0f);

will atomically update the variable a[i], and then (non-atomically) assign the old value back to a[i]. That plain store can overwrite increments that other threads have made in the meantime, which is almost certainly not what you want.
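To make the difference concrete, the following sketch runs both patterns on a single float and reports the final value. The kernel names `buggyAdd` and `fixedAdd` and the helper `runOnce` are hypothetical, introduced only for this comparison. With the atomic update alone, the result should equal the number of threads. With the extra assignment, the final value is not specified and may vary from run to run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The pattern from the question: atomic update, then a plain store of the old value.
__global__ void buggyAdd(float *a)
{
    *a = atomicAdd(a, 1.0f);  // the plain store can overwrite other threads' increments
}

// The corrected pattern: the atomic update on its own.
__global__ void fixedAdd(float *a)
{
    atomicAdd(a, 1.0f);
}

// Hypothetical helper: zero one float on the device, run the chosen kernel with
// `threads` threads in a single block, and return the final value.
float runOnce(bool useFixed, int threads)
{
    float *d_a = nullptr;
    float h_a = -1.0f;
    cudaMalloc(&d_a, sizeof(float));
    cudaMemset(d_a, 0, sizeof(float));  // all-zero bytes represent 0.0f
    if (useFixed) fixedAdd<<<1, threads>>>(d_a);
    else          buggyAdd<<<1, threads>>>(d_a);
    cudaMemcpy(&h_a, d_a, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return h_a;
}

int main()
{
    const int threads = 256;
    printf("fixed: %.1f (expected %d)\n", runOnce(true, threads), threads);
    printf("buggy: %.1f (depends on thread timing)\n", runOnce(false, threads));
    return 0;
}
```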

Read the documentation on atomic functions in the CUDA programming guide.

The following complete code demonstrates correct usage:

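#include <cstdio>   // printf
#include <cstdlib>  // malloc, free

// Every thread in the launch adds 1.0f to each of the 100 elements.
__global__  void testAdd(float *a)
{
    for (int i = 0; i < 100 ; i++)
    {
        atomicAdd(&a[i], 1.0f);  // the returned old value is deliberately ignored
    }
}
// Launch one block of 10 threads, so each element receives 10 increments.
void cuTestAtomicAdd(float *a)
{
    testAdd<<<1, 10>>>(a);
}

int main(){

  float *d_data, *h_data;
  h_data=(float *) malloc(100*sizeof(float));
  cudaMalloc((void **)&d_data, 100*sizeof(float));
  cudaMemset(d_data, 0, 100*sizeof(float));  // all-zero bytes represent 0.0f
  cuTestAtomicAdd(d_data);
  // This copy is issued after the kernel on the default stream, so it sees the final values.
  cudaMemcpy(h_data, d_data, 100*sizeof(float), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 100; i++)
    if (h_data[i] != 10.0f) {printf("mismatch at %d, was %f, should be %f\n", i, h_data[i], 10.0f); return 1;}
  printf("Success\n");
  cudaFree(d_data);
  free(h_data);
  return 0;
}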
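The program above does not check the return codes of the CUDA runtime calls, so a failed allocation or launch would only show up as a mismatch. A variant with basic error checking might look like the sketch below. The macro `CUDA_CHECK`, the helper `countMismatches`, and the extra `n` parameter on the kernel are assumptions added for illustration and do not appear in the original answer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: print a message and exit if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Same kernel as in the answer, with the array length passed in as n.
__global__ void testAdd(float *a, int n)
{
    for (int i = 0; i < n; i++)
        atomicAdd(&a[i], 1.0f);
}

// Hypothetical helper: runs the kernel with one block of `threads` threads and
// returns how many of the n elements differ from the expected value `threads`.
int countMismatches(int n, int threads)
{
    std::vector<float> h_data(n);
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    testAdd<<<1, threads>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_data.data(), d_data, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));

    int bad = 0;
    for (int i = 0; i < n; i++)
        if (h_data[i] != (float)threads) bad++;
    return bad;
}

int main()
{
    int bad = countMismatches(100, 10);
    if (bad == 0) printf("Success\n");
    else          printf("%d mismatches\n", bad);
    return bad != 0;
}
```

Checking `cudaGetLastError()` right after the launch and `cudaDeviceSynchronize()` before the copy separates launch failures from errors that occur while the kernel is running.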


07-30 04:23