A strange CUDA bug

Problem description

I am having trouble understanding a bug in a simple CUDA kernel. I have shrunk the kernel down to the minimum that still shows the error.

I have a "Polygon" class that just stores a number of points. I have a function that "adds a point" (it just increments the counter), and I add 4 points to every polygon in my array of polygons. Finally, I call a function that updates the number of points using a loop. If I call new_nbpts++ once in this loop, I get the expected answer: all polygons have 4 points. If I call new_nbpts++ a second time in the same loop, my polygons have a garbage number of points (4194304), which is not correct (I should get 8).

I expect there is something I have misunderstood, though.

The complete code:

#include <stdio.h>
#include <cuda.h>


class Polygon {
public:
  __device__ Polygon():nbpts(0){};
  __device__ void addPt() {
    nbpts++;
  }; 
  __device__ void update() {
    int new_nbpts = 0;
    for (int i=0; i<nbpts; i++) {
        new_nbpts++;
        new_nbpts++;  // calling that a second time screws up my result
    }
    nbpts = new_nbpts;
  }

 int nbpts;
};


__global__ void cut_poly(Polygon* polygons, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx>=N) return;

  Polygon pol;
  pol.addPt();
  pol.addPt();
  pol.addPt();
  pol.addPt();

  for (int i=0; i<N; i++) {
    pol.update();
  }

  polygons[idx] = pol;
}



int main(int argc, char* argv[])
{
  const int N = 20; 
  Polygon p_h[N], *p_d;

  cudaError_t err = cudaMalloc((void **) &p_d, N * sizeof(Polygon));   

  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
  cut_poly <<< n_blocks, block_size >>> (p_d, N);

  cudaMemcpy(p_h, p_d, sizeof(Polygon)*N, cudaMemcpyDeviceToHost);

  for (int i=0; i<N; i++)
   printf("%d\n", p_h[i].nbpts);

  cudaFree(p_d);

  return 0;
}


Recommended answer

Your kernel ends with:

  for (int i=0; i<N; i++) {
    pol.update();
  }

Remember that each thread has its own instance of:

Polygon pol;

If you want to update each thread's instance of pol at the end of the kernel, you only need to do:

pol.update();

Now, what happens in your case?

Suppose your update() code has only one:

new_nbpts++; 

Your for loop from 0 to N-1, which calls pol.update(), will on each iteration:


  1. set new_nbpts to zero

  2. increment new_nbpts a total of nbpts times

  3. replace the value of nbpts with new_nbpts


Hopefully you can see that this leaves nbpts unchanged. Even after N iterations of the for loop that calls pol.update(), the value of nbpts is the same.

Now, what happens if I have:

new_nbpts++;
new_nbpts++;

in my update() method? Then each call of pol.update() will:


  1. set new_nbpts to zero

  2. increase new_nbpts by two, a total of nbpts times

  3. replace the value of nbpts with new_nbpts

Hopefully you can see that this doubles nbpts on each call of pol.update().

Now, since you are calling pol.update() N times in each thread, you are doubling the starting value of nbpts N times, i.e. nbpts * 2^N. Since nbpts starts out (in this case) as 4 and N is 20, we have 4 * 2^20 = 4194304.
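The arithmetic above can be checked on the host without a GPU. The following is a minimal C++ sketch, not the original kernel: run_updates is a hypothetical helper that mirrors the body of update() and the for loop around it, with the number of new_nbpts++ statements passed in as a parameter. It uses long long rather than int only so that larger inputs are less likely to overflow.

```cpp
#include <cassert>

// Hypothetical host-side helper (not part of the original code).
// Mirrors calling Polygon::update() `calls` times on a counter that starts
// at `nbpts`; `increments` is how many new_nbpts++ statements the loop body has.
long long run_updates(long long nbpts, int calls, int increments) {
    assert(calls >= 0 && increments >= 0);
    for (int c = 0; c < calls; ++c) {
        long long new_nbpts = 0;                  // step 1: reset to zero
        for (long long i = 0; i < nbpts; ++i) {   // step 2: one pass per existing point
            new_nbpts += increments;
        }
        nbpts = new_nbpts;                        // step 3: overwrite nbpts
    }
    return nbpts;
}
```

With the values from the question, run_updates(4, 20, 1) returns 4 (nbpts is unchanged) and run_updates(4, 20, 2) returns 4194304, the "garbage" value that was printed.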

I'm not really sure what you're after with all this, but my guess is that you ran that for loop at the end of the kernel thinking it would update all the different instances of Polygon pol. That is not how it works: all you need is a single

pol.update();

at the end of the kernel, if that was your intention.
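As a rough illustration of that fix, here is a host-side C++ sketch that stands in for the kernel. It rests on some assumptions: the loop over idx plays the role of the CUDA threads, the __device__ qualifiers are dropped, and cut_poly_host is a hypothetical name that does not appear in the question. In the real kernel the only change would be replacing the for loop around pol.update() with one call.

```cpp
#include <cassert>
#include <vector>

// Same logic as the Polygon class in the question, without __device__.
struct Polygon {
    int nbpts = 0;
    void addPt() { nbpts++; }
    void update() {
        int new_nbpts = 0;
        for (int i = 0; i < nbpts; i++) {
            new_nbpts++;
            new_nbpts++;   // two increments: one update() doubles nbpts
        }
        nbpts = new_nbpts;
    }
};

// Hypothetical host stand-in for cut_poly: each loop iteration models one
// thread, which owns its private `pol` and writes it to its own slot.
std::vector<Polygon> cut_poly_host(int N) {
    assert(N >= 0);
    std::vector<Polygon> polygons(N);
    for (int idx = 0; idx < N; idx++) {
        Polygon pol;
        pol.addPt();
        pol.addPt();
        pol.addPt();
        pol.addPt();
        pol.update();          // called once per instance, not N times
        polygons[idx] = pol;
    }
    return polygons;
}
```

Calling cut_poly_host(20) gives 20 polygons with nbpts equal to 8 each, which is the result the question expected.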


10-20 19:07