How to choose block and thread values in CUDA?

Question

I'm new to CUDA. I'm writing image-processing code in CUDA. My C and CUDA code is below; I tried to convert the C code to CUDA, but it does not work well.

My C code:

void imageProcess_usingPoints(int point, unsigned short *img)
{
    // Image processing happens here, using the value of point.
}

int main(int argc, char **argv)
{
    /* Some variables are defined and initialized here. */

    int point = 0;
    unsigned short *image_data;
    // Assume the image is read here and all pixel values are stored in image_data.

    for (int i = 0; i < 1050; i++, point += 1580)
    {
        // Call the image-processing function (for example, blurring) on the
        // 16-bit grayscale image, using the current point value.
        imageProcess_usingPoints(point, image_data);
    }

    return 0;
}

I tried to convert my C code to CUDA, but the result is wrong. The CUDA code I tried is below:

__global__ void processOnImage(int pointInc)
{
    int line = blockIdx.x * blockDim.x + threadIdx.x;
    int point = line * pointInc;
    /* Here I am not getting the same value of point as in the C code. */
    /* Image processing happens here, using the point value. */
}


int main(int argc, char **argv)
{
    /* Some variables are defined and initialized here. */

    int pointInc = 1580;
    static const int BLOCK_WIDTH = 25;
    int x = static_cast<int>(ceilf(static_cast<float>(1050) / BLOCK_WIDTH));
    const dim3 grid(x, 1);
    const dim3 block(BLOCK_WIDTH, 1);
    processOnImage<<<grid, block>>>(pointInc);

    return 0;
}

In the processOnImage function of the CUDA code, I am not getting the same value of the point variable (int point) as in the C code above. What am I doing wrong in the CUDA code? Or how should I use blocks and threads for my C code?

Recommended answer

Basically, you can set the number of threads per block to a multiple of warpSize (or simply a multiple of 32).

Usually 256 is a good choice for most simple kernels. The exact number has to be tuned. This tool in the CUDA installation directory can also help you choose the number:

$CUDA_HOME/tools/CUDA_Occupancy_Calculator.xls

After determining the number of threads per block, you can calculate the number of blocks required by your data size. The following example shows how to do that.
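A minimal sketch of that block-count calculation, not the example the answer originally pointed to. It is written as plain host-side C so it can be checked without a GPU. The names `blocks_for` and `emulate_launch` are hypothetical, and the loops only emulate the index each CUDA thread would compute, including the bounds guard needed when the last block is partly unused:

```c
/* Hypothetical helper: number of blocks needed to cover n items,
   i.e. ceil(n / threads_per_block) using integer arithmetic only. */
static int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/* CPU emulation of a 1-D launch (an illustration, not real device code).
   For every (block, thread) pair it computes the global index a kernel
   would compute and checks that line * point_inc equals the running
   `point` value of the sequential C loop from the question.
   Returns the number of threads that passed the bounds guard,
   or -1 if any point value disagreed. */
static int emulate_launch(int n, int point_inc, int threads_per_block)
{
    int blocks = blocks_for(n, threads_per_block);
    int active = 0;
    int expected_point = 0;   /* value of `point` in the C loop */

    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            /* In a kernel: blockIdx.x * blockDim.x + threadIdx.x */
            int line = block * threads_per_block + thread;
            if (line >= n)
                continue;     /* guard: surplus threads in the last block do nothing */
            if (line * point_inc != expected_point)
                return -1;
            expected_point += point_inc;
            ++active;
        }
    }
    return active;
}
```

With the question's 1050 lines and an assumed 256 threads per block, `blocks_for(1050, 256)` gives 5 blocks (1280 threads), so the `line < n` guard matters. With the question's 25 threads per block it gives exactly 42 blocks, and `line * pointInc` reproduces the `point` values of the C loop.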

On the other hand, you can also use a fixed number of blocks for an arbitrary data size. Sometimes this gives higher performance. See this for more details.
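One common way to use a fixed number of blocks is to have each thread stride over the data in steps of the total thread count. Whether this is the approach the answer's link described is an assumption. The sketch below emulates that loop on the CPU in plain C (the function name `covers_each_element_once` is hypothetical) and checks that every element is handled exactly once, whatever block count is chosen:

```c
#include <stdlib.h>

/* CPU emulation of a strided loop over n elements with a fixed launch size
   (an illustration, not real device code).
   Returns 1 if every element in [0, n) is visited exactly once, else 0. */
static int covers_each_element_once(int n, int blocks, int threads_per_block)
{
    if (n <= 0)
        return 1;   /* nothing to cover */

    int stride = blocks * threads_per_block;   /* gridDim.x * blockDim.x in a kernel */
    int *visits = calloc((size_t)n, sizeof *visits);
    int ok = 1;

    if (visits == NULL)
        return 0;

    for (int block = 0; block < blocks; ++block)
        for (int thread = 0; thread < threads_per_block; ++thread)
            /* In a kernel, this innermost loop is what each thread runs. */
            for (int i = block * threads_per_block + thread; i < n; i += stride)
                ++visits[i];

    for (int i = 0; i < n; ++i)
        if (visits[i] != 1)
            ok = 0;

    free(visits);
    return ok;
}
```

Because the loop bound is the data size rather than the launch size, the same launch configuration works whether there are fewer or more elements than threads.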
