c++ - Making (rowIdx = 1…) work with CUDA threads

I have this in C++:

for ( rowIdx = 1; rowIdx < (NbRows - 1); rowIdx++ )

How should I go about doing this with CUDA?

Because in CUDA we do this:

if (rowIdx < ArraySize) ...

If I set rowIdx=1 before the if (rowIdx < ArraySize) check, it does not work.

----UPDATE----------------------------

A simple example to illustrate.

__global__ void test_func(int *a_in,int *b_in,int *c_out)
{

    size_t rowIdx = blockIdx.x * blockDim.x + threadIdx.x;
    rowIdx=1;

    if (rowIdx <ARRAY_SIZE)
      c_out[rowIdx]=a_in[rowIdx]*b_in[rowIdx];


    }

//fill matrices
for (int i=0;i<ARRAY_SIZE;i++){

      a_in[i]=i;
      b_in[i]=i+1;
      c_out[i]=0;

     }

If I use rowIdx=1, only the first result comes out right and the rest are zero.

Best answer

To simply replace the for loop with the functionality given in the example, the kernel could look like this.

__global__ void test_func(int *a_in,int *b_in,int *c_out)
{
    size_t rowIdx = blockIdx.x * blockDim.x + threadIdx.x;

    if (rowIdx > 0 &&       // ensure that rowIdx is at least 1
        rowIdx <ARRAY_SIZE) // ensure that rowIdx is not out of bounds
    {
      c_out[rowIdx]=a_in[rowIdx]*b_in[rowIdx];
    }
}

Each thread with an index from 1 to ARRAY_SIZE-1 computes a different array element.
Note that in this case the "actual" first element, c_out[0], is not computed.
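
For reference, the kernel in the question fails because assigning rowIdx=1 overwrites the index of every thread, so all threads write to c_out[1] and nothing else is computed. An alternative to the `rowIdx > 0` guard is to shift the thread index by one, so that thread 0 handles row 1 and no thread is left idle at the start. The sketch below is only an illustration under a few assumptions: the names `interior_product` and `run_interior_product` are hypothetical, the upper bound follows the original loop (`rowIdx < NbRows - 1`) rather than `ARRAY_SIZE`, the block size of 256 is an arbitrary choice, and CUDA error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread handles one interior row. Adding 1 maps thread 0 to row 1,
// mirroring "for (rowIdx = 1; rowIdx < NbRows - 1; rowIdx++)".
__global__ void interior_product(const int *a_in, const int *b_in,
                                 int *c_out, size_t nbRows)
{
    size_t rowIdx = (size_t)blockIdx.x * blockDim.x + threadIdx.x + 1;

    // Equivalent to rowIdx < nbRows - 1, written to avoid unsigned underflow.
    if (rowIdx + 1 < nbRows)
        c_out[rowIdx] = a_in[rowIdx] * b_in[rowIdx];
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel over the interior rows, and copies the result back.
// Error checking of the CUDA calls is omitted for brevity.
std::vector<int> run_interior_product(const std::vector<int> &a,
                                      const std::vector<int> &b)
{
    const size_t n = a.size();
    std::vector<int> c(n, 0);
    if (n < 3 || b.size() != n)
        return c;  // no interior rows, or mismatched inputs

    const size_t bytes = n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, bytes);  // rows 0 and n-1 are never written by the kernel

    // Launch just enough threads to cover the n - 2 interior rows.
    const unsigned threads = 256;  // assumed block size
    const unsigned blocks = (unsigned)((n - 2 + threads - 1) / threads);
    interior_product<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}

int main()
{
    const size_t n = 10;
    std::vector<int> a(n), b(n);
    for (size_t i = 0; i < n; i++) {  // same fill as in the question
        a[i] = (int)i;
        b[i] = (int)i + 1;
    }

    std::vector<int> c = run_interior_product(a, b);
    for (size_t i = 0; i < n; i++)
        printf("c_out[%zu] = %d\n", i, c[i]);
    return 0;
}
```

With this mapping the first and last elements of c_out keep their initial value of 0, just as the original loop skips rows 0 and NbRows - 1. To cover indices up to ARRAY_SIZE-1 as in the answer's kernel, the guard would become `rowIdx < nbRows` and the launch would need to cover n - 1 rows instead of n - 2.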