Generating indices using CUDA


Problem description

I am trying to generate the set of indices below:

I have a CUDA block that consists of 20 blocks (block index 0 to 19), with each block subdivided into 4 sub-blocks (sub-block index 0, 1, 2, and 3).

I am trying to generate an index pattern like this:

Columns: threadIdx (tid), SubBlockIdxA (SA), SubBlockIdxB (SB), BlockIdxA (BA), BlockIdxB (BB)

        Required            Obtained
    tid SA  SB  BA  BB      SA  SB  BA  BB
    0   0   1   0   0       0   1   0   0
    1   1   0   0   1       1   0   0   1
    2   0   1   1   1       0   1   1   1
    3   1   0   1   2       1   0   1   2
    4   0   1   2   2       0   1   2   2
    5   1   0   2   3       1   0   2   3
    6   0   1   3   3       0   1   3   3
    7   1   0   3   4       1   0   3   4
    8   2   3   0   0       2   3   0   0
    9   3   2   0   1       3   2   0   1
    10  2   3   1   1       2   3   1   1
    11  3   2   1   2       3   2   1   2
    12  2   3   2   2       2   3   2   2
    13  3   2   2   3       3   2   2   3
    14  2   3   3   3       2   3   3   3
    15  3   2   3   4       3   2   3   4
    16  0   1   5   5       0   1   5   5
    17  1   0   5   6       1   0   5   6
    18  0   1   6   6       0   1   6   6
    19  1   0   6   7       1   0   6   7
    20  0   1   7   7       0   1   7   7
    21  1   0   7   8       1   0   7   8
    22  0   1   8   8       0   1   8   8
    23  1   0   8   9       1   0   8   9
    24  0   1   10  10      2   3   5   5
    25  1   0   10  11      3   2   5   6
    26  0   1   11  11      2   3   6   6
    27  1   0   11  12      3   2   6   7
    28  0   1   12  12      2   3   7   7
    29  1   0   12  13      3   2   7   8
    30  0   1   13  13      2   3   8   8
    31  1   0   13  14      3   2   8   9
    32  2   3   10  10      0   1   10  10
    33  3   2   10  11      1   0   10  11
    34  2   3   11  11      0   1   11  11
    35  3   2   11  12      1   0   11  12
    36  2   3   12  12      0   1   12  12
    37  3   2   12  13      1   0   12  13
    38  2   3   13  13      0   1   13  13
    39  3   2   13  14      1   0   13  14
    40  0   1   15  15      2   3   10  10
    41  1   0   15  16      3   2   10  11
    42  0   1   16  16      2   3   11  11
    43  1   0   16  17      3   2   11  12
    44  0   1   17  17      2   3   12  12
    45  1   0   17  18      3   2   12  13
    46  0   1   18  18      2   3   13  13
    47  1   0   18  19      3   2   13  14

Please see my code below:

static __device__ void function()
{
    // uint16 is assumed to be a typedef for a 16-bit unsigned integer.
    uint16 uBlockIdxA, uBlockIdxB, uSubBlockIdxA, uSubBlockIdxB;
    if (threadIdx.x < 48)
    {
        uint16 uY = threadIdx.x / 8;         // row: 0..5
        uint16 uX = threadIdx.x - (uY * 8);  // column: 0..7
        uSubBlockIdxA = ((uY & 0x01) << 1) + (uX & 0x01);
        uSubBlockIdxB = ((uY & 0x01) << 1) + ((uX + 1) & 0x01);
        uBlockIdxB = (uY >> 1) * 5 + ((1 + uX) >> 1);
        uBlockIdxA = (uY >> 1) * 5 + ((0 + uX) >> 1);
        func(uBlockIdxA, uBlockIdxB, uSubBlockIdxA, uSubBlockIdxB);
    }
}

I am trying to work out the logic that produces the pattern I need. My code gives the wrong result from tid 24 onward, but I am not sure what I am missing.

An explanation of the logic for generating this pattern would be helpful. Code is appreciated. Please help.

Thanks in advance.

Recommended answer

The code to generate the indices is below:

Split the threads into two halves: the indices from the first half (tid 0 to 23) can be reused for the second half (tid 24 to 47) with an offset of 10 on the block indices and minor tweaks.

static __device__ void function()
{
    // uint16 is assumed to be a typedef for a 16-bit unsigned integer.
    uint16 uBlockIdxA, uBlockIdxB, uSubBlockIdxA, uSubBlockIdxB;

    if (threadIdx.x < 48)
    {
        uint16 uY = threadIdx.x / 8;         // row: 0..5
        uint16 uX = threadIdx.x - (uY * 8);  // column: 0..7

        if (threadIdx.x < 24)
        {
            // First half (rows 0..2): same formulas as in the question.
            uSubBlockIdxA = ((uY & 0x01) << 1) + (uX & 0x01);
            uSubBlockIdxB = ((uY & 0x01) << 1) + ((uX + 1) & 0x01);
            uBlockIdxB = (uY >> 1) * 5 + ((1 + uX) >> 1);
            uBlockIdxA = (uY >> 1) * 5 + ((0 + uX) >> 1);
        }
        else
        {
            // Second half (rows 3..5): apply the same formulas to row uY - 3,
            // then shift the block indices by 10.
            uSubBlockIdxA = (((uY - 3) & 0x01) << 1) + (uX & 0x01);
            uSubBlockIdxB = (((uY - 3) & 0x01) << 1) + ((uX + 1) & 0x01);
            uBlockIdxB = ((uY - 3) >> 1) * 5 + ((1 + uX) >> 1) + 10;
            uBlockIdxA = ((uY - 3) >> 1) * 5 + ((0 + uX) >> 1) + 10;
        }

        // Call func only for threads whose indices were actually computed.
        func(uBlockIdxA, uBlockIdxB, uSubBlockIdxA, uSubBlockIdxB);
    }
}
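
As a cross-check on the idea above, the same split can also be written without the branch, by deriving the half and the row within that half from the row number. This is only a sketch and not part of the original answer; the names PatternIdx, pattern_indices and HOST_DEVICE are hypothetical, and plain unsigned is used in place of uint16. The macro is an assumption that lets the same source build as host C or as CUDA, so the formulas can be checked on the CPU against the Required columns of the table.

```c
/* Lets the same source compile as plain C or under nvcc (which defines __CUDACC__). */
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

/* Hypothetical container for the four indices of one thread. */
typedef struct {
    unsigned subA;    /* SA */
    unsigned subB;    /* SB */
    unsigned blockA;  /* BA */
    unsigned blockB;  /* BB */
} PatternIdx;

/* Computes the "Required" row of the table for tid in [0, 48). */
HOST_DEVICE static PatternIdx pattern_indices(unsigned tid)
{
    unsigned row  = tid / 8;  /* 0..5 */
    unsigned col  = tid % 8;  /* 0..7 */
    unsigned half = row / 3;  /* 0 for tid < 24, 1 for tid >= 24 */
    unsigned r    = row % 3;  /* row within its half: 0, 1 or 2 */

    /* Sub-block pair is (0,1) on rows 0 and 2 of a half, (2,3) on row 1. */
    unsigned subBase = (r & 1u) << 1;
    /* Row 2 of a half starts 5 blocks later; the second half adds 10. */
    unsigned blockBase = (r >> 1) * 5u + half * 10u;

    PatternIdx p;
    p.subA   = subBase + (col & 1u);
    p.subB   = subBase + ((col + 1u) & 1u);
    p.blockA = blockBase + (col >> 1);
    p.blockB = blockBase + ((col + 1u) >> 1);
    return p;
}
```

Inside a kernel this would be called as pattern_indices(threadIdx.x) under the same threadIdx.x < 48 guard. Whether the division and modulo are preferable to the explicit branch is a matter of taste here; for 48 threads the difference is unlikely to matter.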

