CUDA: execution order of threads in a 3D block

Question


As the title says, I would like to know the correct thread ordering when we have a 3D block.

I recall reading something about this some time ago, but I don't remember where, and the source did not seem very reliable.

Anyway, I would like some confirmation.

Is it as follows (divided into warps)?

[0, 0, 0] ... [blockDim.x-1, 0, 0] - [0, 1, 0] ... [blockDim.x-1, 1, 0] - (...) - [0, blockDim.y-1, 0] ... [blockDim.x-1, blockDim.y-1, 0] - [0, 0, 1] ... [blockDim.x-1, 0, 1] - (...) - [0, blockDim.y-1, 1] ... [blockDim.x-1, blockDim.y-1, 1] - (...) - [blockDim.x-1, blockDim.y-1, blockDim.z-1]

Solution

Yes, that is the correct ordering; threads are ordered with the x dimension varying first, then y, then z (equivalent to column-major order) within a block. The calculation can be expressed as

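// Linear index of the thread within its block (x varies fastest, then y, then z)
int threadID = threadIdx.x +
               blockDim.x * threadIdx.y +
               (blockDim.x * blockDim.y) * threadIdx.z;

int warpID = threadID / warpSize;  // warp index within the block
int laneID = threadID % warpSize;  // thread index within the warp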

Here threadID is the thread number within the block, warpID is the warp within the block and laneID is the thread number within the warp.
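
To check the mapping by hand without a GPU, the same arithmetic can be reproduced on the host. The following is only a minimal sketch: `Dim3`, `kWarpSize`, `linearThreadId`, `warpId` and `laneId` are hypothetical names introduced for illustration (they are not part of the CUDA API), and the warp size of 32 is an assumption; device code should use the built-in `warpSize`.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's uint3/dim3 types.
struct Dim3 {
    unsigned x, y, z;
};

// Assumed warp size for this sketch; device code should read warpSize instead.
constexpr unsigned kWarpSize = 32;

// Linear thread index within a block: x varies fastest, then y, then z.
inline unsigned linearThreadId(Dim3 tid, Dim3 dim) {
    // Every coordinate must lie inside the block.
    assert(tid.x < dim.x && tid.y < dim.y && tid.z < dim.z);
    return tid.x
         + dim.x * tid.y
         + dim.x * dim.y * tid.z;
}

// Index of the warp within the block.
inline unsigned warpId(unsigned threadId) { return threadId / kWarpSize; }

// Position of the thread within its warp.
inline unsigned laneId(unsigned threadId) { return threadId % kWarpSize; }
```

For example, in an 8 x 4 x 2 block, thread (3, 2, 1) gets linear index 3 + 8*2 + 32*1 = 51, which places it in warp 1 at lane 19. Thread (7, 3, 0) has index 31 and is the last lane of warp 0, so with these dimensions each z-slice of the block fills exactly one warp.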

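Note that threads are not necessarily executed in any predictable order related to this numbering within a block. The execution model guarantees that threads in the same warp are executed in "lock-step" (on Volta and later GPUs, which use independent thread scheduling, code should not rely on this without explicit synchronization such as __syncwarp()), but you cannot infer any more than that from the thread numbering within a block.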


09-25 07:44