Question
I have a 3D grid consisting of 3D blocks. I want to calculate the individual thread index for each coordinate every time the kernel is called. I have these parameters:
dim3 blocks_query(32,32,32);
dim3 threads_query(32,32,32);
kernel<<< blocks_query,threads_query >>>();
Inside the kernel, I want to calculate the individual values of the x, y and z coordinates, for instance x=0,y=0,z=0, then x=0,y=0,z=1, then x=0,y=0,z=2, and so on. Thanks in advance.
Recommended answer
The individual thread indices (x, y and z coordinates) can be calculated inside the kernel as follows:
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;
Keep in mind that the number of threads per block is limited by the GPU, so the block size you have created is invalid:
dim3 threads_query(32,32,32);
This amounts to 32 x 32 x 32 = 32768 threads per block, which no current CUDA device supports. GPUs of compute capability 2.0 and above support at most 1024 threads per block, and older GPUs at most 512. You should reduce the block size; otherwise the kernel will not launch. Also note that you are creating a 3D grid, which is supported only on CUDA GPUs of compute capability 2.0 and above.
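Rather than hard-coding these limits, they can be queried at run time. The following is a minimal sketch, assuming device 0 is the GPU in use; fits_device_limits, empty_kernel and try_launch are hypothetical names introduced here for illustration, not part of the CUDA API.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns true if `block` fits within the limits
// that the runtime reports for `device`.
bool fits_device_limits(dim3 block, int device = 0)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;  // no usable device, so nothing can be launched

    // 64-bit arithmetic so the product of the three extents cannot overflow.
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;

    return total <= (unsigned long long)prop.maxThreadsPerBlock &&
           block.x <= (unsigned int)prop.maxThreadsDim[0] &&
           block.y <= (unsigned int)prop.maxThreadsDim[1] &&
           block.z <= (unsigned int)prop.maxThreadsDim[2];
}

// Hypothetical kernel that does nothing; used only to probe a launch.
__global__ void empty_kernel() {}

// Launches the empty kernel and returns the launch status, so that an
// invalid configuration is reported instead of failing silently.
cudaError_t try_launch(dim3 grid, dim3 block)
{
    empty_kernel<<<grid, block>>>();
    return cudaGetLastError();
}
```

Calling fits_device_limits(dim3(32,32,32)) should return false on the devices described above, and try_launch with that block size should return an error status, which cudaGetErrorString can turn into a readable message.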
UPDATE
Suppose the dimensions of your 3D data are xDim, yDim and zDim. A generic grid of thread blocks can then be formed as follows:
dim3 threads_query(8,8,8);
dim3 blocks_query;
blocks_query.x = (xDim + threads_query.x - 1)/threads_query.x;
blocks_query.y = (yDim + threads_query.y - 1)/threads_query.y;
blocks_query.z = (zDim + threads_query.z - 1)/threads_query.z;
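The rounding-up division used above can be wrapped in a small helper. This is only a sketch; div_up and make_grid are hypothetical names chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Ceiling division: the smallest number of blocks of size `block`
// needed to cover `n` elements.
inline unsigned int div_up(unsigned int n, unsigned int block)
{
    return (n + block - 1) / block;
}

// Hypothetical helper: builds a grid that covers an
// xDim x yDim x zDim volume with the given block shape.
inline dim3 make_grid(unsigned int xDim, unsigned int yDim,
                      unsigned int zDim, dim3 block)
{
    return dim3(div_up(xDim, block.x),
                div_up(yDim, block.y),
                div_up(zDim, block.z));
}
```

For example, 100 x 60 x 17 data with 8 x 8 x 8 blocks gives a grid of 13 x 8 x 3 blocks, which spans 104 x 64 x 24 threads, slightly more than the data.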
The above approach creates a total number of threads equal to or greater than the total data size. The extra threads may cause invalid memory accesses, so perform a bounds check inside the kernel. You can do this by passing xDim, yDim and zDim as kernel arguments and adding the following line inside the kernel:
if(x>=xDim || y>=yDim || z>=zDim) return;
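Putting the pieces together, a complete kernel and launch might look like the sketch below. The kernel name fill_indices, the driver run_fill_indices, and the row-major layout with x varying fastest are assumptions made for illustration; the original question does not say how the data is stored.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each in-range thread writes its own flattened index into `out`.
__global__ void fill_indices(int *out, int xDim, int yDim, int zDim)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    // Threads in the padding region have no element to work on.
    if (x >= xDim || y >= yDim || z >= zDim) return;

    // Assumed layout: row-major with x varying fastest.
    int idx = (z * yDim + y) * xDim + x;
    out[idx] = idx;
}

// Hypothetical driver: returns true if every element was written
// by exactly the thread that owns it.
bool run_fill_indices(int xDim, int yDim, int zDim)
{
    size_t n = (size_t)xDim * yDim * zDim;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(d_out, 0xFF, n * sizeof(int));  // every element starts as -1

    dim3 threads_query(8, 8, 8);
    dim3 blocks_query((xDim + threads_query.x - 1) / threads_query.x,
                      (yDim + threads_query.y - 1) / threads_query.y,
                      (zDim + threads_query.z - 1) / threads_query.z);

    fill_indices<<<blocks_query, threads_query>>>(d_out, xDim, yDim, zDim);
    cudaError_t err = cudaGetLastError();      // catches an invalid launch

    std::vector<int> h_out(n);
    if (err == cudaSuccess)                    // this copy waits for the kernel
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < n; ++i)
        if (h_out[i] != (int)i) return false;  // element missed or wrong
    return true;
}
```

Calling run_fill_indices(100, 60, 17) exercises the bounds check, because none of those dimensions is a multiple of 8 and the grid therefore contains extra threads.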