How are 2D thread blocks padded for warp scheduling?

Problem description

I understand that a 1D thread block with 31 threads will be padded to 32 threads for warp execution. What about a 2D block with 31*31 threads? Will the warp scheduler pad 1 additional thread for each dimension (i.e. a total of 31 will be padded), or will the threads of this 2D block be concatenated so that only the last warp is padded (31*31 = 961; 961 % 32 = 1)?

Recommended answer

Only one warp (the last one) gets padded. Threads are grouped into warps in the order x, y, z. In this way, if you have an odd 2D array size, like 17x17, that is stored contiguously in memory, you can still create 32-thread warps out of a 17x17 thread block that will generate coalesced accesses. In this way, all of the warps will generate fully coalesced accesses except the last one. If individual warps were padded with dead threads along the way, it would be more wasteful in terms of memory accesses in this example.
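
To make the comparison concrete, here is a minimal host-side C++ sketch (plain arithmetic, not device code) that counts warps and inactive lanes under the linearized packing described above, next to a purely hypothetical scheme that would pad every row up to a whole warp. The function names are made up for illustration, and the warp size of 32 is taken from the question.

```cpp
// Assumption: 32 threads per warp, as stated in the question.
constexpr int kWarpSize = 32;

// Warps needed when the whole block is linearized first and then cut into warps.
constexpr int warps_linearized(int dx, int dy) {
    return (dx * dy + kWarpSize - 1) / kWarpSize;
}

// Inactive lanes under linearized packing; they can only sit in the last warp.
constexpr int padded_linearized(int dx, int dy) {
    return warps_linearized(dx, dy) * kWarpSize - dx * dy;
}

// Hypothetical alternative, for comparison only: each row of dx threads
// is padded up to a whole number of warps.
constexpr int warps_per_row(int dx, int dy) {
    return dy * ((dx + kWarpSize - 1) / kWarpSize);
}

constexpr int padded_per_row(int dx, int dy) {
    return warps_per_row(dx, dy) * kWarpSize - dx * dy;
}

// 17x17 block: 289 threads.
static_assert(warps_linearized(17, 17) == 10, "10 warps when linearized");
static_assert(padded_linearized(17, 17) == 31, "31 inactive lanes, all in the last warp");
static_assert(warps_per_row(17, 17) == 17, "17 warps if every row were padded");
static_assert(padded_per_row(17, 17) == 255, "255 inactive lanes if every row were padded");

// 31x31 block from the question: 961 threads.
static_assert(warps_linearized(31, 31) == 31, "31 warps when linearized");
static_assert(padded_linearized(31, 31) == 31, "31 inactive lanes in the last warp");
```

For 17x17, the linearized packing needs 10 warps with 31 inactive lanes, while row-wise padding would need 17 warps with 255 inactive lanes. For the 31x31 block from the question, both schemes happen to give 31 warps and 31 inactive lanes; the difference is that the actual packing puts all 31 of them in the last warp, which holds a single active thread.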

For this example, at least, it works better from a machine utilization standpoint.

The support for this in the documentation rests on understanding that thread ID and thread index are not the same.

Thread index for a given thread is identified by the built-in variables threadIdx.x, threadIdx.y, and threadIdx.z. Thread ID is a unique (within the threadblock), scalar number assigned to each thread.

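The relationship between thread ID and thread index is given by this statement:

"The index of a thread and its thread ID relate to each other in a straightforward way: For a one-dimensional block, they are the same; for a two-dimensional block of size (Dx, Dy), the thread ID of a thread of index (x, y) is (x + y Dx); for a three-dimensional block of size (Dx, Dy, Dz), the thread ID of a thread of index (x, y, z) is (x + y Dx + z Dx Dy)."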

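The formula in that statement can be written out as a small host-side C++ helper. This is only a sketch: `linear_thread_id` is a hypothetical name, and the parameters stand in for `threadIdx.x/y/z` and the block dimensions Dx and Dy (Dz does not appear in the formula).

```cpp
// Thread ID of the thread with index (x, y, z) in a block whose first two
// dimensions are dx and dy: x + y*Dx + z*Dx*Dy, as in the quoted statement.
constexpr int linear_thread_id(int x, int y, int z, int dx, int dy) {
    return x + y * dx + z * dx * dy;
}

// 1D block: thread ID equals the index.
static_assert(linear_thread_id(30, 0, 0, 31, 1) == 30, "1D: ID equals index");

// 2D 17x17 block: the first thread of row 1 directly follows the last of row 0.
static_assert(linear_thread_id(16, 0, 0, 17, 17) == 16, "last thread of row 0");
static_assert(linear_thread_id(0, 1, 0, 17, 17) == 17, "first thread of row 1");
static_assert(linear_thread_id(16, 16, 0, 17, 17) == 288, "last thread of the block");

// 3D block of size (4, 5, 6): index (1, 2, 3) gives 1 + 2*4 + 3*4*5 = 69.
static_assert(linear_thread_id(1, 2, 3, 4, 5) == 69, "3D example");
```

In a kernel, the same expression would be computed from `threadIdx` and `blockDim`.
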
But the grouping of threads into warps is done explicitly by thread ID:

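"The way a block is partitioned into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0."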
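
Read literally, that statement means the warp a thread belongs to follows from its thread ID by integer division. A minimal host-side C++ sketch, assuming a warp size of 32 as in the question (`warp_of` and `lane_of` are hypothetical helper names; "lane" is the usual term for a thread's position inside its warp):

```cpp
// Assumption: 32 threads per warp, as stated in the question.
constexpr int kWarpSize = 32;

// Warp that a given thread ID falls into; warp 0 contains thread 0.
constexpr int warp_of(int thread_id) { return thread_id / kWarpSize; }

// Position of the thread inside its warp.
constexpr int lane_of(int thread_id) { return thread_id % kWarpSize; }

static_assert(warp_of(0) == 0 && lane_of(0) == 0, "thread 0 opens warp 0");

// 17x17 block: index (14, 1) has thread ID 14 + 1*17 = 31, the last lane of warp 0.
static_assert(warp_of(31) == 0 && lane_of(31) == 31, "warp 0 ends inside row 1");

// Index (15, 1) has thread ID 32 and opens warp 1.
static_assert(warp_of(32) == 1 && lane_of(32) == 0, "warp 1 starts inside row 1");

// Index (16, 16) has thread ID 288: the only active thread of warp 9.
static_assert(warp_of(288) == 9 && lane_of(288) == 0, "last thread sits alone in warp 9");
```

Note that warp boundaries do not line up with row boundaries: warp 0 of the 17x17 block covers all of row 0 plus the first 15 threads of row 1.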

So based on the first statement, we see that even for an odd block shape like 17x17, there are no threads defined other than those which are within the dimensionality of the threadblock. Then based on the second statement, the consecutive assembly of warps by thread ID creates warps all of which have defined threads in them (except perhaps the last one.)
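
One way to check this reasoning is to invert the thread ID formula and count how many defined threads each warp holds. The sketch below is host-side C++ with hypothetical helper names and an assumed warp size of 32; it illustrates the argument and is not a statement about how the hardware is implemented.

```cpp
// Assumption: 32 threads per warp, as stated in the question.
constexpr int kWarpSize = 32;

// Inverse of thread ID = x + y*Dx + z*Dx*Dy.
constexpr int index_x(int tid, int dx) { return tid % dx; }
constexpr int index_y(int tid, int dx, int dy) { return (tid / dx) % dy; }
constexpr int index_z(int tid, int dx, int dy) { return tid / (dx * dy); }

// Number of defined threads in a warp, for a block of total_threads threads.
// Assumes warp is a valid warp number for that block.
constexpr int active_lanes(int warp, int total_threads) {
    return total_threads - warp * kWarpSize >= kWarpSize
               ? kWarpSize
               : total_threads - warp * kWarpSize;
}

// 17x17 block: thread ID 31 (end of warp 0) is index (14, 1),
// thread ID 32 (start of warp 1) is index (15, 1).
static_assert(index_x(31, 17) == 14 && index_y(31, 17, 17) == 1, "end of warp 0");
static_assert(index_x(32, 17) == 15 && index_y(32, 17, 17) == 1, "start of warp 1");

// The last thread ID, 288, maps back to index (16, 16, 0); nothing lies beyond it.
static_assert(index_x(288, 17) == 16 && index_y(288, 17, 17) == 16, "last index");
static_assert(index_z(288, 17, 17) == 0, "2D block has z = 0");

// Warps 0..8 are full; warp 9 holds exactly one defined thread.
static_assert(active_lanes(0, 289) == 32, "full warp");
static_assert(active_lanes(8, 289) == 32, "full warp");
static_assert(active_lanes(9, 289) == 1, "last warp is the only partial one");
```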
