Question

By default, a kernel will use all of the device's available SMs (given enough blocks). However, I now have two streams, one compute-intensive and one memory-intensive, and I want to cap the maximum number of SMs each stream may use (after setting the cap, a kernel in that stream would use at most that many SMs, e.g. 20 SMs for the compute-intensive stream and 4 SMs for the memory-intensive one). Is this possible, and if so, which API should I use?
Answer

In short, no: there is no way to do what you envisage.
The CUDA execution model doesn't provide that sort of granularity, and that isn't an accident. By abstracting away that level of scheduling and work distribution, it guarantees (within reason) that any code that runs on the smallest GPU of a given architecture will also run on the largest without modification. That matters from a portability and interoperability point of view.
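The point is visible in the runtime API itself: you can query how many SMs a device has, but neither the launch configuration nor any stream attribute lets you pin a kernel to a subset of them. A minimal sketch of the situation, assuming a CUDA-capable device 0 (the kernel and its names are illustrative, not part of the original answer):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int n) {
    // Read the PTX special register %smid: each block can observe which SM
    // the hardware scheduler placed it on, but cannot request a specific one.
    unsigned smid;
    asm("mov.u32 %0, %%smid;" : "=r"(smid));
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
    if (threadIdx.x == 0) printf("block %d ran on SM %u\n", blockIdx.x, smid);
}

int main() {
    // The SM count is queryable...
    int smCount = 0;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);
    printf("device 0 has %d SMs\n", smCount);

    cudaStream_t computeStream, memoryStream;
    cudaStreamCreate(&computeStream);
    cudaStreamCreate(&memoryStream);

    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // ...but both launches below compete for all SMs. Nothing in the launch
    // syntax <<<grid, block, smem, stream>>> (or in any stream attribute)
    // expresses "use at most 20 SMs" or "use at most 4 SMs".
    busyKernel<<<64, 256, 0, computeStream>>>(d, n);
    busyKernel<<<64, 256, 0, memoryStream>>>(d, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    cudaStreamDestroy(computeStream);
    cudaStreamDestroy(memoryStream);
    return 0;
}
```

Running this shows blocks from both streams scattered across whatever SMs the hardware scheduler picked, which is exactly the abstraction the answer describes.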
This concludes the article on whether the SMs used by one CUDA stream can be set manually; we hope the answer above is helpful.