Problem description
According to wiki/CUDA, sm13 devices have:
- Maximum number of resident blocks per multiprocessor: 8
- Maximum amount of shared memory per multiprocessor: 16 KB
Does this mean that if I have a lot of running blocks, each of them can have only 2 KB of shared memory? If not, and every block still has 16 KB of shared memory, where is that memory stored when 2 blocks with 16 KB each are executing on a single MP?
All of the blocks resident on a multiprocessor must share all of its resources (registers, shared memory, etc.).
If your threadblock uses shared memory, the first rule it must satisfy is that it cannot use more than what is available in the SM (i.e. 16KB in this case).
If the threadblock requires less than 16KB, then it may be possible to have multiple threadblocks executing on the SM. For example, two threadblocks could be executing if each only uses approximately 8KB. Four threadblocks could be executing if each only used at most (slightly less than) 4KB (there is some overhead, usually).
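The arithmetic above can be sketched as a small host-side calculation. This is only an illustration under the limits quoted in the question (16 KB of shared memory and 8 resident blocks per multiprocessor); the helper name `residentBlocksBySharedMem` is hypothetical, and it ignores registers, thread counts, and any other resource that may limit residency further.

```cpp
#include <cstddef>

// Limits quoted in the question for this device class.
constexpr std::size_t kSharedMemPerSM = 16 * 1024;  // bytes of shared memory per multiprocessor
constexpr std::size_t kMaxResidentBlocksPerSM = 8;  // cap on resident blocks per multiprocessor

// Hypothetical helper: upper bound on resident blocks per SM when shared
// memory is the only limiting resource. bytesPerBlock should already
// include whatever per-block overhead applies to the kernel.
constexpr std::size_t residentBlocksBySharedMem(std::size_t bytesPerBlock) {
    if (bytesPerBlock > kSharedMemPerSM) return 0;           // the block cannot run at all
    if (bytesPerBlock == 0) return kMaxResidentBlocksPerSM;  // shared memory is not a limit
    const std::size_t fit = kSharedMemPerSM / bytesPerBlock; // whole blocks that fit in 16 KB
    return fit < kMaxResidentBlocksPerSM ? fit : kMaxResidentBlocksPerSM;
}

// The cases from the answer, checked at compile time.
static_assert(residentBlocksBySharedMem(16 * 1024) == 1, "16 KB per block: one block per SM");
static_assert(residentBlocksBySharedMem(8 * 1024) == 2, "8 KB per block: two blocks per SM");
static_assert(residentBlocksBySharedMem(4 * 1024) == 4, "4 KB per block: four blocks per SM");
// With an assumed 64 bytes of overhead on top of 4 KB, only three blocks fit.
static_assert(residentBlocksBySharedMem(4 * 1024 + 64) == 3, "overhead costs one block");
```

The last assertion is why the answer says "slightly less than" 4 KB: any overhead on top of an exact 4 KB request pushes the fourth block out. The 64-byte figure is purely illustrative.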
If you wanted the maximum of 8 threadblocks to be able to execute at once on a given SM (multiprocessor), then you would have to ensure in your code that the threadblock uses no more than 2KB of shared memory (probably a little less than 2KB).
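Going the other way, the per-block budget for a desired number of resident blocks can be estimated as below. This is a rough sketch, again assuming the 16 KB and 8-block limits from the question; `sharedMemBudgetPerBlock` and its `overheadBytes` parameter are hypothetical names, and the actual overhead depends on the kernel and toolchain.

```cpp
#include <cstddef>

// Limits quoted in the question for this device class.
constexpr std::size_t kSharedMemPerSM = 16 * 1024;  // bytes of shared memory per multiprocessor
constexpr std::size_t kMaxResidentBlocksPerSM = 8;  // cap on resident blocks per multiprocessor

// Hypothetical helper: largest amount of shared memory (in bytes) each block
// may request so that targetBlocks blocks can be resident on one SM.
// overheadBytes is an assumed per-block overhead, not a documented constant.
// Returns 0 when the target is impossible.
constexpr std::size_t sharedMemBudgetPerBlock(std::size_t targetBlocks,
                                              std::size_t overheadBytes) {
    if (targetBlocks == 0 || targetBlocks > kMaxResidentBlocksPerSM) return 0;
    const std::size_t share = kSharedMemPerSM / targetBlocks;  // equal split of the 16 KB
    return share > overheadBytes ? share - overheadBytes : 0;
}

static_assert(sharedMemBudgetPerBlock(8, 0) == 2048, "8 blocks: 2 KB each with no overhead");
static_assert(sharedMemBudgetPerBlock(8, 64) == 1984, "8 blocks: a little under 2 KB with overhead");
```

The 64-byte overhead is only an example value; it shows how "a little less than 2 KB" falls out of the division.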
If each threadblock used 16KB of shared memory, it simply means that additional threadblocks will wait in a queue until that threadblock is finished on that SM, before they begin to execute.
If a threadblock attempted to use more than 16KB (in this case) you would get a kernel launch error.
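The three outcomes described in this answer (a launch error, all blocks resident, or some blocks waiting) can be summarised in one check. This is a simplified model, assuming shared memory is the only limiting resource. `classifyLaunch`, `LaunchOutcome`, and the `smCount` parameter are hypothetical names for illustration, not part of any CUDA API.

```cpp
#include <cstddef>

// Limits quoted in the question for this device class.
constexpr std::size_t kSharedMemPerSM = 16 * 1024;  // bytes of shared memory per multiprocessor
constexpr std::size_t kMaxResidentBlocksPerSM = 8;  // cap on resident blocks per multiprocessor

enum class LaunchOutcome { LaunchError, AllResident, SomeQueued };

// Hypothetical model of what happens to a grid of gridBlocks blocks, each
// requesting bytesPerBlock of shared memory, on a device with smCount SMs.
constexpr LaunchOutcome classifyLaunch(std::size_t bytesPerBlock,
                                       std::size_t gridBlocks,
                                       std::size_t smCount) {
    // A block that needs more than the SM has can never be scheduled.
    if (bytesPerBlock > kSharedMemPerSM) return LaunchOutcome::LaunchError;
    const std::size_t fit =
        bytesPerBlock == 0 ? kMaxResidentBlocksPerSM : kSharedMemPerSM / bytesPerBlock;
    const std::size_t perSM = fit < kMaxResidentBlocksPerSM ? fit : kMaxResidentBlocksPerSM;
    // Blocks beyond the device-wide capacity wait until a resident block finishes.
    return gridBlocks <= perSM * smCount ? LaunchOutcome::AllResident
                                         : LaunchOutcome::SomeQueued;
}

// Illustrative device with 30 SMs (an assumed value, not taken from the question).
static_assert(classifyLaunch(16 * 1024, 60, 30) == LaunchOutcome::SomeQueued,
              "one 16 KB block per SM, so 30 of the 60 blocks wait");
static_assert(classifyLaunch(2 * 1024, 240, 30) == LaunchOutcome::AllResident,
              "8 blocks per SM times 30 SMs holds all 240 blocks");
static_assert(classifyLaunch(16 * 1024 + 1, 1, 30) == LaunchOutcome::LaunchError,
              "more than 16 KB per block cannot launch");
```

In real CUDA code the launch error would show up through the runtime's error reporting after the kernel launch; the model above only mirrors the reasoning in the answer.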