Question
I would like to know whether there is a way to limit the number of registers used by each thread when I launch a kernel. I'm performing a lot of computation in each thread, so the number of registers used is too high and the occupancy is low. I would like to reduce the number of registers used in order to improve parallel thread execution, possibly at the cost of more memory accesses.
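To make the register/occupancy trade-off described above concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of Numba or CUDA. The function name estimate_occupancy and the default limits (65536 registers and 2048 resident threads per multiprocessor) are assumptions chosen for illustration, and the model ignores register allocation granularity, shared memory, and the per-multiprocessor block limit, so treat the numbers as an estimate only.

```python
# Simplified occupancy estimate, considering only the register file and the
# resident-thread limit of one streaming multiprocessor (SM).
# ASSUMPTION: the default hardware limits are illustrative, not taken from
# the original post; look up the real values for your GPU.


def estimate_occupancy(regs_per_thread, threads_per_block,
                       regs_per_sm=65536, max_threads_per_sm=2048):
    """Return (resident_blocks, occupancy) for one SM."""
    # Registers needed by one block of threads.
    regs_per_block = regs_per_thread * threads_per_block
    # How many blocks fit, limited by registers and by thread count.
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads)
    # Occupancy as the fraction of the SM's thread slots that are filled.
    occupancy = blocks * threads_per_block / max_threads_per_sm
    return blocks, occupancy


for regs in (32, 64, 128):
    blocks, occ = estimate_occupancy(regs, threads_per_block=256)
    print(f"{regs:3d} regs/thread -> {blocks} blocks/SM, occupancy {occ:.0%}")
```

Under these assumed limits, a kernel with 256 threads per block goes from an estimated 25% occupancy at 128 registers per thread to 50% at 64, which is the kind of gain a register cap is meant to buy.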
I searched for an answer but didn't find a solution. I think it is possible to set a maximum number of registers per thread with the CUDA toolchain, but is it also possible when using Numba?
Edit: Maybe this could also be done by forcing a minimum number of blocks to be executed on a multiprocessor, so that the compiler has to reduce the number of registers used.
Recommended answer
To the best of my knowledge, the cuda.jit facility offered by Numba does not allow passing arguments to the CUDA assembler that would control register allocation, as is possible with the native CUDA toolchain.
So I don't think there is a way to do what you have asked about.