Problem Description
There is a line in the CUDA Compiler Driver NVCC documentation, under "Options for steering GPU code generation", which is ambiguous to me:
Does the ABI have any standard or limitations for the number of registers that __global__ and __device__ functions use?
Recommended Answer
I think (can't find a reference right now) the CUDA ABI requires at least 16 registers. So if you specify a lower register count (e.g. with -maxrregcount), the compiler will bump the specified limit up to the minimum required by the ABI, and will print an advisory message stating that it did so. As for the maximum number of 32-bit registers available per thread, it is GPU architecture dependent: 124 registers for sm_1x, 63 registers for sm_2x, and 254 registers for sm_3x.
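To see this behavior in practice, one can compile a trivial kernel with a register cap below the ABI minimum and ask ptxas for verbose resource usage. A minimal sketch, assuming an illustrative file name; the exact advisory wording varies by toolkit version:

```
// scale_kernel.cu -- trivial kernel, used only to observe register limits
__global__ void scale(float *out, const float *in, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * s;
}

// Example build (advisory text varies by toolkit version):
//   nvcc -arch=sm_20 -maxrregcount=8 -Xptxas -v -c scale_kernel.cu
// ptxas is expected to bump the per-thread cap up to the ABI minimum and
// report the actual register usage in its verbose (-v) output.
```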
Generally speaking, an ABI (application binary interface) is an architecture-specific convention for storage layout, passing of arguments to functions, passing of function results back to the caller, etc. ABIs (including those for x86_64 and ARM) often designate specific registers for specific tasks such as the stack pointer, function return value, and function arguments. Since the GPU architecture allows a variable number of registers per thread, use of the ABI requires a minimum number of registers to be present to fill these defined roles. If I recall correctly, CUDA introduced an ABI with version 3.0, which was the first version to support Fermi-class GPUs.
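To make the calling-convention aspect concrete, a non-inlined __device__ function is one place where the ABI's argument-passing and return-value rules actually come into play. A sketch under the assumption that __noinline__ keeps the call from being inlined away (file name hypothetical):

```
// abi_call.cu -- a non-inlined __device__ function exercises the calling convention
__device__ __noinline__ float axpy(float a, float x, float y)
{
    return a * x + y;                   // arguments/return value passed per the ABI
}

__global__ void kernel(float *out, const float *x, const float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = axpy(a, x[i], y[i]);   // compiled as a real call on sm_20+
}

// One way to inspect the generated machine code:
//   nvcc -arch=sm_20 -cubin abi_call.cu && cuobjdump -sass abi_call.cubin
```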
The ABI requires compute capability 2.0 or higher. Older GPU architectures lacked hardware features required for the ABI. Most of the newer CUDA features, such as device-side printf() and malloc(), truly called (non-inlined) functions, and separate compilation, rely on and require the use of the ABI, and it is used by default in compiler-generated code for sm_20 and above. You can disable the use of the ABI with -Xptxas -abi=no. I would strongly advise against doing that.
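As a concrete illustration, device-side printf() is one such feature: it is only available when compiling for sm_20 or higher with the ABI in place. A minimal sketch (file name hypothetical):

```
#include <cstdio>

// Device-side printf() requires compute capability 2.0+ and the ABI.
__global__ void hello()
{
    printf("hello from thread %d\n", threadIdx.x);
}

int main()
{
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();   // flush the device-side printf buffer
    return 0;
}

// Build for an ABI-capable target, e.g.:
//   nvcc -arch=sm_20 hello_printf.cu -o hello_printf
// Compiling with -Xptxas -abi=no would break such features (not recommended).
```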