Question
When we check register usage by passing -Xptxas -v to nvcc, we see something like this:
ptxas info : Used 63 registers, 244 bytes cmem[0], 51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16]
I wonder whether there is any documentation that clearly explains cmem[x]. What is the point of separating constant memory into multiple banks, how many banks are there in total, and what are the banks other than 0, 2, 14, and 16 used for?
As a side note, @njuffa (special thanks to you) previously explained on NVIDIA's forum what banks 0, 2, 14, and 16 are:
Used constant memory is partitioned in constant program ‘variables’ (bank 1), plus compiler generated constants (bank 14).
cmem[0]: kernel arguments
cmem[2]: user-defined constant objects
cmem[16]: compiler-generated constants (some of which may correspond to literal constants in the source code)
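To compare per-bank usage across kernels or builds, the ptxas line quoted above can be split into a register count and a bank-to-bytes mapping. This is a minimal Python sketch that assumes the output format shown in the question; the function name parse_ptxas_info is hypothetical, and the format may differ between toolkit versions.

```python
import re

# Matches fragments such as "244 bytes cmem[0]" in a ptxas info line.
CMEM_RE = re.compile(r"(\d+)\s+bytes\s+cmem\[(\d+)\]")
# Matches the register count, e.g. "Used 63 registers".
REG_RE = re.compile(r"Used\s+(\d+)\s+registers")


def parse_ptxas_info(line):
    """Return (register_count, {bank_index: bytes_used}) for one info line.

    Hypothetical helper: it only understands the format quoted in the question.
    """
    reg_match = REG_RE.search(line)
    registers = int(reg_match.group(1)) if reg_match else None
    banks = {int(bank): int(size) for size, bank in CMEM_RE.findall(line)}
    return registers, banks


# The line quoted in the question.
line = ("ptxas info : Used 63 registers, 244 bytes cmem[0], "
        "51220 bytes cmem[2], 24 bytes cmem[14], 20 bytes cmem[16]")
registers, banks = parse_ptxas_info(line)
print(registers, banks)
```

The mapping only reports how many bytes each bank holds. What each bank is used for still depends on the GPU generation.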
Recommended answer
To my knowledge, CUDA's use of GPU constant banks is not officially documented. The number and usage of constant banks differ between GPU generations. These are low-level implementation details that programmers do not have to worry about.
If desired, the usage of constant banks can be reverse engineered by looking at the machine code (SASS) generated for a given platform. In fact, this is how I came up with the information cited in the original question (which came from an NVIDIA developer forum post of mine). As I recall, the information I gave there was based on ad hoc reverse engineering applied specifically to Fermi-class devices, but I am unable to verify this right now because the forums are currently inaccessible.
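As a rough illustration of that kind of inspection: constant-memory operands in a SASS disassembly (for example, from cuobjdump -sass) are written as c[bank][offset], so the banks a kernel actually reads can be tallied from the text. The Python sketch below assumes that operand syntax. The sample instructions are made up for illustration and are not real compiler output, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# A SASS constant operand looks like c[0x2][0x10]: bank index, then byte offset.
CONST_REF_RE = re.compile(r"c\[(0x[0-9a-fA-F]+)\]\[(0x[0-9a-fA-F]+)\]")


def constant_bank_offsets(sass_text):
    """Map each referenced constant bank to the sorted byte offsets read from it."""
    banks = defaultdict(set)
    for bank, offset in CONST_REF_RE.findall(sass_text):
        banks[int(bank, 16)].add(int(offset, 16))
    return {bank: sorted(offsets) for bank, offsets in sorted(banks.items())}


# Hypothetical lines in the style of a disassembly, not taken from a real kernel.
sample = """
MOV R1, c[0x0][0x44];
LD.E R2, [R4];
FMUL R0, R2, c[0x10][0x0];
MOV R3, c[0x0][0x28];
"""
usage = constant_bank_offsets(sample)
print(usage)  # {0: [40, 68], 16: [0]}
```

Comparing the offsets per bank against the kernel's parameters and __constant__ declarations is one way to guess what each bank holds on a particular architecture.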
One reason for having multiple constant banks is to reserve the user-visible constant memory for CUDA programmers, while additional read-only information provided by the hardware or the tools is stored in other constant banks.
Note that the CUDA math library is provided as source files and its functions are inlined into user code, so the constant memory used by CUDA math library functions is included in the statistics for user-visible constant memory.