Problem description
I used --ptxas-options=-v while compiling my .cu code, and it gave the following:
ptxas info: Used 74 registers, 124 bytes smem, 16 bytes cmem[1]
devQuery for my card returns the following:
rev: 2.0
name: tesla c2050
total shared memory per block: 49152
total reg. per block: 32768
Now I enter these data into the CUDA Occupancy Calculator as follows:
1.) 2.0
1.b) 49152
2.) threads per block: x
registers per thread: 74
shared memory per block (bytes): 124
I was varying x (threads per block) so that x*74 <= 32768. For example, I enter 128 (or 256) in place of x. Am I entering all the values required by the occupancy calculator correctly? Thanks.
Recommended answer
--ptxas-options=--verbose (or -v) produces output in the following format:
ptxas : info : Compiling entry function '_Z13matrixMulCUDAILi16EEvPfS0_S0_ii' for 'sm_10'
ptxas : info : Used 15 registers, 2084 bytes smem, 12 bytes cmem[1]
The critical information is:
- The 1st line has the target architecture
- The 2nd line has <Registers Per Thread>, <Static Shared Memory Per Block>, <Constant Memory Per Kernel>
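To make the layout of the second ptxas line concrete, here is a minimal sketch that pulls the three values out of it. The names PtxasUsage and parse_ptxas_usage are hypothetical helpers, not part of any CUDA tool, and the sketch assumes the line has exactly the shape shown above; newer toolkits may print additional fields.

```cpp
#include <cstdio>
#include <cstring>

// Resource usage reported on the "Used ..." line of ptxas verbose output.
// (Hypothetical helper type, introduced only for this illustration.)
struct PtxasUsage {
    int registers_per_thread;  // <Registers Per Thread>
    int static_smem_bytes;     // <Static Shared Memory Per Block>
    int cmem_bytes;            // <Constant Memory Per Kernel>
    int cmem_bank;             // index inside cmem[...]
};

// Parses a line such as
//   "ptxas info: Used 74 registers, 124 bytes smem, 16 bytes cmem[1]".
// Returns true only if all four numbers were read.
// Assumption: the line follows exactly this pattern.
bool parse_ptxas_usage(const char* line, PtxasUsage* out) {
    // Skip the "ptxas info" prefix, whose spacing varies between versions.
    const char* used = std::strstr(line, "Used ");
    if (used == nullptr) {
        return false;
    }
    const int fields = std::sscanf(
        used, "Used %d registers, %d bytes smem, %d bytes cmem[%d]",
        &out->registers_per_thread, &out->static_smem_bytes,
        &out->cmem_bytes, &out->cmem_bank);
    return fields == 4;
}
```

Calling parse_ptxas_usage on the line from the question fills in 74 registers per thread and 124 bytes of static shared memory, which are the two numbers the calculator asks for.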
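When you fill in the occupancy calculator:
- Set field 1.) Select Compute Capability to the target architecture (sm_10 in the above example)
- Set field 2.) Registers Per Thread to <Registers Per Thread>
- Set field 2.) Shared Memory Per Block to <Static Shared Memory Per Block> + DynamicSharedMemoryPerBlock, the value passed as the 3rd parameter to <<<GridDim, BlockDim, DynamicSharedMemoryPerBlock, Stream>>>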
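The shared memory field is simply the sum of the static and dynamic amounts. A minimal sketch of that arithmetic follows; shared_memory_per_block is a hypothetical helper name, and the dynamic size used in the usage note is an invented example value.

```cpp
#include <cstddef>

// Value for the calculator's "Shared Memory Per Block (bytes)" field:
// the static amount reported by ptxas plus the dynamic amount requested
// at launch, i.e. the third argument inside the <<<...>>> launch syntax.
// (Hypothetical helper, for illustration only.)
std::size_t shared_memory_per_block(std::size_t static_smem_bytes,
                                    std::size_t dynamic_smem_bytes) {
    return static_smem_bytes + dynamic_smem_bytes;
}
```

For the kernel in the question, which reports 124 bytes of static shared memory, a launch with no dynamic shared memory gives 124 bytes. If the launch instead requested, say, 1024 bytes dynamically, the field would be 1148.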
The Occupancy Calculator Help tab contains additional information.
In your example I believe you are not setting field 1 correctly, as the Fermi architecture is limited to 63 registers per thread, while sm_1* supports a limit of 124 registers per thread. Since ptxas reported 74 registers, the code was presumably compiled for an sm_1* target rather than for sm_20.
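The register argument can be checked with a few lines of arithmetic. This is a rough sketch using only the limits quoted in this thread (63 and 124 registers per thread, 32768 registers per block); the function names are hypothetical, and the block-size bound ignores the hardware's register allocation granularity, which the calculator itself accounts for.

```cpp
// Per-thread register limits quoted in the answer.
constexpr int kMaxRegsPerThreadSm1x = 124;
constexpr int kMaxRegsPerThreadFermi = 63;

// True if a kernel using `regs_per_thread` registers fits under `limit`.
constexpr bool fits_register_limit(int regs_per_thread, int limit) {
    return regs_per_thread <= limit;
}

// Largest block size x with x * regs_per_thread <= regs_per_block,
// rounded down to a whole number of warps.
// Simplification: real hardware allocates registers in coarser units,
// so the true bound can be lower than this estimate.
constexpr int max_threads_by_registers(int regs_per_thread, int regs_per_block,
                                       int warp_size = 32) {
    return (regs_per_block / regs_per_thread) / warp_size * warp_size;
}

// 74 registers per thread cannot be a Fermi (sm_20) build, but fits sm_1*.
static_assert(!fits_register_limit(74, kMaxRegsPerThreadFermi), "exceeds Fermi limit");
static_assert(fits_register_limit(74, kMaxRegsPerThreadSm1x), "fits sm_1* limit");
```

With the numbers from the question, max_threads_by_registers(74, 32768) evaluates to 416, so block sizes of 128 or 256 satisfy x*74 <= 32768. The mismatch is the architecture selected in field 1, not the block size.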