Question
I have a kernel that performs worse on CC 3.0 (Kepler) than on CC 2.0 (Fermi). In the Nsight profiler, the Warp Issue Efficiency chart shows that 60% of the time there were no eligible warps, and the Issue Stall Reasons chart shows that 60% of these stalls are due to "Other".
I'm wondering what the "Other" issue stall reasons are and what I might do to reduce them.
CUDA 5.0 / Nsight 3.0 RC / CC 3.0.
Recommended answer
In the Nsight Visual Studio Edition 3.0 CUDA Profiler, Issue Efficiency displays a pie chart of the warp stall reasons. The stall reasons are Instruction Fetch, Execution Dependency, Data Requests, Texture, Synchronization, and Other.
For Compute Capability 3.* devices, the Other category is the percentage of time that active warps are stalled for one of the following reasons:
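- The execution unit is busy (reduce use of low-throughput integer operations)
- Register bank conflicts (a compiler issue that can sometimes be made worse by heavy use of vector data types)
- Too few warps per scheduler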
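To make the "too few warps per scheduler" reason concrete, here is a minimal sketch of the arithmetic. It assumes the documented scheduler counts of 2 warp schedulers per SM on CC 2.0 and 4 per SMX on CC 3.0. The function names and the launch numbers in the usage note are hypothetical, and the number of resident blocks per SM is something you would have to take from the profiler or an occupancy calculation, since it depends on register and shared-memory usage.

```cpp
#include <cassert>

// Number of warps needed to hold one block's threads (a warp has 32 threads).
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + 31) / 32;
}

// Average number of resident warps each warp scheduler can choose from on one SM.
// schedulers_per_sm is supplied by the caller (assumed here: 2 on CC 2.0, 4 on CC 3.0).
inline double warps_per_scheduler(int resident_blocks_per_sm,
                                  int threads_per_block,
                                  int schedulers_per_sm) {
    assert(schedulers_per_sm > 0);
    const int resident_warps =
        resident_blocks_per_sm * warps_per_block(threads_per_block);
    return static_cast<double>(resident_warps) / schedulers_per_sm;
}
```

For example, with a hypothetical 2 resident blocks of 256 threads per SM, `warps_per_scheduler(2, 256, 2)` gives 8 warps per scheduler on CC 2.0, while `warps_per_scheduler(2, 256, 4)` gives only 4 on CC 3.0. The same launch configuration can therefore leave each Kepler scheduler with fewer warps to pick from unless more blocks become resident.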
For Compute Capability 5.* and 6.* devices, the Other category is the percentage of time that active warps are stalled for one of the following reasons:
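- Register bank conflicts (a compiler issue that can sometimes be made worse by heavy use of vector data types)
- The warp is waiting for a branch to resolve
- The warp has lower priority and is not currently being considered for scheduling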
For 5.* and 6.*, especially GP100, the last reason can be very high (~75%) if the kernel reaches 32 warps per warp scheduler.
These stall reasons are grouped into the Other category because it is hard to identify actions that a developer can take to resolve them.