Other causes of instruction replay in CUDA
Problem description
This is the output I get from nvprof (CUDA 5.5):
Invocations  Metric Name                     Metric Description                   Min       Max       Avg
Device "Tesla K40c (0)"
  Kernel: MyKernel(double const *, double const *, double*, int, int, int)
         60  inst_replay_overhead            Instruction Replay Overhead     0.736643  0.925197  0.817188
         60  shared_replay_overhead          Shared Memory Replay Overhead   0.000000  0.000000  0.000000
         60  global_replay_overhead          Global Memory Replay Overhead   0.108972  0.108972  0.108972
         60  global_cache_replay_overhead    Global Memory Cache Replay Ove  0.000000  0.000000  0.000000
         60  local_replay_overhead           Local Memory Cache Replay Over  0.000000  0.000000  0.000000
         60  gld_transactions                Global Load Transactions           25000     25000     25000
         60  gst_transactions                Global Store Transactions          75000     75000     75000
         60  warp_nonpred_execution_efficie  Warp Non-Predicated Execution     99.63%    99.63%    99.63%
         60  cf_issued                       Issued Control-Flow Instructio     44911     45265     45101
         60  cf_executed                     Executed Control-Flow Instruct     39533     39533     39533
         60  ldst_issued                     Issued Load/Store Instructions    273117    353930    313341
         60  ldst_executed                   Executed Load/Store Instructio     50016     50016     50016
         60  stall_data_request              Issue Stall Reasons (Data Requ    65.21%    68.93%    67.86%
         60  inst_executed                   Instructions Executed             458686    458686    458686
         60  inst_issued                     Instructions Issued               789220    879145    837129
         60  issue_slots                     Issue Slots                       716816    803393    759614
The kernel uses 356 bytes of cmem[0] and no shared memory. There are also no register spills.
My question is: what causes the instruction replays in this case? We see an overhead of 81%, but the numbers do not add up.
Thanks!
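The mismatch mentioned in the question can be made concrete with a rough tally of the averages in the table above. This is only a sketch: it assumes the commonly cited reading of inst_replay_overhead as (issued - executed) / executed, and that the gap between issued and executed load/store or control-flow instructions counts replays of those instructions. The table's min and max columns show that the counters vary between runs, so the ratios need not match the reported 0.817188 exactly.

```latex
\begin{align*}
\text{shared} + \text{global} + \text{global cache} + \text{local}
  &= 0 + 0.108972 + 0 + 0 = 0.108972 \\
\text{inst\_replay\_overhead (avg)} &= 0.817188 \\
\text{not attributed to any listed cause} &= 0.817188 - 0.108972 = 0.708216 \\[4pt]
\frac{\text{inst\_issued} - \text{inst\_executed}}{\text{inst\_executed}}
  &= \frac{837129 - 458686}{458686} \approx 0.825 \\
\frac{\text{ldst\_issued} - \text{ldst\_executed}}{\text{inst\_executed}}
  &= \frac{313341 - 50016}{458686} \approx 0.574 \\
\frac{\text{cf\_issued} - \text{cf\_executed}}{\text{inst\_executed}}
  &= \frac{45101 - 39533}{458686} \approx 0.012
\end{align*}
```

On this reading, most of the extra issues appear to come from load/store instructions (about 0.57 per executed instruction), while only about 0.11 is attributed to global memory. That is the gap the question asks about.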
Recommended answer
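Some possible reasons:
- Shared memory bank conflicts (which you don't have)
- Constant memory conflicts (i.e., different threads in a warp requesting different locations in constant memory from the same instruction)
- Warp-divergent code (if..then..else taking different paths for different threads in a warp)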
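The last two causes can be illustrated with a minimal CUDA sketch. It is not the asker's MyKernel: the table coeff, the three kernels, and runDemo are hypothetical names introduced only for illustration. The first kernel reads one constant address per warp, the second reads a different constant address per lane, and the third branches on the lane index.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical 32-entry table in constant memory (illustrative only).
__constant__ double coeff[32];

// Uniform access: every thread in a warp reads the same constant address,
// so the constant cache can broadcast the value to the whole warp.
__global__ void uniformConstantRead(double *out, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = coeff[k];
}

// Per-lane access: threads in a warp read different constant addresses from
// the same instruction. Such accesses are serialized per distinct address,
// which is the "constant memory conflict" case from the list above.
__global__ void perLaneConstantRead(double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = coeff[threadIdx.x % 32];
}

// Warp-divergent branch: even and odd lanes take different paths.
// Note: for a branch this short the compiler may emit predicated
// instructions instead of a real branch.
__global__ void divergentBranch(double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0)
        out[i] = out[i] + 1.0;
    else
        out[i] = out[i] * 2.0;
}

// Runs the three kernels and returns the number of wrong results
// (0 on success, -1 on a CUDA error).
int runDemo()
{
    const int n = 1 << 16, threads = 128, blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(double);

    double h_coeff[32];
    for (int j = 0; j < 32; ++j) h_coeff[j] = 4.0 * j;
    if (cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff)) != cudaSuccess) return -1;

    double *d_out = NULL;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1;
    std::vector<double> h_out(n);
    int errors = 0;
    bool ok = true;

    uniformConstantRead<<<blocks, threads>>>(d_out, n, 3);
    ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        if (h_out[i] != 12.0) ++errors;              // coeff[3] = 4.0 * 3

    perLaneConstantRead<<<blocks, threads>>>(d_out, n);
    ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        if (h_out[i] != 4.0 * (i % 32)) ++errors;    // block size is a multiple of 32

    divergentBranch<<<blocks, threads>>>(d_out, n);  // updates the per-lane values in place
    ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) {
        double v = 4.0 * (i % 32);
        double expected = (i % 2 == 0) ? v + 1.0 : v * 2.0;
        if (h_out[i] != expected) ++errors;
    }

    cudaFree(d_out);
    return ok ? errors : -1;
}

int main()
{
    int result = runDemo();
    printf("runDemo returned %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Assuming the file is saved as demo.cu (a hypothetical name), it can be built with `nvcc demo.cu -o demo`. On toolkits that still ship nvprof, `nvprof --metrics inst_replay_overhead ./demo` reports the metric per kernel. One would expect perLaneConstantRead to show a higher replay overhead than uniformConstantRead, though the exact numbers depend on the GPU and compiler.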
This presentation may be of interest, especially slides 8-11.