Question
It appears the general logic for prefetch usage is that a prefetch can be added, provided the code has enough other work to keep it busy until the prefetch completes. But it seems that if too many prefetch instructions are used, they hurt the performance of the system. As far as I can tell, we first need working code without any prefetch instructions. Then we need to try various combinations of prefetch instructions at various locations in the code, and profile each one to find the locations that actually improve because of prefetching. Is there a better way to determine the exact locations where prefetch instructions should be used?
Recommended answer
In the majority of cases prefetch instructions are of little or no benefit, and in some cases they are counter-productive. Most modern CPUs have an automatic (hardware) prefetch mechanism that works well enough that adding software prefetch hints achieves little. The hints can even interfere with the automatic prefetcher and reduce performance.
In some rare cases, such as when you are streaming large blocks of data on which you do very little actual processing, you may manage to hide some latency with software-initiated prefetching, but it's very hard to get right. You need to start the prefetch several hundred cycles before you are going to use the data. Do it too late and you still get a cache miss; do it too early and your data may be evicted from the cache before you are ready to use it. Often this puts the prefetch in some unrelated part of the code, which is bad for modularity and software maintenance. Worse still, if your architecture changes (new CPU, different clock speed, etc.) so that DRAM access latency increases or decreases, you may need to move your prefetch instructions to another part of the code to keep them effective.
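To make the timing constraint concrete, here is a rough sketch in C of how a look-ahead distance might be estimated for a loop. The function name `prefetch_distance` and both of its parameters are hypothetical, and the latency and per-iteration cost are numbers you would have to measure on your own hardware rather than take from here.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate how many loop iterations ahead a prefetch should be issued.
 *
 * latency_cycles  - assumed memory access latency in cycles (measure it)
 * cycles_per_iter - assumed cost of one loop iteration in cycles (measure it)
 *
 * Rounds up, so the prefetch is issued at least latency_cycles before use.
 */
size_t prefetch_distance(size_t latency_cycles, size_t cycles_per_iter)
{
    assert(cycles_per_iter > 0);
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}
```

With an assumed 300-cycle latency and 10 cycles per iteration this gives a distance of 30 iterations. If either number changes on a different CPU, the distance changes with it, which is the maintenance problem described above.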
Anyway, if you feel you really must use prefetch, I recommend putting #ifdefs around any prefetch instructions, so that you can compile your code with and without prefetch and see whether it is actually helping (or hindering) performance, e.g.
#ifdef USE_PREFETCH
// prefetch instruction(s)
#endif
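As a slightly fuller illustration of the same idea, here is a minimal sketch that assumes GCC or Clang, where `__builtin_prefetch` is available. The `sum_array` function and the `PREFETCH_DISTANCE` value of 64 are made up for the example and are not tuned for any particular machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead in elements; override with -DPREFETCH_DISTANCE=n. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 64
#endif

/* Sum an array, optionally hinting a later element into cache. */
long sum_array(const long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#ifdef USE_PREFETCH
        /* GCC/Clang builtin: read access (0), low temporal locality (0).
         * Only hint addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
#endif
        total += data[i];
    }
    return total;
}
```

Build it once as-is and once with `-DUSE_PREFETCH`, then time both on realistic data. A simple sequential loop like this one is exactly the pattern the hardware prefetcher usually handles on its own, so don't be surprised if the prefetch build is no faster, or even slower.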
In general though, I would recommend leaving software prefetch on the back burner as a last-resort micro-optimisation, after you've done all the more productive and obvious stuff.