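Problem description
I have a test question.
Which instructions can slow down the processor when the pipeline fails to predict (branch prediction) the further path of execution?
Possible answers: JGE | ADD | SUB | PUSH | JMP | JNZ | MUL | JG | CALL
If we're talking about branch prediction, are JGE, JMP, JNZ & JG the way to go?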
Recommended answer
Instructions like mul that don't do anything special to EIP can't mispredict, of course, but every kind of jump / call / branch can mispredict to some degree in a pipelined design, even a simple call rel32. The effects can be serious in a heavily pipelined out-of-order execution design like modern x86 CPUs.
Yes, jcc conditional branches always need prediction; the value of FLAGS isn't available when decoding, only later when executing.
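To make the idea of predicting a jcc concrete, here is a minimal sketch of a single 2-bit saturating counter guessing a branch direction before the outcome is known. The counter scheme, its starting value, and the function name count_mispredicts are assumptions chosen for illustration; predictors in real x86 CPUs are far more elaborate.

```python
def count_mispredicts(outcomes, state=2):
    """Count mispredictions made by one 2-bit saturating counter.

    outcomes: iterable of booleans, True meaning the branch was taken.
    state: counter value 0..3; values 2 and 3 predict "taken".
           The starting value 2 is an arbitrary assumption.
    """
    mispredicts = 0
    for taken in outcomes:
        predicted_taken = state >= 2      # the guess is made before the outcome is known
        if predicted_taken != taken:
            mispredicts += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredicts


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through once, repeated 3 times.
    loop_branch = ([True] * 9 + [False]) * 3
    # A branch whose direction flips on every execution.
    alternating = [True, False] * 15
    print("loop branch mispredicts:", count_mispredicts(loop_branch))
    print("alternating mispredicts:", count_mispredicts(alternating))
```

In this toy model the loop branch only mispredicts on each loop exit, while the alternating branch mispredicts on every other execution, which is the kind of data-dependent pattern that hurts most.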
Even direct jmp rel8 / jmp rel32 (and call rel32) need prediction early in the front-end, before they're even decoded, so the fetch stage knows which block to fetch next after fetching a block that might or might not include a jump. (The jump could be unconditional or a predicted-taken conditional; the fetch stage doesn't need to know which, just whether to keep fetching in a straight line or not.) See "Slow jmp-instruction" for more about simple unconditional direct branches running slower if you have too many for the BTB.
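The "too many for the BTB" effect can be sketched with a toy direct-mapped branch target buffer. The table size, the modulo indexing, and the function name btb_misses are all hypothetical; real BTBs are set-associative, multi-level, and undocumented in detail.

```python
def btb_misses(jump_addresses, entries, passes=10):
    """Count lookups that miss in a toy direct-mapped branch target buffer.

    jump_addresses: addresses (here just small integers) of direct jumps,
                    executed in order on every pass.
    entries: number of slots in the hypothetical BTB.
    passes: how many times the whole sequence of jumps is executed.
    """
    table = [None] * entries              # each slot remembers one jump's address
    misses = 0
    for _ in range(passes):
        for addr in jump_addresses:
            slot = addr % entries         # assumed indexing scheme
            if table[slot] != addr:
                misses += 1               # fetch had no idea a jump lives here
                table[slot] = addr        # learned only after the jump is decoded
    return misses


if __name__ == "__main__":
    print("8 jumps, 8 entries: ", btb_misses(range(8), entries=8))
    print("16 jumps, 8 entries:", btb_misses(range(16), entries=8))
```

When the jumps fit, only the first pass misses. When there are more jumps than slots, entries keep evicting each other and every lookup misses, so the front-end has to be re-steered each time.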
If you consider a simple in-order pipeline like a classic 5-stage RISC, with no buffers between stages, all branches are basically equivalent: the fetch stage needs to fetch 1 instruction per clock to avoid bubbles. It needs to know the next fetch address while the previous instruction is still decoding. Longer pipelines make this problem even worse.
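As a rough back-of-the-envelope sketch of why longer pipelines make this worse: assume fetch is stage 1 and a wrongly fetched path is only corrected when the branch reaches some later stage. The function name and the cost model are simplifying assumptions (no pipeline fill time, no other stalls), not a description of any specific CPU.

```python
def total_cycles(instructions, redirects, resolve_stage):
    """Estimate cycles for an in-order pipeline that fetches 1 instruction per clock.

    instructions: number of instructions executed.
    redirects: how many times fetch went down the wrong path.
    resolve_stage: pipeline stage (fetch = 1) where the correct next address
                   becomes known; each redirect wastes resolve_stage - 1 fetch slots.
    """
    bubbles_per_redirect = resolve_stage - 1
    return instructions + redirects * bubbles_per_redirect


if __name__ == "__main__":
    # Same program, same number of redirects, branch resolved later and later.
    for stage in (2, 3, 15):
        print("resolved in stage", stage, "->", total_cycles(1000, 100, stage), "cycles")
```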
But more simply, there are indirect forms of jmp and call like jmp eax or jmp [edi] that load a new EIP from a register or memory. Those obviously need prediction; you have unlimited possibilities for where it will go, not just taken or not-taken.
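A minimal sketch of indirect-branch prediction, assuming the simplest possible scheme: predict that the branch goes wherever it went last time. The handler names are hypothetical stand-ins for target addresses, and real indirect predictors also use branch history rather than just the last target.

```python
def indirect_mispredicts(targets):
    """Count mispredictions of a toy "same target as last time" predictor.

    targets: the sequence of actual targets of one indirect jmp/call,
             e.g. the values loaded into EIP on each execution.
    """
    last_target = None        # nothing known before the first execution
    mispredicts = 0
    for target in targets:
        if target != last_target:
            mispredicts += 1  # the guess (last target) was wrong
        last_target = target
    return mispredicts


if __name__ == "__main__":
    # Always the same target: only the very first execution mispredicts.
    print(indirect_mispredicts(["handler_a"] * 20))
    # Rotating between four targets: this toy predictor is wrong every time.
    print(indirect_mispredicts(["handler_a", "handler_b", "handler_c", "handler_d"] * 5))
```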
Branches that depend on data (conditional on FLAGS, or indirect on register or memory) can get all the way into the back-end (and execute out-of-order) before a mispredict is discovered. Recovering may require discarding results of executing later instructions from the wrong path, as well as fetching/decoding the correct path. See "What exactly happens when a skylake CPU mispredicts a branch?"
But handling mispredicts of direct jmp/call is simpler: just re-steer the fetch/decode stages because the target address is known after decoding the instruction, without having to execute it. The misprediction doesn't make it into the back-end so it's "just" a bubble in the front-end.
Fun fact: ret can also mispredict; it's basically an indirect branch (pop eip). But there are special predictors that take advantage of the usual pairing between call and ret instructions, keeping an internal stack of recent calls that mirrors how the callstack in memory will probably be used. http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/
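The call/ret pairing can be sketched as a small return-address stack. The depth of 16, the overflow policy (drop the oldest entry), the helper nested_calls, and the 0x1000-based return addresses are all assumptions for illustration; actual depths and overflow behaviour differ between microarchitectures, as the linked microbenchmarks explore.

```python
def ret_mispredicts(events, depth):
    """Count ret mispredictions with a toy return-address stack (RAS).

    events: list of ("call", return_address) or ("ret", actual_return_address).
    depth: number of entries the hypothetical RAS can hold.
    """
    ras = []
    mispredicts = 0
    for kind, addr in events:
        if kind == "call":
            ras.append(addr)              # remember where this call should return to
            if len(ras) > depth:
                ras.pop(0)                # assumed policy: the oldest entry is lost
        else:
            predicted = ras.pop() if ras else None
            if predicted != addr:
                mispredicts += 1          # no prediction or a wrong one
    return mispredicts


def nested_calls(n):
    """Build n nested calls followed by the n matching returns."""
    calls = [("call", 0x1000 + i) for i in range(n)]
    rets = [("ret", 0x1000 + i) for i in reversed(range(n))]
    return calls + rets


if __name__ == "__main__":
    print("4 nested calls: ", ret_mispredicts(nested_calls(4), depth=16))
    print("20 nested calls:", ret_mispredicts(nested_calls(20), depth=16))
```

As long as the nesting fits in the stack every ret is predicted correctly; once the nesting is deeper than the stack, the outermost returns have no prediction left in this model.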