Problem description
While running some tests with the -O2 optimization level of GCC, I noticed the following instruction in the disassembled code for a function:
data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
What does this instruction do?
In more detail, I was trying to understand how the compiler optimizes useless recursion like the following at -O2:
int foo(void)
{
    return foo();
}
int main(void)
{
    return foo();
}
The above code causes a stack overflow when compiled without optimization, but works when compiled with -O2.
I think that with -O2 the compiler completely removed the stack pushes for the function foo, but why is the data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1) needed?
0000000000400480 <foo>:
foo():
400480: eb fe jmp 400480 <foo>
400482: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
400489: 1f 84 00 00 00 00 00
0000000000400490 <main>:
main():
400490: eb fe jmp 400490 <main>
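The asker's guess about the stack can be illustrated with a terminating analogue of foo. This is only a sketch, not GCC's actual output: `sum_rec` and `sum_loop` are hypothetical names, and turning a tail call into a jump is the usual tail-call optimization, which the C standard does not guarantee. In foo there is no exit condition, so the same transformation would leave nothing but the `jmp` seen in the listing.

```c
#include <stdint.h>

/* Tail-recursive form: the recursive call is the last thing the function
   does, so nothing in the caller's frame is needed after it returns. */
uint64_t sum_rec(uint64_t n, uint64_t acc)
{
    if (n == 0)
        return acc;
    return sum_rec(n - 1, acc + n);   /* tail call */
}

/* What a tail-call-optimizing compiler may effectively produce:
   the call becomes a jump back to the top, reusing the same frame. */
uint64_t sum_loop(uint64_t n, uint64_t acc)
{
    for (;;) {
        if (n == 0)
            return acc;
        acc += n;
        n -= 1;
    }
}
```

Both functions compute the same value, but only the first one can grow the stack with each step when it is compiled without optimization.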
What you are seeing is an operand forwarding optimization for the CPU pipeline.
Although it is an empty loop, GCC tries to optimize it as well :-).
The CPU you are running on has a superscalar architecture. It is pipelined, which means that the different phases of executing consecutive instructions happen in parallel. For example, if there is a
mov eax, ebx ;(#1)
mov ecx, edx ;(#2)
then instruction #2 can already be loaded and decoded while #1 is being executed.
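The overlap can be put into numbers with a toy model. This is a deliberately simplified sketch: it assumes an ideal pipeline in which every stage takes one cycle and nothing ever stalls, and the function names and the stage count used below are hypothetical, not properties of any real x86 CPU.

```c
/* Cycles needed when each instruction must finish all stages
   before the next one may start (no overlap). */
unsigned long cycles_unpipelined(unsigned long instructions,
                                 unsigned long stages)
{
    return instructions * stages;
}

/* Cycles needed in an ideal pipeline: the first instruction takes
   `stages` cycles to pass through, and every later instruction
   completes one cycle after its predecessor. */
unsigned long cycles_pipelined(unsigned long instructions,
                               unsigned long stages)
{
    if (instructions == 0)
        return 0;
    return stages + (instructions - 1);
}
```

For the two mov instructions and an assumed 5-stage pipeline, this model gives 10 cycles without overlap and 6 cycles with it.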
Pipelining has major problems to solve in the case of branches, even unconditional ones.
For example, while the jmp is being decoded, the next instruction has already been prefetched into the pipeline. But the jmp changes the location of the next instruction. In such cases the pipeline needs to be emptied and refilled, and a lot of valuable CPU cycles are lost.
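The cost of emptying and refilling can be sketched with a toy cycle-count model. It assumes, purely for illustration, an ideal pipeline in which every taken jump throws away a fixed number of cycles of already-fetched work. Real CPUs use branch prediction to avoid much of this cost, and `flush_penalty` is a hypothetical parameter, not a documented x86 figure.

```c
/* Ideal pipelined cycle count plus a fixed bubble for every taken jump.
   taken_jumps   : how many executed instructions redirect the fetch
   flush_penalty : assumed cycles lost per redirect (hypothetical) */
unsigned long cycles_with_flushes(unsigned long instructions,
                                  unsigned long stages,
                                  unsigned long taken_jumps,
                                  unsigned long flush_penalty)
{
    if (instructions == 0)
        return 0;
    return stages + (instructions - 1) + taken_jumps * flush_penalty;
}
```

For 100 iterations of a one-instruction loop, with an assumed 5 stages and a 3-cycle penalty, the model gives 5 + 99 + 100 * 3 = 404 cycles instead of 104.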
It looks like this empty loop runs faster if the pipeline is filled with a no-op in this case, even though the no-op is never executed. It is actually an optimization for a rather uncommon feature of the x86 pipeline.
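Whatever the reason for its presence, the instruction itself can be checked against the bytes in the listing. The sketch below is a hand-rolled check, not a real disassembler, and its function names are hypothetical. It relies on three encoding facts: `0x66` is the x86 operand-size prefix, `0x2e` is the `%cs` segment-override prefix, and `0f 1f` is the multi-byte NOP opcode. In the listing, four of the five `0x66` bytes appear to be printed as `data32`, with the fifth accounting for the `w` in `nopw`.

```c
#include <stddef.h>
#include <stdint.h>

/* The 14 bytes shown at 0x400482 in the listing. */
const uint8_t nop_bytes[] = {
    0x66, 0x66, 0x66, 0x66, 0x66, 0x2e, 0x0f,
    0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Count the leading 0x66 (operand-size) prefix bytes. */
size_t count_data_prefixes(const uint8_t *p, size_t len)
{
    size_t n = 0;
    while (n < len && p[n] == 0x66)
        n++;
    return n;
}

/* Return 1 if, after the 0x66 prefixes and an optional 0x2e (%cs)
   prefix, the next two bytes are the NOP opcode 0f 1f; else 0. */
int is_long_nop(const uint8_t *p, size_t len)
{
    size_t i = count_data_prefixes(p, len);
    if (i < len && p[i] == 0x2e)
        i++;
    return i + 1 < len && p[i] == 0x0f && p[i + 1] == 0x1f;
}

/* Address of the first byte after an instruction of `len` bytes. */
uint64_t end_address(uint64_t start, size_t len)
{
    return start + len;
}
```

Five prefixes, a `%cs` override and the `0f 1f` opcode with its operand bytes add up to a single 14-byte no-op, and 0x400482 + 14 = 0x400490, which is exactly where main begins and is a multiple of 16. Since `jmp 400480` never falls through, the bytes are indeed never executed. This is also consistent with them simply filling the gap up to the next 16-byte boundary, although the listing alone does not say why they were emitted.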
Early DEC Alpha CPUs could even segfault on empty loops like this, so such loops had to contain a lot of no-ops. On x86 the loop would only be slower, because x86 CPUs must remain compatible with the Intel 8086.
There is a lot more to read about how pipelines handle branching instructions.