问题描述
我目前正在学习汇编的基础知识,并且在查看gcc(6.1.1)生成的指令时遇到了一些奇怪的事情.
I am currently learning the basics of assembly and came across something odd when looking at the instructions generated by gcc (6.1.1).
以下是来源:
#include <stdio.h>
int foo(int x, int y){
return x*y;
}
int main(){
int a = 5;
int b = foo(a, 0xF00D);
printf("0x%X\n", b);
return 0;
}
用于编译的命令:gcc -m32 -g test.c -o test
Command used to compile: gcc -m32 -g test.c -o test
在检查gdb中的功能时,我得到了:
When examining the functions in gdb I get this:
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
0x080483f7 <+0>: lea ecx,[esp+0x4]
0x080483fb <+4>: and esp,0xfffffff0
0x080483fe <+7>: push DWORD PTR [ecx-0x4]
0x08048401 <+10>: push ebp
0x08048402 <+11>: mov ebp,esp
0x08048404 <+13>: push ecx
0x08048405 <+14>: sub esp,0x14
0x08048408 <+17>: mov DWORD PTR [ebp-0xc],0x5
0x0804840f <+24>: push 0xf00d
0x08048414 <+29>: push DWORD PTR [ebp-0xc]
0x08048417 <+32>: call 0x80483eb <foo>
0x0804841c <+37>: add esp,0x8
0x0804841f <+40>: mov DWORD PTR [ebp-0x10],eax
0x08048422 <+43>: sub esp,0x8
0x08048425 <+46>: push DWORD PTR [ebp-0x10]
0x08048428 <+49>: push 0x80484d0
0x0804842d <+54>: call 0x80482c0 <printf@plt>
0x08048432 <+59>: add esp,0x10
0x08048435 <+62>: mov eax,0x0
0x0804843a <+67>: mov ecx,DWORD PTR [ebp-0x4]
0x0804843d <+70>: leave
0x0804843e <+71>: lea esp,[ecx-0x4]
0x08048441 <+74>: ret
End of assembler dump.
(gdb) disas foo
Dump of assembler code for function foo:
0x080483eb <+0>: push ebp
0x080483ec <+1>: mov ebp,esp
0x080483ee <+3>: mov eax,DWORD PTR [ebp+0x8]
0x080483f1 <+6>: imul eax,DWORD PTR [ebp+0xc]
0x080483f5 <+10>: pop ebp
0x080483f6 <+11>: ret
End of assembler dump.
让我感到困惑的部分是它试图对堆栈进行处理.据我了解,这就是它的作用:
The part that confuses me is what it is trying to do with the stack.From my understanding this is what it does:
首先,它引用了堆栈中高出4个字节的某个内存地址,据我所知,这应该是传递给main的变量,因为esp当前指向内存中的返回地址.
First it takes a reference to some memory address 4 bytes higher in the stack which from my knowledge should be the variables passed to main since esp currently pointed to the return address in memory.
其次,出于性能原因,它将堆栈对齐到0边界.
Second it aligns the stack to a 0 boundary for performance reasons.
第三次将其压入新的堆栈区域ecx + 4,这将转换为将我们假定要返回的地址压入堆栈.
Third it pushes onto the new stack area ecx+4 which should translate to pushing the address we are suppose to be returning to on the stack.
第四,它将旧的框架指针推入堆栈并设置新的指针.
Fourth it pushes the old frame pointer onto the stack and sets up the new one.
第五,它将ecx(仍指向它应该是main的参数)压入堆栈.
Fifth it pushes ecx (which is still pointing to would should be an argument to main) onto the stack.
程序将执行应做的事情,并开始返回过程.
The the program does what it should and begins the process of returning.
首先,它通过在ebp上使用-0x4偏移量来恢复ecx,该偏移量应访问第一个局部变量.
First it restores ecx by using a -0x4 offset on ebp which should access the first local variable.
其次执行离开指令,该指令实际上只是将esp设置为ebp,然后从堆栈中弹出ebp.
Second it executes the leave instruction which really just sets esp to ebp and then pops ebp from the stack.
所以现在堆栈上的下一个内容是返回地址,并且esp和ebp寄存器应该回到它们需要返回的状态了吗?
So now the next thing on the stack is the return address and the esp and ebp registers should be back to what they need to be to return right?
显然不是因为接下来要做的是用ecx-0x4加载esp,由于ecx仍指向传递给main的变量,因此应该将其放在堆栈上的返回地址地址.
Well evidently not because the next thing it does is load esp with ecx-0x4 which since ecx is still pointing to that variable passed to main should put it at the address of return address on the stack.
这工作得很好,但是提出了一个问题,为什么在第3步中就将返回地址放到堆栈上,因为它在实际上从函数返回之前将堆栈返回到末尾的原始位置.
This works just fine but raises the question of why it bothered putting the return address onto the stack in step 3 since it returned the stack to the original position at the end just before actually returning from the function.
推荐答案
更新:gcc8至少在正常用例(-fomit-frame-pointer
,并且没有需要可变大小分配的alloca
或C99 VLA)中简化了此操作.可能是由于AVX使用量的增加导致更多功能需要32字节对齐的本地或数组而引起的.
Update: gcc8 simplifies this at least for normal use-cases (-fomit-frame-pointer
, and no alloca
or C99 VLAs that require variable-size allocation). Perhaps motivated by increasing usage of AVX leading to more functions wanting a 32-byte aligned local or array.
此外,可能是
如果仅运行几次(例如,在32位代码中main
的开头),则此复杂的序言就可以了,但是它越多,似乎越值得优化. GCC有时仍会在所有将大于16字节的对齐对象优化到寄存器的函数中对堆栈进行过度对齐,这虽然已经错过了优化,但是当堆栈对齐更便宜时,这种情况就不那么糟糕了.
This complicated prologue is fine if it only ever runs a couple times (e.g. at the start of main
in 32-bit code), but the more it appears the more worthwhile it is to optimize it. GCC sometimes still over-aligns the stack in functions where all >16-byte aligned objects are optimized into registers, which is a missed optimization already but less bad when the stack alignment is cheaper.
gcc在对齐函数中的堆栈时也会生成一些笨拙的代码.我有一个可能的理论(见下文),为什么gcc可能会将返回地址复制到保存ebp
的上方以制作堆栈框架(是的,我同意gcc在做什么) ).在该函数中看起来没有必要,并且clang不会做任何类似的事情.
gcc makes some clunky code when aligning the stack within a function, even with optimization enabled. I have a possible theory (see below) on why gcc might be copying the return address to just above where it saves ebp
to make a stack frame (and yes, I agree that's what gcc is doing). It doesn't look necessary in this function, and clang doesn't do anything like that.
此外,ecx
的废话可能只是gcc没有优化掉其对齐堆栈样板中不需要的部分. (esp
的预对齐值是在堆栈上引用args所必需的,因此将第一个可能是arg的地址放入寄存器中是有意义的.)
Besides that, the nonsense with ecx
is probably just gcc not optimizing away unneeded parts of its align-the-stack boilerplate. (The pre-alignment value of esp
is needed to reference args on the stack, so it makes sense that it puts the address of the first would-be arg into a register).
使用32位代码进行优化后,您会看到相同的东西(其中gcc生成的main
不会假定16B堆栈对齐,即使当前版本的ABI要求在进程启动,调用main
的CRT代码要么对齐堆栈本身,要么保留内核提供的初始对齐(我忘记了).您还会在将堆栈对齐到大于16B的函数中看到这一点(例如,使用__m256
类型的函数,有时即使它们从未溢出到栈中.或者带有C ++ 11 alignas(32)
声明的数组的函数,或任何其他要求对齐的方式.)在64位代码中,gcc似乎总是为此使用r10
而不是rcx
.
You see the same thing with optimization in 32-bit code (where gcc makes a main
that doesn't assume 16B stack alignment, even though the current version of the ABI requires that at process startup, and the CRT code that calls main
either aligns the stack itself or preserves the initial alignment provided by the kernel, I forget). You also see this in functions that align the stack to more than 16B (e.g. functions that use __m256
types, sometimes even if they never spill them to the stack. Or functions with an array declared with C++11 alignas(32)
, or any other way of requesting alignment.) In 64-bit code, gcc always seems to use r10
for this, not rcx
.
gcc的执行方式不需要ABI合规性,因为clang的操作要简单得多.
There's nothing required for ABI compliance about the way gcc does it, because clang does something much simpler.
我添加了一个对齐的变量(使用volatile
作为一种简单的方法,可以强制编译器在堆栈上为其实际保留对齐的空间,而不是对其进行优化).我把您的代码 rel =" nofollow noreferrer>,以使用-O3
查看asm.我在gcc 4.9、5.3和6.1中看到了相同的行为,但是在clang中却看到了不同的行为.
int main(){
__attribute__((aligned(32))) volatile int v = 1;
return 0;
}
Clang3.8的-O3 -m32
输出在功能上与其-m64
输出相同.请注意,-O3
启用-fomit-frame-pointer
,但是某些功能仍然会生成堆栈帧.
push ebp
mov ebp, esp # make a stack frame *before* aligning, so ebp-relative addressing can only access stack args, not aligned locals.
and esp, -32
sub esp, 32 # esp is 32B aligned with 32 or 48B above esp reserved (depending on incoming alignment)
mov dword ptr [esp], 1 # store v
xor eax, eax # return 0
mov esp, ebp # leave
pop ebp
ret
gcc的输出在-m32
和-m64
之间几乎相同,但是将v
放在-m64标签为'red-zone'"rel =" tag> red-zone 的问题,因此-m32
输出有两个额外的说明:
gcc's output is nearly the same between -m32
and -m64
, but it puts v
in the red-zone with -m64
so the -m32
output has two extra instructions:
# gcc 6.1 -m32 -O3 -fverbose-asm. Most of gcc's comment lines are empty. I guess that means it has no idea why it's emitting those insns :P
lea ecx, [esp+4] #, get a pointer to where the first arg would be
and esp, -32 #, align
xor eax, eax # return 0
push DWORD PTR [ecx-4] # No clue WTF this is for; this looks batshit insane, but happens even in 64bit mode.
push ebp # make a stackframe, even though -fomit-frame-pointer is on by default and we can already restore the original esp from ecx (unlike clang)
mov ebp, esp #,
push ecx # save the old esp value (even though this function doesn't clobber ecx...)
sub esp, 52 #, reserve space for v (not present with -m64)
mov DWORD PTR [ebp-56], 1 # v,
add esp, 52 #, unreserve (not present with -m64)
pop ecx # restore ecx (even though nothing clobbered it)
pop ebp # at least it knows it can just pop instead of `leave`
lea esp, [ecx-4] #, restore pre-alignment esp
ret
似乎gcc想要在对齐堆栈之后使它的堆栈框架(使用push ebp
) .我想这很有意义,因此它可以引用相对于ebp
的本地语言.否则,如果要对齐本地人,则必须使用esp
相对寻址.
It seems that gcc wants to make its stack frame (with push ebp
) after aligning the stack. I guess that makes sense, so it can reference locals relative to ebp
. Otherwise it would have to use esp
-relative addressing, if it wanted aligned locals.
对齐后但按ebp
之前,返回地址的额外副本表示将返回地址复制到相对于保存的ebp
值的预期位置(以及将调用子函数时位于ebp
中.因此,这可以通过跟踪堆栈框架的链接列表并查看返回地址以找出涉及的功能,从而帮助希望放松堆栈的代码.
The extra copy of the return address after aligning but before pushing ebp
means that the return address is copied to the expected place relative to the saved ebp
value (and the value that will be in ebp
when child functions are called). So this does potentially help code that wants to unwind the stack by following the linked list of stack frames, and looking at return-addresses to find out what function is involved.
我不确定这是否与现代堆栈展开信息有关,该信息允许使用-fomit-frame-pointer
进行堆栈展开(回溯/异常处理). (这是.eh_frame
部分中的元数据.这是围绕esp
的每次修改的.cfi_*
指令的目的.)我应该看看clang在非叶函数中必须对齐堆栈时所执行的操作.
I'm not sure whether this matters with modern stack-unwind info that allows stack-unwinding (backtraces / exception handling) with -fomit-frame-pointer
. (It's metadata in the .eh_frame
section. This is what the .cfi_*
directives around every modification to esp
are for.) I should look at what clang does when it has to align the stack in a non-leaf function.
在函数内部需要esp
的原始值来引用堆栈上的函数args.我认为gcc不知道如何优化其align-the-stack方法中不需要的部分. (例如,main
不会查看其args(并且声明不接受任何参数))
The original value of esp
would be needed inside the function to reference function args on the stack. I think gcc doesn't know how to optimize away unneeded parts of its align-the-stack method. (e.g. out main
doesn't look at its args (and is declared not to take any))
这种代码生成是您在需要对齐堆栈的函数中看到的典型代码;由于使用volatile
自动存储功能,这并不奇怪.
This kind of code-gen is typical of what you see in a function that needs to align the stack; it's not extra weird because of using a volatile
with automatic storage.
这篇关于为什么gcc会产生额外的寄信人地址?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!