Problem description
I've recently been trying to implement dynamic functions in C++ by using a buffer and RAW hexadecimal equivalents of different assembly operators. To illustrate a simple jump:
byte* buffer = new byte[5];
*buffer = 0xE9; // opcode for jmp rel32 (near relative jump)
*(uint*)(buffer + 1) = /* destination, as a 32-bit displacement from the end of this instruction */;
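One detail worth making explicit: opcode 0xE9 takes a displacement relative to the end of the 5-byte instruction, not an absolute address. The sketch below only builds the bytes and does not execute them. The helper name `encode_jmp_rel32` and its parameters are hypothetical, and it assumes 32-bit addresses, as on Linux x86.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not from the original post): builds the 5 bytes of a
// `jmp rel32` that sits at address `from` and transfers control to `to`.
std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    std::array<std::uint8_t, 5> code{};
    code[0] = 0xE9;  // opcode for jmp rel32

    // The displacement is measured from the end of the instruction (from + 5).
    // Unsigned wrap-around yields the two's complement form for backward jumps.
    const std::uint32_t rel = to - (from + 5);

    // x86 stores the displacement in little-endian byte order.
    for (int i = 0; i < 4; ++i) {
        code[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
    }
    return code;
}
```

For instance, `encode_jmp_rel32(0x1000, 0x1010)` produces `E9 0B 00 00 00`, because the target lies 0x0B bytes past the end of the instruction.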
I am not experienced in assembly, but I know enough to create very simple functions. Right now I'm creating cdecl functions in raw memory. The problem is that I do not know how much stack space to reserve (for memory) with sub. Let's take this function as an example:
int MyTest(int x, int y) { return x + y; }
long TheTest(int x, int y)
{
return MyTest(x, 5);
}
08048a20 <_Z6TheTestii>:
_Z6TheTestii():
8048a20: 55 push %ebp
8048a21: 89 e5 mov %esp,%ebp
8048a23: 83 ec 18 sub $0x18,%esp
8048a26: c7 44 24 04 05 00 00 movl $0x5,0x4(%esp)
8048a2d: 00
8048a2e: 8b 45 08 mov 0x8(%ebp),%eax
8048a31: 89 04 24 mov %eax,(%esp)
8048a34: e8 c2 ff ff ff call 80489fb <_Z6MyTestii>
8048a39: c9 leave
8048a3a: c3 ret
As you can see, first is the C++ code and below it is the ASM of the 'TheTest' function. One can instantly notice that the stack pointer is moved by 24 (0x18) bytes (as previously mentioned, I am not experienced with assembly, so I might not use the correct terms and/or be completely right). This does not make any sense to me. Why are 24 bytes required when only two integers are used? The variable 'x' takes 4 bytes, and the value '5' also takes 4 bytes (remember it's cdecl, so the calling function takes care of the memory for the function arguments); that does not add up to 24.
Now here is an additional example which makes me really wonder about the assembly output:
int NewTest(int x, char val) { return x + val; }
long TheTest(int x, int y)
{
return NewTest(x, (char)6);
}
08048a3d <_Z6TheTestiiii>:
_Z6TheTestiiii():
8048a3d: 55 push %ebp
8048a3e: 89 e5 mov %esp,%ebp
8048a40: 83 ec 08 sub $0x8,%esp
8048a43: c7 44 24 04 06 00 00 movl $0x6,0x4(%esp)
8048a4a: 00
8048a4b: 8b 45 08 mov 0x8(%ebp),%eax
8048a4e: 89 04 24 mov %eax,(%esp)
8048a51: e8 ca ff ff ff call 8048a20 <_Z7NewTestic>
8048a56: c9 leave
8048a57: c3 ret
The only difference here (except the values) is that I use a 'char' (1 byte) instead of an integer. If we then look at the assembly code, this moves the stack pointer by only 8 bytes. That's a difference of 16 bytes from the previous example. As an out-and-out C++ person, I have no clue what's going on. I would really appreciate it if someone could enlighten me on the subject!
NOTE: The reason why I'm posting here instead of reading an ASM book, is because I need to use assembly for this one function. So I don't want to read a whole book for 40 lines of code...
I also do not care for platform-dependency, I only care about Linux 32bit :)
Recommended answer
The stack frame created in TheTest holds both local (automatic) variables and the arguments to the functions that TheTest calls, such as MyTest and NewTest. The frame is pushed and popped by TheTest, so as long as it is big enough to hold the arguments to the functions it calls, the size doesn't matter much.
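To make this concrete, here is a rough sketch that emits the same byte sequence as the disassembly in the question, with the frame size as a parameter. The function `emit_the_test` and its parameter names are hypothetical. It only builds the bytes and does not execute them, and the opcodes are copied from the objdump listing rather than from any assembler API.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical helper (not from the original post): emits a cdecl function
// that calls `callee(first_arg_of_caller, second_arg)`, assuming the code
// will live at address `base` in a 32-bit x86 process.
std::vector<std::uint8_t> emit_the_test(std::uint32_t base, std::uint32_t callee,
                                        std::uint8_t frame_size,
                                        std::uint32_t second_arg) {
    std::vector<std::uint8_t> code;
    auto emit = [&code](std::initializer_list<std::uint8_t> bytes) {
        code.insert(code.end(), bytes);
    };
    auto emit32 = [&code](std::uint32_t v) {  // little-endian 32-bit value
        for (int i = 0; i < 4; ++i) {
            code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
        }
    };

    emit({0x55});                    // push %ebp
    emit({0x89, 0xE5});              // mov  %esp,%ebp
    emit({0x83, 0xEC, frame_size});  // sub  $frame_size,%esp  (reserve the frame)
    emit({0xC7, 0x44, 0x24, 0x04});  // movl $second_arg,0x4(%esp)
    emit32(second_arg);
    emit({0x8B, 0x45, 0x08});        // mov  0x8(%ebp),%eax    (caller's first argument)
    emit({0x89, 0x04, 0x24});        // mov  %eax,(%esp)
    emit({0xE8});                    // call rel32
    // The displacement is relative to the address of the next instruction.
    const std::uint32_t next = base + static_cast<std::uint32_t>(code.size()) + 4;
    emit32(callee - next);
    emit({0xC9});                    // leave (undoes the frame, whatever its size)
    emit({0xC3});                    // ret
    return code;
}
```

Calling `emit_the_test(0x8048a20, 0x80489fb, 0x18, 5)` reproduces the 27 bytes of the first listing. Passing 0x08 instead of 0x18 changes only the operand of `sub`, because `leave` restores %esp from %ebp regardless of how much was reserved.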
The compiler output you are seeing is the result of several passes of the compiler. Each pass may perform transformations and optimizations that reduce the frame size required; I suspect that at some early stage the compiler needed 24 bytes of frame and never reduced it, even though the code was later optimized.
The ABI of the compiler on your platform will establish some rules about stack alignment that you must follow, so frame sizes are rounded up to meet these requirements.
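The rounding can be sketched as arithmetic. This assumes %esp must be 16-byte aligned at each call instruction, which is GCC's usual default on 32-bit Linux but is an assumption here, not something stated in the post. It also assumes 8 bytes are already on the stack when `sub` runs (the return address plus the saved %ebp). The name `padded_frame_size` is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: smallest value for `sub $N,%esp` that is at least
// `needed` bytes and keeps (already_pushed + N) a multiple of `alignment`.
// Assumption: already_pushed = 8 (return address + saved %ebp), alignment = 16.
constexpr std::uint32_t padded_frame_size(std::uint32_t needed,
                                          std::uint32_t already_pushed = 8,
                                          std::uint32_t alignment = 16) {
    return (already_pushed + needed + alignment - 1) / alignment * alignment
           - already_pushed;
}

// Both frame sizes seen in the question satisfy the assumed rule:
static_assert((8 + 0x08) % 16 == 0, "sub $0x8 keeps the stack 16-byte aligned");
static_assert((8 + 0x18) % 16 == 0, "sub $0x18 keeps the stack 16-byte aligned");
```

Under these assumptions, 8 bytes of outgoing arguments need only `sub $0x8`, and anything from 9 to 24 bytes rounds up to `sub $0x18`. Alignment alone therefore does not explain why the first example reserves 24 bytes rather than 8; it only explains why both values are valid choices.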
These functions use the frame pointer %ebp, though this is not a win in code size or performance; it may aid debugging, however.