Problem description
I have an instruction written in Intel syntax (using gas as my assembler) that looks like this:
mov rdx, msg_size
...
msg: .ascii "Hello, world!\n"
.set msg_size, . - msg
but that mov instruction is being assembled to mov 0xe,%rdx, rather than mov $0xe,%rdx, as I would expect. How should I write the first instruction (or the definition of msg_size) to get the expected behavior?
Recommended answer
Use mov edx, OFFSET symbol to get the symbol "address" as an immediate, rather than loading from it as an address. This works for actual label addresses as well as symbols you set to an integer with .set.
For the msg address (not the msg_size assemble-time constant) in 64-bit code, you may want lea rdx, [RIP+msg] for a PIE executable where static addresses don't fit in 32 bits. See "How to load address of function or label into register in GNU Assembler".
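As a minimal sketch of how the two forms fit together, here is the question's snippet expanded into a complete program. The surrounding write/exit system calls, the _start entry point, and the .rodata placement are assumptions added for illustration; only msg and msg_size come from the question.

```asm
# hello.s: assumes x86-64 Linux; build with something like
#   as hello.s -o hello.o && ld hello.o -o hello
.intel_syntax noprefix
.global _start

.text
_start:
    mov  eax, 1                   # system call number for write
    mov  edi, 1                   # file descriptor 1 (stdout)
    lea  rsi, [rip + msg]         # address of msg, RIP-relative (PIE-safe)
    mov  edx, OFFSET msg_size     # length as an immediate, not a load
    syscall

    mov  eax, 60                  # system call number for exit
    xor  edi, edi                 # exit status 0
    syscall

.section .rodata
msg: .ascii "Hello, world!\n"
.set msg_size, . - msg            # defined after use, so OFFSET is required above
```

Because msg_size is defined after the instruction that uses it, dropping OFFSET here would turn the mov into a load from absolute address 0xe, which is the behavior described in the question.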
In GAS .intel_syntax noprefix mode:
- OFFSET symbol works like AT&T $symbol. This is somewhat like MASM.
- symbol works like AT&T symbol (i.e. a dereference) for unknown symbols.
- [symbol] is always an effective address, never an immediate, in GAS and NASM/YASM. LEA doesn't load from the address but it still uses the memory-operand machine encoding. (That's why lea uses the same syntax.)
GAS is a one-pass assembler (which goes back and fills in symbol values once they're known).
It decides on the opcode and encoding for mov rdx, symbol when it first encounters that line. An earlier msize= . - msg or .equ / .set will make it choose mov reg, imm32, but a later directive won't be visible yet.
The default assumption for not-yet-defined symbols is that symbol is an address in some section (like you get from defining it with a label like symbol:, or from .set symbol, .). And because GAS .intel_syntax is like MASM, not NASM, a bare symbol is treated like [symbol]: a memory operand.
If you put a .set or msg_length=msg_end - msg directive at the top of your file, before the instructions that reference it, they would assemble to mov reg, imm32 (mov-immediate). (Unlike in AT&T syntax, where you always need a $ for an immediate, even for numeric literals like 1234.)
For example: source and disassembly interleaved with objdump -dS:
Assembled with gcc -g -c foo.s and disassembled with objdump -drwC -S -Mintel foo.o (with as --version = GNU assembler (GNU Binutils) 2.34). We get this:
0000000000000000 <l1>:
.intel_syntax noprefix
l1:
mov eax, OFFSET equsym
0: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as a load
5: 8b 04 25 01 00 00 00 mov eax,DWORD PTR ds:0x1
mov rax, big #### 32-bit sign-extended absolute load address, even though the constant was unsigned positive
c: 48 8b 04 25 aa aa aa aa mov rax,QWORD PTR ds:0xffffffffaaaaaaaa
mov rdi, OFFSET label
14: 48 c7 c7 00 00 00 00 mov rdi,0x0 17: R_X86_64_32S .text+0x1b
000000000000001b <label>:
label:
nop
1b: 90 nop
.equ equsym, . - label # equsym = 1
big = 0xaaaaaaaa
mov eax, OFFSET equsym
1c: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as an immediate
21: b8 01 00 00 00 mov eax,0x1
mov rax, big #### constant doesn't fit in 32-bit sign extended, assembler can see it when picking encoding so it picks movabs imm64
26: 48 b8 aa aa aa aa 00 00 00 00 movabs rax,0xaaaaaaaa
It's always safe to use mov edx, OFFSET msg_size to treat any symbol (or even a numeric literal) as an immediate regardless of how it was defined. So it's exactly like AT&T $ except that it's optional when GAS already knows the symbol value is just a number, not an address in some section. For consistency it's probably a good idea to always use OFFSET msg_size so your code doesn't change meaning if some future programmer moves code around so the data section and related directives are no longer first. (Including future you who's forgotten these strange details that are unlike most assemblers.)
BTW, .set is a synonym for .equ, and there's also symbol=value syntax for setting a value, which is also synonymous with .set.
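A short sketch of the three spellings side by side. The names len_a, len_b, and len_c are hypothetical, chosen only for this illustration; each definition should give the symbol the same absolute value, so the three mov instructions are expected to assemble identically.

```asm
.intel_syntax noprefix

.set  len_a, 14                   # .set directive
.equ  len_b, 14                   # .equ directive, same effect
len_c = 14                        # symbol=value form, same effect

.text
    mov  edx, OFFSET len_a        # all three load the immediate 14
    mov  edx, OFFSET len_b
    mov  edx, OFFSET len_c
```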
mov rdx, OFFSET symbol will assemble to mov r/m64, sign_extended_imm32. You don't want that for a small length (vastly less than 4GiB) unless it's a negative constant, not an address. You also don't want movabs r64, imm64 for addresses; that's inefficient.
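To make the size difference concrete, here is a small sketch comparing the two register widths for a length constant. The value 14 is taken from the question's msg_size, the encodings in the comments follow the same pattern as the mov rdi and mov eax lines in the listing above, and I'm assuming GAS is run without its optional size optimizations.

```asm
.intel_syntax noprefix
.set msg_size, 14                 # defined first, so GAS knows it is a plain number

.text
    mov  rdx, OFFSET msg_size     # 48 c7 c2 0e 00 00 00  (7 bytes, sign-extended imm32)
    mov  edx, OFFSET msg_size     # ba 0e 00 00 00        (5 bytes)
```

Writing a 32-bit register zero-extends into the full 64-bit register, so for a small non-negative constant the second form leaves the same value in rdx while being two bytes shorter.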
It's safe under GNU/Linux to write mov edx, OFFSET symbol in a position-dependent executable, and in fact you should always do that or use lea rdx, [rip + symbol], never a sign-extended 32-bit immediate, unless you're writing code that will be loaded into the high 2GB of virtual address space (e.g. a kernel). See "How to load address of function or label into register in GNU Assembler".
See also "32-bit absolute addresses no longer allowed in x86-64 Linux?" for more about PIE executables being the default in modern distros.
Tip: if you know the AT&T or NASM syntax for something, use that to produce the encoding you want and then disassemble with objdump -Mintel to find out the right syntax for .intel_syntax noprefix.
But that doesn't help here because disassembly will just show the numeric literal like mov edx, 123, not mov edx, OFFSET name_not_in_object_file. Looking at gcc -masm=intel compiler output can also help, but again compilers do their own constant-propagation instead of using symbols for assemble-time constants.
BTW, no open-source projects that I'm aware of contain GAS intel_syntax source code. If they use gas, they use AT&T syntax. Otherwise they use NASM/YASM. (You sometimes also see MSVC inline asm in open source projects.)