Question
After reading this Stack Overflow answer, and this document, I still don't understand the difference between movq and movabsq.
My current understanding is that in movabsq, the first operand is a 64-bit immediate operand, whereas movq sign-extends a 32-bit immediate operand. From the 2nd document referenced above:
In the first reference, Peter states:
However, when I assemble and run this, it seems to work fine:
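    .section .rodata
str:
    .string "0x%lx\n"

    .text
    .globl main
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $str, %edi           # 1st argument: format string
    movq $0xFFFFFFFF, %rsi    # 2nd argument: the immediate under test
    xorl %eax, %eax           # no vector registers used by this variadic call
    call printf
    xorl %eax, %eax           # return 0
    popq %rbp
    ret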
$ clang file.s -o file && ./file
prints 0xffffffff. (This works similarly for larger values, for instance if you throw in a few additional "F"s.) movabsq generates identical output.
Is Clang inferring what I want? If it is, is there still a benefit to movabsq over movq?
Am I missing something?
Answer
There are three kinds of moves that fill a 64-bit register with an immediate:
- (1) Moving to the low 32-bit part: B8 +rd id, 5 bytes
  Example: mov eax, 241 / mov[l] $241, %eax
  Moving to the low 32-bit part zeroes the upper part.
- (2) Moving with a 64-bit immediate: 48 B8 +rd io, 10 bytes
  Example: mov rax, 0xf1f1f1f1f1f1f1f1 / mov[abs][q] $0xf1f1f1f1f1f1f1f1, %rax
  Moves a full 64-bit immediate.
- (3) Moving with a sign-extended 32-bit immediate: 48 C7 /0 id, 7 bytes
  Example: mov rax, 0xffffffffffffffff / mov[q] $0xffffffffffffffff, %rax
  Moves a signed 32-bit immediate, sign-extended, into the full 64-bit register.
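To make the three byte layouts concrete, here is a minimal Python sketch that builds the machine code for each form. The function names and the REGS table are made up for this illustration, and only the eight legacy registers are handled (r8 to r15 would need an extra REX bit).

```python
import struct

# Register numbers for the eight legacy registers (assumption of this sketch:
# r8-r15 are left out because they need an additional REX bit).
REGS = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def enc_mov_r32_imm32(reg, imm):
    """Form (1): B8+rd id, 5 bytes. The CPU zeroes the upper 32 bits."""
    return bytes([0xB8 + REGS[reg]]) + struct.pack("<I", imm)

def enc_mov_r64_imm64(reg, imm):
    """Form (2): REX.W B8+rd io, 10 bytes, full 64-bit immediate."""
    return bytes([0x48, 0xB8 + REGS[reg]]) + struct.pack("<Q", imm)

def enc_mov_r64_simm32(reg, imm):
    """Form (3): REX.W C7 /0 id, 7 bytes. imm is a signed 32-bit value."""
    modrm = 0xC0 | REGS[reg]  # mod=11 (register direct), reg field 0, rm=register
    return bytes([0x48, 0xC7, modrm]) + struct.pack("<i", imm)

print(enc_mov_r32_imm32("ax", 241).hex(" "))                 # b8 f1 00 00 00
print(enc_mov_r64_imm64("ax", 0xF1F1F1F1F1F1F1F1).hex(" "))  # 48 b8 f1 f1 f1 f1 f1 f1 f1 f1
print(enc_mov_r64_simm32("ax", -1).hex(" "))                 # 48 c7 c0 ff ff ff ff
```

The lengths of the three results (5, 10 and 7 bytes) match the sizes listed above.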
Notice how at the assembly level there is room for ambiguity: movq is used for both the second and the third case.
For each immediate value we have:
- (a) Values in [0, 0x7fff_ffff] can be encoded with (1), (2) and (3).
- (b) Values in [0x8000_0000, 0xffff_ffff] can be encoded with (1) and (2).
- (c) Values in [0x1_0000_0000, 0xffff_ffff_7fff_ffff] can be encoded with (2) only.
- (d) Values in [0xffff_ffff_8000_0000, 0xffff_ffff_ffff_ffff] can be encoded with (2) and (3).
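The four ranges follow directly from how each form extends its immediate. A small Python sketch (the function name usable_forms is hypothetical) that returns which of the forms (1), (2), (3) can produce a given 64-bit value:

```python
def usable_forms(value):
    """Return the set of forms {1, 2, 3} able to load `value` into a 64-bit register."""
    if not 0 <= value <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("value must fit in 64 bits")
    forms = {2}  # a full 64-bit immediate always fits
    if value <= 0xFFFF_FFFF:
        forms.add(1)  # 32-bit immediate, zero-extended by the CPU
    if value <= 0x7FFF_FFFF or value >= 0xFFFF_FFFF_8000_0000:
        forms.add(3)  # 32-bit immediate, sign-extended by the CPU
    return forms

for v in (0x1234, 0xFFFF_FFFF, 0x1_0000_0000, 0xFFFF_FFFF_FFFF_FFFF):
    print(hex(v), sorted(usable_forms(v)))
```

The four sample values fall into ranges (a), (b), (c) and (d) respectively.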
All cases except (c) have at least two possible encodings.
The assembler usually picks the shortest one when more than one encoding is available, but that's not always the case.
For GAS: movabs[q] always corresponds to (2). mov[q] corresponds to (3) for cases (a) and (d), and to (2) for the other cases.
It never generates (1) for a move to a 64-bit register.
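The selection rule just described can be written down as a short Python sketch. It models only the behaviour stated in this answer; gas_form and SIZES are names invented for the illustration, and other assemblers or optimization options may choose differently.

```python
SIZES = {1: 5, 2: 10, 3: 7}  # encoded length in bytes of each form

def gas_form(mnemonic, value):
    """Form chosen for `mnemonic $value, %r64`, following the rule above."""
    value &= 0xFFFF_FFFF_FFFF_FFFF
    fits_simm32 = value <= 0x7FFF_FFFF or value >= 0xFFFF_FFFF_8000_0000
    if mnemonic in ("movabs", "movabsq"):
        return 2                        # always the 64-bit immediate
    if mnemonic in ("mov", "movq"):
        return 3 if fits_simm32 else 2  # never form (1) for a 64-bit register
    raise ValueError("unsupported mnemonic: " + mnemonic)

for mnemonic, value in [("movq", 1), ("movq", 0xFFFF_FFFF), ("movabsq", 1)]:
    form = gas_form(mnemonic, value)
    print(f"{mnemonic} ${value:#x}, %rsi -> form ({form}), {SIZES[form]} bytes")
```

Under this rule the question's movq $0xFFFFFFFF, %rsi falls in case (b), so it is assembled as the 10-byte form (2), which is why it prints 0xffffffff rather than a sign-extended value.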
To make it pick (1) we have to use mov[l] $0xffffffff, %edi, which is equivalent (I believe GAS won't convert a move to a 64-bit register into a move to its lower 32-bit register, even when the two are equivalent).
In the 16/32-bit era, distinguishing between (1) and (3) was not considered really important (though in GAS it is possible to pick one specific form), since (3) was not a sign-extending operation but an artefact of the original 8086 encoding.
The mov instruction was never split into two forms to account for (1) and (3); instead, a single mov was used, with the assembler almost always picking (1) over (3).
With the new 64-bit registers, always using 64-bit immediates would make the code far too sparse (and would easily run into the maximum instruction length of 15 bytes), so it was not worth extending (1) to always take a 64-bit immediate.
Instead, (1) still has a 32-bit immediate and zero-extends (to break any false data dependency), and (2) was introduced for the rare case where a 64-bit immediate operand is actually needed.
Taking the chance, (3) was also changed: it still takes a 32-bit immediate but now sign-extends it.
(1) and (3) should suffice for the most common immediates (like 1 or -1).
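The practical difference between the zero-extension of (1) and the sign-extension of (3) shows up only when bit 31 of the immediate is set. A tiny Python sketch (the helper names are hypothetical) of the value that ends up in the 64-bit register:

```python
def zero_extend32(imm32):
    """Register contents after form (1): upper 32 bits cleared."""
    return imm32 & 0xFFFF_FFFF

def sign_extend32(imm32):
    """Register contents after form (3): bit 31 copied into bits 32..63."""
    imm32 &= 0xFFFF_FFFF
    if imm32 & 0x8000_0000:
        return imm32 | 0xFFFF_FFFF_0000_0000
    return imm32

print(hex(zero_extend32(0xFFFF_FFFF)))  # 0xffffffff
print(hex(sign_extend32(0xFFFF_FFFF)))  # 0xffffffffffffffff
print(hex(sign_extend32(0x0000_0001)))  # 0x1
```

So the same 32-bit pattern ff ff ff ff yields 4294967295 with (1) and -1 (as a signed 64-bit value) with (3).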
However, the difference between (1)/(3) and (2) is deeper than the past difference between (1) and (3): while (1) and (3) both have an immediate operand of the same size, 32 bits, (2) has a 64-bit immediate operand.
Why would one want an artificially lengthened instruction?
One use case could be padding, so that the next loop starts at a multiple of 16/32 bytes.
This sacrifices front-end resources (more space in the instruction cache) to save back-end ones (fewer uOPs than padding with no-op instructions).
Another, and more frequent, use case is when one only needs to generate a machine code template.
For example, in a JIT one may want to prepare the sequence of instructions in advance and fill in the immediate values only at runtime.
In that case using (2) greatly simplifies the handling, since there is always enough room for all possible values.
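A minimal Python sketch of that template idea, assuming a made-up two-instruction stub (movabs into %rax followed by ret). The names TEMPLATE, IMM_OFFSET and instantiate are hypothetical, and the sketch only builds the bytes; it does not execute them.

```python
import struct

# Template: movabs $<placeholder>, %rax ; ret
# 48 B8 <8 placeholder bytes> C3
TEMPLATE = bytes([0x48, 0xB8]) + bytes(8) + bytes([0xC3])
IMM_OFFSET = 2  # the immediate starts right after the REX.W prefix and the opcode

def instantiate(value):
    """Copy the template and patch the 64-bit immediate in place."""
    code = bytearray(TEMPLATE)
    struct.pack_into("<Q", code, IMM_OFFSET, value & 0xFFFF_FFFF_FFFF_FFFF)
    return bytes(code)

for v in (1, 0xFFFF_FFFF, 0x1122_3344_5566_7788):
    print(instantiate(v).hex(" "))
```

Because form (2) always reserves eight bytes for the immediate, every instantiation has the same length and the patch offset never moves, whatever value is filled in.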
Another case is patching functionality: in a debug build of a piece of software, specific calls could be made indirectly through a register that has just been loaded with (2), so that the debugger can easily hijack the call and redirect it to any new target.