Question
Can someone explain how xchg works in this code? arrayD is a DWORD array containing 1,2,3.
mov eax, arrayD ; eax=1
xchg eax, [arrayD+4]; eax=2 arrayD=2,1,3
Why isn't the array 1,1,3 after the xchg?
Recommended answer
xchg works just like Intel's documentation says it does.
I think the comment on the 2nd line is wrong. It should be eax=2, arrayD = 1,1,3. So you're correct, and you should email your instructor to say you think you've found a mistake, unless you missed something in your notes.
xchg only stores one element, and it can't magically look back in time to know where the value in eax came from and swap two memory locations with one xchg instruction.
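To make that concrete, here is a minimal C sketch that models the two instructions from the question. The helper names xchg_model and run_snippet are made up for illustration, and the model only captures the data movement, not the atomicity of the real instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `xchg reg, [mem]`: the register and exactly ONE
   memory location trade values. No other memory is touched. */
static void xchg_model(uint32_t *reg, uint32_t *mem)
{
    assert(reg && mem);
    uint32_t old = *mem;  /* read the memory operand            */
    *mem = *reg;          /* store the register value to memory */
    *reg = old;           /* register receives the old memory   */
}

/* Replays the two instructions from the question on a 3-element array
   and returns the final value of eax. */
static uint32_t run_snippet(uint32_t arrayD[3])
{
    uint32_t eax = arrayD[0];      /* mov  eax, arrayD       ; load only, arrayD[0] is unchanged */
    xchg_model(&eax, &arrayD[1]);  /* xchg eax, [arrayD+4]   ; swaps eax with arrayD[1] only     */
    return eax;
}
```

Calling run_snippet on {1, 2, 3} leaves eax holding 2 and the array holding 1,1,3: arrayD[0] was only ever read, so nothing writes a 2 into it.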
The only way to swap 1,2 to 2,1 in one instruction would be a 64-bit rotate, like rol qword ptr [arrayD], 32 (x86-64 only).
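The rotate trick can be sketched in C as well. This is an illustration of the idea rather than the exact machine code: the function names are hypothetical, and whether a compiler turns it into a single rol on memory is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 64-bit value left by 32 bits, which swaps its two 32-bit halves. */
static uint64_t rotl64_by_32(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Swap pair[0] and pair[1] by treating the two adjacent dwords as one qword. */
static void swap_adjacent_dwords(uint32_t *pair)
{
    assert(pair);
    uint64_t q;
    memcpy(&q, pair, sizeof q);  /* load both dwords as one 64-bit value */
    q = rotl64_by_32(q);         /* exchange the high and low halves     */
    memcpy(pair, &q, sizeof q);  /* store them back in swapped order     */
}
```

Applied to the first two elements of {1, 2, 3}, this gives 2,1,3. Because a rotate by exactly half the width swaps the halves, the result does not depend on byte order.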
BTW, don't use xchg with a memory operand if you care about performance. It has an implicit lock prefix, so it's a full memory barrier and takes about 20 CPU cycles on Haswell/Skylake (http://agner.org/optimize/). Of course, multiple instructions can be in flight at once, but xchg mem,reg is 8 uops, vs. 2 total for separate load + store. xchg doesn't stall the pipeline, but the memory barrier hurts a lot, and making the exchange atomic is simply a lot of work for the CPU.
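The same trade-off shows up at the C level. The sketch below contrasts a plain load + store with a C11 atomic exchange; the function names are hypothetical. On x86, GCC and Clang typically compile the atomic version to xchg with a memory operand, but that is a compiler choice, not something the C standard promises.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Non-atomic exchange: one separate load and one separate store.
   Another thread could modify *mem between the two steps. */
static uint32_t swap_plain(uint32_t *mem, uint32_t new_value)
{
    assert(mem);
    uint32_t old = *mem;  /* load  */
    *mem = new_value;     /* store */
    return old;
}

/* Atomic exchange: a single indivisible read-modify-write with
   sequentially consistent ordering (the default for atomic_exchange). */
static uint32_t swap_atomic(_Atomic uint32_t *mem, uint32_t new_value)
{
    assert(mem);
    return atomic_exchange(mem, new_value);
}
```

Both return the old value and leave the new value in memory. Use the atomic form only when other threads may touch the same location, since that is the case where the extra cost buys something.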
Related:
- swapping 2 registers in 8086 assembly language(16 bits) (how to efficiently swap a register with memory). xchg is only useful for this case if you need atomicity, or if you care about code-size but not speed.
- Can num++ be atomic for 'int num'?
- Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? (for the reg,reg version)