How does xchg work in Intel assembly language?

Question

Can someone explain how xchg works in this code? Assume arrayD is a DWORD array containing 1, 2, 3.

mov eax, arrayD ; eax=1
xchg eax, [arrayD+4]; eax=2 arrayD=2,1,3

Why isn't the array 1,1,3 after the xchg?

Recommended answer

xchg works exactly the way Intel's documentation says it does.

I think the comment on the 2nd line is wrong. It should be eax=2, arrayD = 1,1,3. So you're correct, and you should email your instructor to say you think you've found a mistake, unless you missed something in your notes.

xchg stores to only one memory element. It can't magically look back in time to find out where the value in eax came from, so a single xchg instruction cannot swap two memory locations.
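To make that concrete, here is a minimal C sketch that models the two instructions from the question. It is only an illustration of the semantics, not real assembly: the helper names xchg_u32 and run_snippet are made up for this example, and eax is modeled as an ordinary local variable.

```c
#include <stdint.h>

/* Models `xchg reg, [mem]`: the register and ONE memory location trade values. */
static void xchg_u32(uint32_t *reg, uint32_t *mem)
{
    uint32_t old = *mem;  /* load the memory operand */
    *mem = *reg;          /* store the register's value into that location */
    *reg = old;           /* the register receives what memory held */
}

/* Replays the two instructions from the question and returns the final eax. */
static uint32_t run_snippet(uint32_t arrayD[3])
{
    uint32_t eax = arrayD[0];    /* mov  eax, arrayD      : eax = arrayD[0] */
    xchg_u32(&eax, &arrayD[1]);  /* xchg eax, [arrayD+4]  : swaps eax with arrayD[1] only */
    return eax;
}
```

Starting from 1,2,3, run_snippet leaves the array as 1,1,3 and returns 2. arrayD[0] is never written, because the mov only read from it.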

The only way to swap 1,2 to 2,1 in one instruction would be a 64-bit rotate, like rol qword ptr [arrayD], 32 (x86-64 only).
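The rotate trick can be sketched in C as well. This is a rough model under the assumption that the two DWORDs are adjacent in memory; swap_pair_by_rotate is a hypothetical helper name, and memcpy is used only to treat the pair as one 64-bit value without violating C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Swaps the two adjacent 32-bit elements at p by rotating the 64-bit
   value that spans them by 32 bits, like `rol qword ptr [arrayD], 32`. */
static void swap_pair_by_rotate(uint32_t *p)
{
    uint64_t q;
    memcpy(&q, p, sizeof q);    /* read both dwords as one qword */
    q = (q << 32) | (q >> 32);  /* rotate by 32: the two halves trade places */
    memcpy(p, &q, sizeof q);    /* write the qword back */
}
```

Calling swap_pair_by_rotate on an array holding 1,2,3 turns it into 2,1,3; the third element lies outside the 64-bit value and is untouched.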

BTW, don't use xchg with a memory operand if you care about performance. It has an implicit lock prefix, so it's a full memory barrier and takes about 20 CPU cycles on Haswell/Skylake (http://agner.org/optimize/). Of course, multiple instructions can be in flight at once, but xchg mem,reg is 8 uops, vs. 2 total for a separate load + store. xchg doesn't stall the pipeline, but the memory barrier hurts a lot, and making the exchange atomic is simply a lot of work for the CPU.
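The same trade-off shows up in C. The sketch below contrasts a plain load + store with a C11 atomic exchange; the function names are invented for this example, and the claim about code generation is an assumption about typical x86 compilers (GCC and Clang usually lower a sequentially consistent atomic_exchange to xchg with a memory operand), not a guarantee.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Plain load + store: cheap, but another thread could modify *mem
   between the load and the store. */
static uint32_t exchange_plain(uint32_t *mem, uint32_t value)
{
    uint32_t old = *mem;  /* load */
    *mem = value;         /* store */
    return old;
}

/* Atomic, sequentially consistent exchange: the load and store happen
   as one indivisible step, and the operation is also a full barrier. */
static uint32_t exchange_atomic(_Atomic uint32_t *mem, uint32_t value)
{
    return atomic_exchange(mem, value);
}
```

Both functions return the old value and leave the new one in memory. If no other thread touches the location, the plain version gives the same result without paying for the barrier.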