As Anthony Williams said (about x86):

some_atomic.load(std::memory_order_acquire) just drops through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction.

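It is well known that on x86, load() and store() operations with the memory orders memory_order_consume, memory_order_acquire, memory_order_release and memory_order_acq_rel need no extra barrier instructions, only a plain mov (strictly, memory_order_acq_rel applies only to read-modify-write operations, not to a plain load() or store()).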
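To make the quoted claim concrete, here is a minimal message-passing sketch (the names payload, ready and message_passing_demo are hypothetical). On x86, compilers typically emit a plain mov for both the release store and the acquire load; on ARMv8 they typically emit stlr and ldar. Either way the C++ guarantee is the same: once the consumer sees ready == true, it must also see the payload.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: release store / acquire load pairing.
// If the consumer observes ready == true, it is guaranteed to observe payload == 42.
int message_passing_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the release store becomes visible
        }
        observed = payload;  // no data race: happens-after the write of 42
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling message_passing_demo() is expected to return 42 on every run; weakening either side to memory_order_relaxed would make reading payload a data race.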

But on ARMv8, we know that there are memory barriers for both load() and store(): http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2 and http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

About the memory barriers of different CPU architectures: http://g.oswego.edu/dl/jmm/cookbook.html

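Next, for CAS operations on x86, the disassembly (MSVS 2012, x86_64) of these two lines with different memory orders is essentially the same: both compile to lock cmpxchg.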

    a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
    000000013FE71A2D  mov         ebx,dword ptr [temp]
    000000013FE71A31  mov         eax,ebx
    000000013FE71A33  mov         ecx,4
    000000013FE71A38  lock cmpxchg dword ptr [temp],ecx

    a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);
    000000013FE71A4D  mov         ecx,5
    000000013FE71A52  mov         eax,ebx
    000000013FE71A54  lock cmpxchg dword ptr [temp],ecx
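For reference, a small sketch of what the two calls above do at the C++ level (the helper cas_demo and struct CasResult are hypothetical). compare_exchange_weak(expected, desired, success, failure) stores desired and returns true if the atomic equals expected; otherwise it writes the current value back into expected and returns false. The weak form may also fail spuriously, so it is normally retried in a loop.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical result type for the demo below.
struct CasResult {
    bool mismatch_failed;  // CAS with a wrong `expected` returned false
    int expected_after;    // value written back into `expected` on failure
    int final_value;       // value of the atomic after the retry loop
};

CasResult cas_demo() {
    std::atomic<int> a{3};
    CasResult r{};

    int temp = 7;  // deliberately wrong guess: the CAS cannot succeed
    r.mismatch_failed = !a.compare_exchange_weak(
        temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
    r.expected_after = temp;  // the failed CAS loaded the current value (3)

    // Retry until the (possibly spuriously failing) weak CAS succeeds.
    // On x86 the memory orders do not change the emitted lock cmpxchg,
    // but they do change what the C++ memory model guarantees.
    while (!a.compare_exchange_weak(temp, 5, std::memory_order_relaxed,
                                    std::memory_order_relaxed)) {
        // temp was refreshed with the current value; try again
    }
    r.final_value = a.load(std::memory_order_relaxed);
    return r;
}
```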

Disassembly (from GDB) of the same code compiled with GCC 4.8.1 for x86_64:

    a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
    a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);

    0x4613b7  <+0x0027>         mov    0x2c(%rsp),%eax
    0x4613bb  <+0x002b>         mov    $0x4,%edx
    0x4613c0  <+0x0030>         lock cmpxchg %edx,0x20(%rsp)
    0x4613c6  <+0x0036>         mov    %eax,0x2c(%rsp)
    0x4613ca  <+0x003a>         lock cmpxchg %edx,0x20(%rsp)

On x86/x86_64, does every atomic CAS operation, for example atomic_val.compare_exchange_weak(temp, 1, std::memory_order_relaxed, std::memory_order_relaxed);, always satisfy std::memory_order_seq_cst ordering?

If every CAS operation on x86 always runs with sequential consistency (std::memory_order_seq_cst) regardless of the requested ordering, is the same true on ARMv8?

问题:CASstd::memory_order_relaxed 顺序是否应该在 x86 或 ARM 上阻塞内存总线?

答案:x86 上,任何 compare_exchange_weak() 操作与任何 std::memory_orders(甚至 std::memory_order_relaxed) 总是转换为 LOCK CMPXCHG 带锁总线,真正具有原子性,并且与 XCHG - cmpxchgxchg 指令一样昂贵".

ANSWER: On x86, every compare_exchange_weak() call with any std::memory_order (even std::memory_order_relaxed) compiles to LOCK CMPXCHG, which locks the bus (on modern CPUs usually just the affected cache line) so that it is truly atomic, and it is as expensive as XCHG: "the cmpxchg is just as expensive as the xchg instruction".
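It helps to separate atomicity from ordering here. A minimal sketch (the function relaxed_cas_counter is hypothetical): memory_order_relaxed only drops ordering guarantees for other memory locations; each CAS is still an indivisible read-modify-write, so no increments are lost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counter incremented only through relaxed compare_exchange_weak.
// On x86 each CAS here is typically a lock cmpxchg.
int relaxed_cas_counter(int threads, int iterations) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                int expected = counter.load(std::memory_order_relaxed);
                // On failure `expected` is refreshed, so expected + 1 is
                // recomputed on every retry.
                while (!counter.compare_exchange_weak(
                    expected, expected + 1,
                    std::memory_order_relaxed, std::memory_order_relaxed)) {
                }
            }
        });
    }
    for (auto& w : workers) w.join();  // join() synchronizes with each worker
    return counter.load(std::memory_order_relaxed);
}
```

relaxed_cas_counter(4, 10000) is expected to return exactly 40000 on any conforming implementation, x86 or ARM.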


(An addition: XCHG with a memory operand is implicitly locked, so it is equivalent to LOCK XCHG; CMPXCHG, however, is not equivalent to LOCK CMPXCHG, which is the truly atomic form.)
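The implicit locking of XCHG is why std::atomic exchange needs no explicit LOCK prefix on x86. A sketch of a spinlock built on exchange (the SpinLock class and spinlock_counter are hypothetical; the claim that compilers emit XCHG for exchange on x86 describes typical codegen, not a guarantee):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock built on std::atomic<bool>::exchange.
class SpinLock {
public:
    void lock() {
        // acquire: the critical section cannot move above this point
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // spin while another thread holds the lock
        }
    }
    void unlock() {
        // release: publishes the critical section's writes to the next owner
        flag_.store(false, std::memory_order_release);
    }
private:
    std::atomic<bool> flag_{false};
};

long spinlock_counter(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;  // plain variable, protected by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```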

ARM 和 PowerPC 上,对于任何`compare_exchange_weak(),对于不同的 std::memory_orders,有不同的锁的处理器指令,通过 LL/SC.

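On ARM (before the ARMv8.1 LSE atomics) and PowerPC, compare_exchange_weak() is implemented with LL/SC (load-linked/store-conditional) instructions, and different std::memory_order arguments produce different instruction sequences: for example, on ARMv8 a relaxed CAS typically uses ldxr/stxr, while a seq_cst CAS uses the acquire/release forms ldaxr/stlxr.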
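LL/SC is also why the weak form exists: the store-conditional can fail even when the value matched (for example, when the reservation taken by the load-linked is lost). A sketch, assuming integral T, of how a "strong" CAS can be expressed in terms of the weak one (cas_strong_via_weak is hypothetical and only approximates what an LL/SC implementation does internally):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper for integral T: retry the weak CAS while its failures
// are spurious. A spurious failure leaves `expected` equal to the value we
// were looking for; a genuine failure leaves the different current value.
template <typename T>
bool cas_strong_via_weak(std::atomic<T>& obj, T& expected, T desired,
                         std::memory_order success,
                         std::memory_order failure) {
    const T wanted = expected;
    while (!obj.compare_exchange_weak(expected, desired, success, failure)) {
        if (expected != wanted) {
            return false;  // genuine failure: `expected` holds the current value
        }
        // spurious failure: the value still matched, so try again
    }
    return true;
}
```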


Processor memory barrier instructions for x86 (except CAS), ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html



You shouldn't worry about which instructions the compiler maps a given C11 construct to, as this doesn't capture everything. Instead, you need to develop code against the guarantees of the C11 memory model. As the comment above notes, your compiler (or a future compiler) is free to reorder relaxed memory operations as long as it doesn't violate the C11 memory model. It is also worthwhile to run your code through a tool like CDSChecker to see which behaviors the memory model allows.
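Reasoning in terms of the model usually means thinking in litmus tests. A sketch of the classic store-buffering test (store_buffering_violations is hypothetical): with memory_order_seq_cst on every access, the model forbids r1 == 0 && r2 == 0; with relaxed or release/acquire that outcome is allowed, and x86 hardware can produce it because of its store buffer. Note that merely running a test like this can only show that an outcome happens, never prove that it cannot; exhaustive exploration is what tools like CDSChecker are for.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test, all accesses seq_cst.
// Returns how many of `runs` executions produced the forbidden r1 == r2 == 0.
int store_buffering_violations(int runs) {
    int violations = 0;
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++violations;  // forbidden under seq_cst
    }
    return violations;
}
```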

08-24 17:15