My test code is as below, and I found that only memory_order_seq_cst forbade the compiler's reordering.

```cpp
#include <atomic>
using namespace std;

int A, B = 1;

void func(void) {
    A = B + 1;
    atomic_thread_fence(memory_order_seq_cst);
    B = 0;
}
```

Other choices such as memory_order_release and memory_order_acq_rel did not generate any compiler barrier at all.

I think they must work with an atomic variable, as below.

```cpp
#include <atomic>
using namespace std;

atomic<int> A(0);
int B = 1;

void func(void) {
    A.store(B + 1, memory_order_release);
    B = 0;
}
```

But I do not want to use an atomic variable. At the same time, I think `asm("":::"memory")` is too low-level.

Is there any better choice?

Solution

re: your edit:

Why not? If it's for performance reasons, use them with memory_order_relaxed and atomic_signal_fence(mo_whatever) to block compiler reordering without any runtime overhead other than the compiler barrier potentially blocking some compile-time optimizations, depending on the surrounding code.

If it's for some other reason, then maybe atomic_signal_fence will give you code that happens to work on your target platform. I suspect that it does order non-atomic<> loads and/or stores, so it might even help avoid data-race Undefined Behaviour in C++.

Sufficient for what?

Regardless of any barriers, if two threads run this function at the same time, your program has Undefined Behaviour because of concurrent access to non-atomic<> variables. So the only way this code can be useful is if you're talking about synchronizing with a signal handler that runs in the same thread.

That would also be consistent with asking for a "compiler barrier", to only prevent reordering at compile time, because out-of-order execution and memory reordering always preserve the behaviour of a single thread.
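The relaxed-atomic suggestion above can be sketched as follows. This is a minimal illustration of the idea, not code from the question: the relaxed store avoids data-race UB, and the signal fence supplies compile-time ordering without emitting any barrier instruction.

```cpp
#include <atomic>
using namespace std;

atomic<int> A{0};  // atomic<> avoids data-race UB; relaxed adds no runtime cost on x86
int B = 1;

void func(void) {
    A.store(B + 1, memory_order_relaxed);  // no runtime ordering by itself
    // Compiler barrier only: keeps the store to A ahead of the store to B
    // at compile time; emits no fence instruction on any ISA.
    atomic_signal_fence(memory_order_release);
    B = 0;
}
```

On x86 this compiles to two plain mov stores in source order; on a weakly-ordered ISA it still emits no fence, so it only blocks compile-time reordering, which is all a same-thread signal handler needs.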
So you never need extra barrier instructions to make sure you see your own operations in program order, you just need to stop the compiler reordering stuff at compile time. See Jeff Preshing's post: Memory Ordering at Compile Time.

This is what atomic_signal_fence is for. You can use it with any std::memory_order, just like thread_fence, to get different strengths of barrier and only prevent the optimizations you need to prevent.

Totally wrong, in several ways.

atomic_thread_fence is a compiler barrier plus whatever run-time barriers are necessary to restrict reordering in the order our loads/stores become visible to other threads.

I'm guessing you mean it didn't emit any barrier instructions when you looked at the asm output for x86. Instructions like x86's MFENCE are not "compiler barriers"; they're run-time memory barriers, and prevent even StoreLoad reordering at run-time. (That's the only reordering that x86 allows. SFENCE and LFENCE are only needed when using weakly-ordered (NT) stores, like MOVNTPS (_mm_stream_ps).)

On a weakly-ordered ISA like ARM, thread_fence(mo_acq_rel) isn't free, and compiles to an instruction. gcc5.4 uses dmb ish. (See it on the Godbolt compiler explorer.)

A compiler barrier just prevents reordering at compile time, without necessarily preventing run-time reordering.
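To make the same-thread signal-handler use case concrete, here is a hedged sketch; the names payload, ready, and on_signal are mine, not from the question. The release fence stops the compiler from sinking the payload store below the flag store, and no fence instruction is needed because a handler interrupts this same thread, which always observes its own program order.

```cpp
#include <atomic>
#include <csignal>
using namespace std;

int payload = 0;                  // plain non-atomic data
volatile sig_atomic_t ready = 0;  // async-signal-safe flag

void prepare(void) {
    payload = 42;
    // Compiler barrier: the store to payload may not be reordered after
    // the flag store at compile time. Emits no instruction.
    atomic_signal_fence(memory_order_release);
    ready = 1;
}

void on_signal(int) {
    if (ready) {
        atomic_signal_fence(memory_order_acquire);
        // payload is guaranteed to be 42 here
    }
}
```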
So even on ARM, atomic_signal_fence(mo_seq_cst) compiles to no instructions.

A weak enough barrier allows the compiler to do the store to B ahead of the store to A if it wants, but gcc happens to decide to still do them in source order even with thread_fence(mo_acquire) (which shouldn't order stores with other stores). So this example doesn't really test whether something is a compiler barrier or not.

Strange compiler behaviour from gcc for an example that is different with a compiler barrier:

See this source+asm on Godbolt.

```cpp
#include <atomic>
using namespace std;
int A, B;

void foo() {
  A = 0;
  atomic_thread_fence(memory_order_release);
  B = 1;
  //asm volatile(""::: "memory");
  //atomic_signal_fence(memory_order_release);
  atomic_thread_fence(memory_order_release);
  A = 2;
}
```

This compiles with clang the way you'd expect: the thread_fence is a StoreStore barrier, so the A=0 has to happen before B=1, and can't be merged with the A=2.

```asm
 # clang3.9 -O3
    mov     dword ptr [rip + A], 0
    mov     dword ptr [rip + B], 1
    mov     dword ptr [rip + A], 2
    ret
```

But with gcc, the barrier has no effect, and only the final store to A is present in the asm output.

```asm
 # gcc6.2 -O3
    mov     DWORD PTR B[rip], 1
    mov     DWORD PTR A[rip], 2
    ret
```

But with atomic_signal_fence(memory_order_release), gcc's output matches clang. So atomic_signal_fence(mo_release) is having the barrier effect we expect, but atomic_thread_fence with anything weaker than seq_cst isn't acting as a compiler barrier at all.

One theory here is that gcc knows that it's officially Undefined Behaviour for multiple threads to write to non-atomic<> variables.
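Tying this back to the original question about a portable equivalent of asm("":::"memory"): atomic_signal_fence is exactly that. A sketch of the two side by side; the function names are mine, and the asm version is GCC/Clang-specific while the fence is standard C++11.

```cpp
#include <atomic>

int X, Y;

void with_asm_clobber(void) {
    X = 1;
    asm volatile("" ::: "memory");  // GNU-specific compiler barrier
    Y = 1;
}

void with_signal_fence(void) {
    X = 1;
    // Portable C++11 equivalent: blocks all compile-time reordering
    // across this point, emits no instruction.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    Y = 1;
}
```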
This doesn't hold much water, because atomic_thread_fence should still work if used to synchronize with a signal handler; it's just stronger than necessary.

BTW, with atomic_thread_fence(memory_order_seq_cst), we get the expected:

```asm
 # gcc6.2 -O3, with a mo_seq_cst barrier
    mov     DWORD PTR A[rip], 0
    mov     DWORD PTR B[rip], 1
    mfence
    mov     DWORD PTR A[rip], 2
    ret
```

We get this even with only one barrier, which would still allow the A=0 and A=2 stores to happen one after the other, so the compiler is allowed to merge them across a barrier. (Observers failing to see separate A=0 and A=2 values is a possible ordering, so the compiler can decide that's what always happens.) Current compilers don't usually do this kind of optimization, though. See discussion at the end of my answer on Can num++ be atomic for 'int num'?.