Problem description
In WB memory, a = b = 0
P1:
a = 1
SFENCE
b = 1
P2:
WHILE (b == 0) {}
LFENCE
ASSERT (a == 1)
It is my understanding that neither the SFENCE nor the LFENCE is needed here.
Namely, for this memory type, x86 ensures that:
- Loads are not reordered with older loads
- Stores are not reordered with older stores
- Stores are transitively visible
Recommended answer
The lfence and sfence asm instructions are effectively no-ops for memory ordering unless you're using NT stores (or NT loads from WC memory, e.g. video RAM). (Actually, movntdqa loads might only be ordered by mfence on paper, not lfence, in which case I don't know when you'd ever use lfence. It was added to the ISA along with sfence and mfence at the same time as NT stores, before movntdqa existed, possibly just for completeness or in case it was ever needed.)
There is sometimes confusion around this point, because the C/C++ intrinsics for lfence and sfence are also compiler barriers. A compiler barrier is needed in C/C++, but you can get one more cheaply with GNU C asm("":::"memory"); or (to order relaxed-atomic operations) std::atomic_signal_fence(std::memory_order_acq_rel) (see footnote 1). Either one restricts compile-time reordering without making the compiler emit any useless asm barrier instructions.
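As a rough illustration of the compiler-barrier approach, here is a minimal sketch of the question's P1/P2 pattern using relaxed atomics plus std::atomic_signal_fence. The function names writer and reader are made up for this example. Note the assumption: ISO C++ only promises that atomic_signal_fence orders a thread against a signal handler running on that same thread, so the cross-thread ordering here leans entirely on the x86 hardware guarantees described in this answer and is not portable.

```cpp
#include <atomic>
#include <cassert>

// Same names as in the question; relaxed atomics avoid a data race in C++.
std::atomic<int> a{0};
std::atomic<int> b{0};

// P1 (hypothetical function name): write the data, then set the flag.
void writer() {
    a.store(1, std::memory_order_relaxed);
    // Compiler-only barrier: emits no instruction, but keeps the compiler
    // from moving the store to b ahead of the store to a.
    std::atomic_signal_fence(std::memory_order_acq_rel);
    b.store(1, std::memory_order_relaxed);
}

// P2 (hypothetical function name): wait for the flag, then read the data.
// ASSUMPTION: seeing a == 1 here from another thread relies on x86 not
// reordering stores with older stores or loads with older loads.
int reader() {
    while (b.load(std::memory_order_relaxed) == 0) {
        // spin until the flag is set
    }
    std::atomic_signal_fence(std::memory_order_acq_rel);
    return a.load(std::memory_order_relaxed);
}
```

On x86, writer and reader would run on different threads, and neither function should contain any fence instruction in the generated asm.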
Run-time reordering is already blocked by the x86 memory model, except for StoreLoad reordering, which requires mfence (or a lock-prefixed instruction) to block. lfence + sfence don't add up to mfence. See "Does it make any sense instruction LFENCE in processors x86/x86_64?" and various other SO Q&As about these instructions.
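To make the StoreLoad case concrete, here is a hedged sketch of the classic store-buffer litmus test. The names count_both_zero, x, y, r1 and r2 are invented for illustration. With seq_cst operations, which x86 compilers typically implement with an xchg or a mov followed by mfence for the store, the outcome r1 == 0 && r2 == 0 is forbidden. With weaker orderings, or with only lfence/sfence, that outcome is allowed on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffer litmus test `iterations` times and counts how often
// both threads read 0. All names here are illustrative.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // store ...
            r1 = y.load(std::memory_order_seq_cst);  // ... then load another location
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // forbidden under sequential consistency
        }
    }
    return both_zero;
}
```

Calling count_both_zero(200) should return 0. Replacing seq_cst with release stores and acquire loads removes the full barrier, so both threads may then read 0.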
This is why std::atomic_thread_fence(std::memory_order_acq_rel) also compiles to zero instructions on x86, but to barrier instructions on weakly-ordered architectures.
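A portable version of the question's pattern might look like the sketch below, which uses std::atomic_thread_fence so the ordering is guaranteed by the C++ memory model rather than by x86 alone. The names payload, ready, producer, consumer and run_once are hypothetical. Because the two fences synchronize through the relaxed flag, the payload can be a plain non-atomic int without a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;            // plain non-atomic data (hypothetical name)
std::atomic<int> ready{0};  // flag (hypothetical name)

void producer() {
    payload = 1;
    // Acts as a release fence here: no instruction on x86, a real barrier
    // on weakly-ordered ISAs.
    std::atomic_thread_fence(std::memory_order_acq_rel);
    ready.store(1, std::memory_order_relaxed);
}

int consumer() {
    while (ready.load(std::memory_order_relaxed) == 0) {
        // spin until the flag is set
    }
    // Acts as an acquire fence here; pairs with the fence in producer().
    std::atomic_thread_fence(std::memory_order_acq_rel);
    return payload;  // guaranteed to observe 1
}

// Runs producer and consumer on separate threads once and returns what the
// consumer saw.
int run_once() {
    int seen = -1;
    std::thread t2([&] { seen = consumer(); });
    std::thread t1(producer);
    t1.join();
    t2.join();
    return seen;
}
```

A release fence in producer and an acquire fence in consumer would be enough; acq_rel is used only to match the fence named above.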
lfence is also an execution-serializing instruction on Intel microarchitectures (but maybe not AMD?). It has been all along, but Intel recently made this guarantee official so Spectre mitigation techniques could safely use it instead of a much less convenient cpuid.
Footnote 1: atomic_signal_fence on gcc may also be a compiler barrier for plain non-atomic variables; it was the last time I checked (while atomic_thread_fence wasn't), but this is probably just an implementation detail that matters only when no atomic variables are involved. When there are atomic variables, the compiler knows that those variables may provide ordering that lets other threads access non-atomic variables without UB, so ordering is needed.