Question: Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?
As we know from a previous answer to "Does it make any sense instruction LFENCE in processors x86/x86_64?", we cannot use SFENCE instead of MFENCE for Sequential Consistency.
An answer there suggests that MFENCE = SFENCE + LFENCE, i.e. that LFENCE does something without which we cannot provide Sequential Consistency.
LFENCE makes it impossible to reorder:
    SFENCE
    LFENCE
    MOV reg, [addr]
-- to -->
    MOV reg, [addr]
    SFENCE
    LFENCE
For example, the reordering
    MOV [addr], reg
    LFENCE
-->
    LFENCE
    MOV [addr], reg
is provided by a mechanism, the Store Buffer, which reorders Store-Load pairs to increase performance, and LFENCE does not prevent it. And SFENCE disables this mechanism.
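The store-buffer effect described above can be sketched with a toy Python model. This is an assumption-laden illustration, not Intel's microarchitecture: the `Core` class, its methods and `store_buffering_litmus` are hypothetical names, each core has one private FIFO store buffer, and a store only reaches shared memory when it is committed. The schedule shown is one legal interleaving in which both loads run while both stores are still buffered.

```python
from collections import deque


class Core:
    """One core with a private FIFO store buffer (toy model, not real hardware)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # MOV [addr], reg: the store retires into the store buffer only.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # MOV reg, [addr]: forward from this core's own pending stores
        # (newest first), otherwise read shared memory.
        for pending_addr, value in reversed(self.store_buffer):
            if pending_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def commit_all(self):
        # Commit buffered stores to shared memory in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


def store_buffering_litmus():
    memory = {"X": 0, "Y": 0}
    core0, core1 = Core(memory), Core(memory)
    core0.store("X", 1)    # core 0: X = 1 (sits in core 0's buffer)
    core1.store("Y", 1)    # core 1: Y = 1 (sits in core 1's buffer)
    r0 = core0.load("Y")   # neither store has committed yet ...
    r1 = core1.load("X")   # ... so both loads read the old value 0
    core0.commit_all()
    core1.commit_all()
    return r0, r1, memory


r0, r1, memory = store_buffering_litmus()
print(r0, r1, memory)  # 0 0 {'X': 1, 'Y': 1}
```

Both loads return 0 even though each core executed its store first in program order: from the other core's point of view, the store was reordered after the load (StoreLoad reordering).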
What mechanism does LFENCE disable to make that reordering impossible (given that x86 has no Invalidate Queue mechanism)?
And is the reordering
    SFENCE
    MOV reg, [addr]
-->
    MOV reg, [addr]
    SFENCE
possible only in theory, or also in reality? And if it is possible in reality, what mechanisms cause it and how do they work?
Answer
MFENCE drains the store buffer before later loads can execute.
LFENCE drains the ROB before later instructions can issue into the back-end.
SFENCE only orders stores against other stores, i.e. it prevents NT stores from committing from the store buffer ahead of SFENCE itself. But otherwise SFENCE is just like a plain store that moves through the store buffer. Think of it like putting a divider on a grocery-store checkout conveyor belt that stops NT stores from getting grabbed early. It does not force the store buffer to be drained before it retires, so putting LFENCE after it doesn't add up to MFENCE.
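The checkout-divider picture can be sketched as a small enumeration of commit orders. Every rule here is a simplifying assumption that only models the ordering described in this answer, not the hardware: buffer entries are `("store", name)`, `("nt", name)` or `("sfence", None)`; two normal stores always commit in order; an NT store may pass, or be passed by, another store; and nothing passes an SFENCE marker. `can_commit` and `visibility_orders` are hypothetical helper names.

```python
def can_commit(pending, i):
    """May entry i leave the buffer ahead of the older entries pending[:i]?"""
    kind = pending[i][0]
    for older_kind, _ in pending[:i]:
        if kind == "sfence" or older_kind == "sfence":
            return False  # SFENCE passes nothing, and nothing passes SFENCE
        if kind != "nt" and older_kind != "nt":
            return False  # two normal stores keep program order
    return True


def visibility_orders(buffer):
    """Every order in which the buffered stores can become globally visible."""
    results = set()

    def explore(pending, visible):
        if not pending:
            results.add(tuple(visible))
            return
        for i in range(len(pending)):
            if can_commit(pending, i):
                kind, name = pending[i]
                rest = pending[:i] + pending[i + 1:]
                # An SFENCE marker leaves the buffer but writes no data.
                explore(rest, visible if kind == "sfence" else visible + [name])

    explore(list(buffer), [])
    return results


print(sorted(visibility_orders([("nt", "A"), ("store", "B")])))
# [('A', 'B'), ('B', 'A')]
print(sorted(visibility_orders([("nt", "A"), ("sfence", None), ("store", "B")])))
# [('A', 'B')]
```

With the divider in place, the later normal store B can no longer become visible before the NT store A. The marker writes nothing and drains nothing; it only constrains the order in which entries leave. Loads do not appear in the model at all, because SFENCE says nothing about them.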
(AMD SFENCE is stronger, a full barrier IIRC, but the minimum behaviour across Intel/AMD/Via/etc. is what Intel documents.)
SFENCE + LFENCE doesn't block StoreLoad reordering, so it's not sufficient for sequential consistency. Only mfence (or a locked operation, or a real serializing instruction like cpuid) will do that. See Jeff Preshing's Memory Reordering Caught in the Act for a case where only a full barrier is sufficient.
From Intel's instruction-set reference manual entry for sfence: the processor ensures that every store prior to SFENCE is globally visible before any store after SFENCE becomes globally visible, but SFENCE is not ordered with respect to memory loads or the LFENCE instruction.
LFENCE forces earlier instructions to "complete locally" (i.e. retire from the out-of-order part of the core), but for a store or SFENCE that just means putting data or a marker in the memory-order buffer, not flushing it so the store becomes globally visible. In other words, SFENCE "completion" (retirement from the ROB) doesn't include flushing the store buffer.
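To check the claim that SFENCE + LFENCE still allows StoreLoad reordering while MFENCE does not, here is a hedged sketch that exhaustively explores the store-buffering litmus test (the same shape as Preshing's experiment) on a toy x86-TSO-style machine. Assumptions: cores execute in order; each core has a FIFO store buffer whose oldest entry may commit at any time; SFENCE enqueues a marker; LFENCE is a no-op because a store has already "completed locally" once it is in the buffer; MFENCE waits until its own core's buffer is empty. `sb_outcomes` and `ADDR` are hypothetical names.

```python
ADDR = {"X": 0, "Y": 1}  # two shared locations, both initially 0


def sb_outcomes(fences):
    """All reachable (r0, r1) for the store-buffering litmus test:

        core 0: X = 1; <fences>; r0 = Y
        core 1: Y = 1; <fences>; r1 = X
    """
    middle = [(f, None) for f in fences]
    programs = (
        [("store", "X")] + middle + [("load", "Y")],
        [("store", "Y")] + middle + [("load", "X")],
    )
    outcomes, seen = set(), set()

    def explore(pcs, bufs, mem, regs):
        if (pcs, bufs, mem, regs) in seen:
            return
        seen.add((pcs, bufs, mem, regs))
        finished = all(pc == len(p) for pc, p in zip(pcs, programs))
        if finished and not any(bufs):
            outcomes.add(regs)
            return
        for t in (0, 1):
            if bufs[t]:
                # Commit core t's oldest buffer entry. None is an SFENCE
                # marker: it leaves the buffer but writes nothing.
                addr = bufs[t][0]
                new_mem = list(mem)
                if addr is not None:
                    new_mem[ADDR[addr]] = 1
                new_bufs = list(bufs)
                new_bufs[t] = bufs[t][1:]
                explore(pcs, tuple(new_bufs), tuple(new_mem), regs)
            if pcs[t] == len(programs[t]):
                continue
            op, addr = programs[t][pcs[t]]
            new_bufs, new_regs = list(bufs), list(regs)
            if op == "store":
                new_bufs[t] = bufs[t] + (addr,)  # retires into the buffer
            elif op == "sfence":
                new_bufs[t] = bufs[t] + (None,)  # divider, no drain
            elif op == "mfence" and bufs[t]:
                continue  # MFENCE waits until this core's buffer is empty
            elif op == "load":
                # Store-to-load forwarding, else read shared memory.
                new_regs[t] = 1 if addr in bufs[t] else mem[ADDR[addr]]
            # "lfence" (and an MFENCE with an empty buffer) is a no-op here:
            # every earlier instruction, including a buffered store, has
            # already completed locally in this in-order model.
            new_pcs = list(pcs)
            new_pcs[t] += 1
            explore(tuple(new_pcs), tuple(new_bufs), mem, tuple(new_regs))

    explore((0, 0), ((), ()), (0, 0), (0, 0))
    return outcomes


for fences in ([], ["sfence", "lfence"], ["mfence"]):
    print(fences, sorted(sb_outcomes(fences)))
# [] [(0, 0), (0, 1), (1, 0), (1, 1)]
# ['sfence', 'lfence'] [(0, 0), (0, 1), (1, 0), (1, 1)]
# ['mfence'] [(0, 1), (1, 0), (1, 1)]
```

Only the MFENCE variant rules out r0 == r1 == 0. Adding SFENCE and LFENCE changes nothing in this model, because neither of them waits for the store buffer to drain.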
This is like what Preshing describes in Memory Barriers Are Like Source Control Operations, where StoreStore barriers aren't "instant". Later in that article, he explains why a #StoreStore + #LoadLoad + #LoadStore barrier doesn't add up to a #StoreLoad barrier. (x86 LFENCE has some extra serialization of the instruction stream, but since it doesn't flush the store buffer, the reasoning still holds.)
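Preshing's conclusion can be reduced to simple bookkeeping. Taking this answer's classification as the assumption (LFENCE is a LoadLoad + LoadStore barrier, SFENCE a StoreStore barrier, MFENCE a full barrier), the union of what SFENCE and LFENCE enforce still lacks StoreLoad. The constant names below are hypothetical.

```python
# The four reorderings from Preshing's article, as (earlier op, later op).
LOAD_LOAD = ("load", "load")
LOAD_STORE = ("load", "store")
STORE_LOAD = ("store", "load")
STORE_STORE = ("store", "store")

# Orderings each fence enforces, following the answer's classification.
# (On x86, SFENCE's StoreStore ordering only matters for NT stores.)
ENFORCES = {
    "lfence": {LOAD_LOAD, LOAD_STORE},
    "sfence": {STORE_STORE},
    "mfence": {LOAD_LOAD, LOAD_STORE, STORE_LOAD, STORE_STORE},
}

combined = ENFORCES["sfence"] | ENFORCES["lfence"]
missing = ENFORCES["mfence"] - combined
print(missing)  # {('store', 'load')}
```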
LFENCE is not fully serializing like cpuid (which is as strong a memory barrier as mfence or a locked instruction). It's just a LoadLoad + LoadStore barrier, plus some execution-serialization behaviour which may have started as an implementation detail but is now enshrined as a guarantee, at least on Intel CPUs. It's useful with rdtsc, and for blocking branch speculation to mitigate Spectre.
BTW, SFENCE is a no-op except for NT stores; it orders them with respect to normal (release) stores, but not with respect to loads or LFENCE. Only on a CPU that's normally weakly-ordered does a store-store barrier do anything.
The real concern is StoreLoad reordering between a store and a load, not between a store and barriers, so you should look at a case with a store, then a barrier, then a load.
    mov   [var1], eax
    sfence
    lfence
    mov   eax, [var2]
can become globally visible (i.e. commit to L1d cache) in this order:
    lfence
    mov   eax, [var2]   ; load stays after LFENCE
    mov   [var1], eax   ; store becomes globally visible before SFENCE
    sfence              ; can reorder with LFENCE