Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

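Question

I often read on the internet that LFENCE makes no sense on x86 processors, i.e. it does nothing, so we can painlessly use SFENCE instead of MFENCE, because MFENCE = SFENCE + LFENCE = SFENCE + NOP = SFENCE.

But if LFENCE makes no sense, why do we have four approaches to implementing sequential consistency on x86/x86_64:

  1. LOAD (without fence) and STORE + MFENCE
  2. LOAD (without fence) and LOCK XCHG
  3. MFENCE + LOAD and STORE (without fence)
  4. LOCK XADD ( 0 ) and STORE (without fence)

Taken from here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

See also Herb Sutter's presentation, at the bottom of page 34: https://skydrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&wdo=2&authkey=!AMtj_EflYn2507c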
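
For readers coming from the C++ side of that mapping table, here is a minimal sketch of what approaches (1) and (2) in the list above correspond to in source code. It assumes a mainstream compiler targeting x86-64, where a sequentially consistent store is typically emitted as MOV + MFENCE or as a single XCHG, and a sequentially consistent load as a plain MOV; the instruction choice is up to the compiler. The names flag, publish and observe are invented for this example.

```cpp
#include <atomic>

// Hypothetical shared variable, used only for illustration.
std::atomic<int> flag{0};

// Sequentially consistent store. On x86-64, compilers typically emit
// either MOV + MFENCE (approach 1) or a single XCHG (approach 2).
void publish(int value) {
    flag.store(value, std::memory_order_seq_cst);
}

// Sequentially consistent load. Under approaches 1 and 2 this is usually
// a plain MOV with no fence, because the cost is paid on the store side.
int observe() {
    return flag.load(std::memory_order_seq_cst);
}
```

Inspecting the generated assembly (for example with the compiler's -S option) is the only way to see which of the four mappings a given toolchain actually picks.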

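If LFENCE did nothing, then approach (3) would mean: SFENCE + LOAD and STORE (without fence), but there is no point in doing SFENCE before a LOAD. I.e. if LFENCE does nothing, approach (3) makes no sense.

Does the LFENCE instruction make any sense on x86/x86_64 processors?

ANSWER:

1. LFENCE is required in the cases described in the accepted answer below.

2. Approach (3) should be viewed not in isolation, but in combination with the preceding instructions. For example, approach (3):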

MFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1

MFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2

We can rewrite the code of approach (3) as follows:

SFENCE
MOV reg, [addr1]  // LOAD-1
MOV [addr2], reg  // STORE-1

SFENCE
MOV reg, [addr1]  // LOAD-2
MOV [addr2], reg  // STORE-2

Here SFENCE makes sense: it prevents reordering of STORE-1 and LOAD-2. To do this, the SFENCE after STORE-1 flushes the store buffer.

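Solution

Bottom line (TL;DR): LFENCE alone does indeed seem useless for memory ordering, but that does not make SFENCE a substitute for MFENCE. The "arithmetic" logic in the question does not apply.

Here is an excerpt from Intel's Software Developer's Manual, volume 3, section 8.2.2 (edition 325384-052US, September 2014), the same one I used in another answer: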

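  • Reads are not reordered with other reads.
  • Writes are not reordered with older reads.
  • Writes to memory are not reordered with other writes, with the following exceptions:
    • writes executed with the CLFLUSH instruction;
    • streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
    • string operations (see Section 8.2.4.1).
  • Reads may be reordered with older writes to different locations but not with older writes to the same location.
  • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
  • Reads cannot pass earlier LFENCE and MFENCE instructions.
  • Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
  • LFENCE instructions cannot pass earlier reads.
  • SFENCE instructions cannot pass earlier writes.
  • MFENCE instructions cannot pass earlier reads or writes.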

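From here, it follows that:

  • MFENCE is a full memory fence for all operations on all memory types, whether non-temporal or not.
  • SFENCE only prevents reordering of writes (in other terminology, it is a StoreStore barrier), and is only useful together with non-temporal stores and the other instructions listed as exceptions.
  • LFENCE prevents reordering of reads with subsequent reads and writes (i.e. it combines LoadLoad and LoadStore barriers). However, the first two bullets of the excerpt say that LoadLoad and LoadStore barriers are always in place, with no exceptions. Therefore LFENCE alone is useless for memory ordering.

To support the last claim, I looked at every place where LFENCE is mentioned in all 3 volumes of Intel's manual, and found none that says LFENCE is required for memory consistency. Even MOVNTDQA - the only non-temporal load instruction so far - mentions MFENCE but not LFENCE.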
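
To make the SFENCE case above concrete, here is a minimal sketch of the one situation the excerpt singles out: a non-temporal (streaming) store that has to become visible before an ordinary store. It assumes an x86 build with SSE2 and uses the compiler intrinsics _mm_stream_si32 (MOVNTI) and _mm_sfence (SFENCE) from <immintrin.h>; on other targets it falls back to an ordinary store so that it still compiles. The names payload, ready, produce and consume are invented for this example.

```cpp
#include <atomic>
#if defined(__SSE2__) || defined(_M_X64)
#include <immintrin.h>
#endif

int payload = 0;                 // hypothetical data written with a streaming store
std::atomic<bool> ready{false};  // hypothetical flag that publishes the data

void produce(int value) {
#if defined(__SSE2__) || defined(_M_X64)
    _mm_stream_si32(&payload, value);  // MOVNTI: weakly ordered non-temporal store
    _mm_sfence();                      // SFENCE: keep it ahead of the flag store below
#else
    payload = value;                   // fallback for non-x86 builds: ordinary store
#endif
    ready.store(true, std::memory_order_release);  // ordinary store on x86
}

// Returns true and fills `out` once the producer has published the data.
bool consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
}
```

With an ordinary MOV in place of the streaming store, the SFENCE would add nothing, since x86 already keeps ordinary stores in order with each other.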

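Update: see the answers to "Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?" for corrections to the guesswork below.

Whether MFENCE is equivalent to a "sum" of the other two fences is a tricky question. At first glance, among the three fence instructions only MFENCE provides a StoreLoad barrier, i.e. prevents reordering of reads with earlier writes. However, the correct answer requires knowing more than the rules above; namely, it matters that all fence instructions are ordered with respect to each other. This makes the SFENCE LFENCE sequence more powerful than a mere union of the individual effects: the sequence also prevents StoreLoad reordering (because loads cannot pass LFENCE, which cannot pass SFENCE, which cannot pass stores), and thus constitutes a full memory fence (but see also the note (*) below). Note, however, that order matters here: the LFENCE SFENCE sequence does not have the same synergy effect.

However, while one can say that MFENCE ~ SFENCE LFENCE and LFENCE ~ NOP, that does not mean MFENCE ~ SFENCE. I deliberately use equivalence (~) and not equality (=) to stress that arithmetic rules do not apply here. The mutual effect of SFENCE followed by LFENCE makes the difference; even though loads are not reordered with each other, LFENCE is required to prevent reordering of loads with SFENCE.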
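
The StoreLoad barrier discussed above is the one that can be observed from software, with the classic store-buffering litmus test. The sketch below is a rough illustration and assumes a C++11 compiler with thread support; it uses std::atomic_thread_fence(std::memory_order_seq_cst), which on x86 is typically emitted as MFENCE or a LOCK-prefixed instruction. It does not exercise the SFENCE LFENCE sequence itself. The function name both_loads_saw_zero is invented for this example.

```cpp
#include <atomic>
#include <thread>

// Runs one round of the store-buffering litmus test. Returns true if both
// threads read 0, the outcome that a full (StoreLoad) fence must rule out.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);                // STORE
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r1 = y.load(std::memory_order_relaxed);               // LOAD
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);                // STORE
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r2 = x.load(std::memory_order_relaxed);               // LOAD
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Without the two fences, x86 is allowed to let each load complete while the other thread's store is still sitting in a store buffer, so both threads can read 0; a barrier that only orders stores with stores does not exclude that outcome.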

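(*) It still might be correct to say that MFENCE is stronger than the combination of the other two fences. In particular, a note on the CLFLUSH instruction in volume 2 of Intel's manual says that "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing or serializing instructions or by another CLFLUSH instruction."

(Update: clflush is now defined as strongly ordered (like a normal store, so you only need mfence if you want to block later loads), whereas clflushopt is weakly ordered but can be fenced by sfence.)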

