Is the transformation of fetch_add(0, memory_order_relaxed/release) to mfence + mov legal?

Question

The paper N4455, No Sane Compiler Would Optimize Atomics, talks about various optimizations compilers can apply to atomics. In the section Optimization Around Atomics, for the seqlock example, it mentions a transformation implemented in LLVM, where a fetch_add(0, std::memory_order_release) is turned into an mfence followed by a plain load, rather than the usual lock add or xadd. The idea is that this avoids taking exclusive access of the cache line and is therefore relatively cheap. The mfence is still required, regardless of the ordering constraint supplied, to prevent StoreLoad reordering of the generated mov instruction.

This transformation is performed for such read-don't-modify-write operations regardless of the ordering, and equivalent assembly is produced for fetch_add(0, memory_order_relaxed).
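
For concreteness, here is a minimal sketch of where such a read-don't-modify-write shows up. It is loosely modeled on the kind of seqlock the paper discusses; the SeqLock struct, its member names, the two int payloads and the single-writer restriction are all assumptions made for illustration, not code taken from N4455 or LLVM.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical seqlock protecting two ints. Assumes a single writer thread.
struct SeqLock {
    std::atomic<unsigned> seq{0};
    std::atomic<int> data1{0};
    std::atomic<int> data2{0};

    void write(int d1, int d2) {
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);  // odd: write in progress
        data1.store(d1, std::memory_order_release);
        data2.store(d2, std::memory_order_release);
        seq.store(s + 2, std::memory_order_release);  // even: write finished
    }

    std::pair<int, int> read() {
        int d1, d2;
        unsigned seq0, seq1;
        do {
            seq0 = seq.load(std::memory_order_acquire);
            d1 = data1.load(std::memory_order_relaxed);
            d2 = data2.load(std::memory_order_relaxed);
            // Read-don't-modify-write: adding 0 never changes the stored value.
            seq1 = seq.fetch_add(0, std::memory_order_release);
        } while (seq0 != seq1 || (seq0 & 1u));  // retry if a write overlapped
        return {d1, d2};
    }
};
```

The fetch_add(0, ...) in read() is the operation the transformation targets: the reader only wants the current sequence number, but expresses the read as an RMW to get the ordering it needs.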

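However, I am wondering if this is legal. The C++ standard explicitly notes under [atomic.order] that:

"Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation."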

This fact about RMW operations seeing the 'latest' value has also been noted previously by Anthony Williams.

My question is: Is there a difference in the value the thread could see, in terms of the modification order of the atomic variable, depending on whether the compiler emits a lock add or an mfence followed by a plain load? Is it possible for this transformation to cause the thread performing the RMW operation to instead load values older than the latest one? Does this violate the guarantees of the C++ memory model?
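
As a rough source-level picture of the two code shapes being compared, the sketch below puts them side by side. The function names are hypothetical, and the second function is only an analogy for "mfence followed by a plain load": it is not what LLVM literally emits, and it is not claimed to be equivalent to the first in the C++ abstract machine (that is exactly the question).

```cpp
#include <atomic>
#include <cassert>

// Shape 1: the read-don't-modify-write as written in the source.
// Without the transformation, x86 would use a locked instruction here.
int read_via_rmw(std::atomic<int>& x) {
    return x.fetch_add(0, std::memory_order_release);
}

// Shape 2: a full fence followed by an ordinary atomic load.
// Assumption: used here as a stand-in for the mfence + mov sequence.
int read_via_fence_then_load(std::atomic<int>& x) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return x.load(std::memory_order_relaxed);
}
```

In a single thread both functions trivially return the same value; the interesting case is what each may return while other threads are storing to x.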

Answer

(I started writing this a while ago but got stalled; I'm not sure it adds up to a full answer, but thought some of this might be worth posting. I think @LWimsey's comments do a better job of getting to the heart of an answer than what I wrote.)

Yes, it's safe.

Keep in mind that the as-if rule requires execution on the real machine to always produce a result that matches one possible execution on the C++ abstract machine. It's legal for an optimization to make some executions that the C++ abstract machine allows impossible on the target. Even compiling for x86 at all makes all IRIW (independent reads of independent writes) reordering impossible, for example, whether the compiler likes it or not. (See below; some PowerPC hardware is the only mainstream hardware that can do it in practice.)
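
For readers who haven't seen it, the IRIW litmus test can be sketched as below. The function name and the one-round-per-call structure are my own assumptions. The sketch uses seq_cst on every operation, in which case ISO C++ itself forbids the two readers from disagreeing. With acquire loads and release stores the abstract machine would allow the disagreement, yet code compiled for x86 still could not produce it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one IRIW round. Returns true if the two readers observed the two
// independent stores in opposite orders.
bool iriw_readers_disagree() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    std::thread writer_x([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread writer_y([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread reader_xy([&] {
        r1 = x.load(std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread reader_yx([&] {
        r3 = y.load(std::memory_order_seq_cst);
        r4 = x.load(std::memory_order_seq_cst);
    });

    writer_x.join();
    writer_y.join();
    reader_xy.join();
    reader_yx.join();

    // reader_xy saw x's store but not y's; reader_yx saw y's but not x's.
    return r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0;
}
```

Calling this in a loop and checking that it never returns true is a sanity check only: a litmus test that never fails proves nothing by itself, the guarantee comes from the memory model.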


I think the reason that wording is there for RMWs specifically is that it ties the load to the "modification order" which ISO C++ requires to exist for each atomic object separately. (Maybe.)

Remember that the way C++ formally defines its ordering model is in terms of synchronizes-with, and the existence of a modification order for each object (that all threads can agree on). This is unlike hardware, where there is a notion of coherent caches (see footnote 1) creating a single coherent view of memory that each core accesses. The existence of coherent shared memory (typically using MESI to maintain coherence at all times) makes a bunch of things implicit, like the impossibility of reading "stale" values. (Although HW memory models do typically document it explicitly like C++ does.)
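
One observable consequence of a per-object modification order is read-read coherence: once a thread has loaded some value of an atomic object, its later loads of the same object can't return an earlier value in that order, even with memory_order_relaxed. A minimal sketch of that property follows; the function name and the counter setup are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One writer stores 1, 2, ..., last_value to a single atomic object.
// One reader polls it with relaxed loads and checks that the values it
// sees never move backwards in the modification order.
bool loads_never_go_backwards(int last_value) {
    std::atomic<int> counter{0};
    bool ok = true;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 1; i <= last_value; ++i) {
            counter.store(i, std::memory_order_relaxed);
        }
    });
    std::thread reader([&] {
        int prev = 0;
        while (prev < last_value) {
            int cur = counter.load(std::memory_order_relaxed);
            if (cur < prev) {
                ok = false;  // would violate read-read coherence
            }
            prev = cur;
        }
    });

    writer.join();
    reader.join();
    return ok;
}
```

On hardware with coherent caches this property falls out of the cache-coherence protocol; in ISO C++ it is stated as a rule about each object's modification order.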

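Thus the transformation is safe: roughly, because the operation writes back the same value it read, it can be treated as sitting in the modification order immediately after whichever store the coherent load observed.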

ISO C++ does mention the concept of coherency in a note in another section: http://eel.is/c++draft/intro.races#14

So ISO C++ itself notes that cache coherence gives some ordering, and x86 has coherent caches. (I'm not making a complete argument that this is enough ordering, sorry. LWimsey's comments about what it even means to be the latest in a modification order are relevant.)

(On many ISAs, but not all, the memory model also rules out IRIW reordering when you have stores to 2 separate objects. On PowerPC, for example, 2 reader threads can disagree about the order of 2 stores to 2 separate objects. Very few implementations can create such reordering: if shared cache is the only way data can get between cores, as on most CPUs, that creates an order for stores.)

On x86 specifically, it's very easy to reason about. x86 has a strongly-ordered memory model (TSO = Total Store Order = program order + a store buffer with store-forwarding).
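
The one reordering a store buffer does allow is StoreLoad: a later load can take its value before an earlier store has become visible to other cores. The classic store-buffer litmus test below sketches why a full barrier is still needed between a store and a following load. The function name is an assumption, and the seq_cst fence stands in for the full barrier (on x86, compilers typically implement it with mfence or a locked instruction). With the fences in place, ISO C++ forbids both loads from returning 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffer litmus test with a full fence between each store and load.
// Returns true if both threads missed the other's store, which the
// seq_cst fences rule out.
bool both_loads_missed_the_stores() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // blocks StoreLoad
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // blocks StoreLoad
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Without the two fences, r1 == 0 && r2 == 0 becomes a permitted outcome, and it is one that real x86 hardware can produce.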

Footnote 1: All cores that std::thread can run across have coherent caches. This is true on all real-world C++ implementations across all ISAs, not just x86-64. There are some heterogeneous boards with separate CPUs sharing memory without cache coherency, but ordinary C++ threads of the same process won't be running across those different cores. See this answer for more details about that.

