Problem description
I have a problem with the MS C compiler reordering certain statements, critical in a multithreading context, at high levels of optimization. I want to know how to force ordering in specific places while still using high levels of optimization. (At low levels of optimization, this compiler does not reorder statements)
The following code:
ChunkT* plog2sizeChunk=...
SET_BUSY(plog2sizeChunk->pPoolAndBusyFlag); // set "busy" bit on this chunk of storage
x = plog2sizeChunk->pNext;
produces this:
0040130F 8B 5A 08 mov ebx,dword ptr [edx+8]
00401312 83 22 FE and dword ptr [edx],0FFFFFFFEh
in which the write to pPoolAndBusyFlag is reordered by the compiler to occur after the pNext fetch.
SET_BUSY is essentially
plog2sizeChunk->pPoolAndBusyFlag &= 0xFFFFFFFE;
I think the compiler has rightfully decided it was OK to reorder these accesses because they are to two separate members of the same struct, and such reordering has no effect on the results of single-threaded execution:
typedef struct chunk_tag{
unsigned pPoolAndBusyFlag; // Contains pointer to owning pool and a busy flag
natural log2size; // holds log2size of the chunk if Busy==false
struct chunk_tag* pNext; // holds pointer to next block of same size
struct chunk_tag* pPrev; // holds pointer to previous block of same size
} ChunkT, *pChunkT;
For my purposes, the pPoolAndBusyFlag has to be set before other accesses to this structure are valid in a multithreaded/multicore context. I don't think this particular access is problematic for me, but the fact that the compiler can reorder this means that other parts of my code may have the same kind of reordering, and it may be critical in those places. (Imagine the two statements are updates to the two members rather than one write/one read.) I want to be able to force the order of the actions.
Ideally, I'd write something like:
plog2sizeChunk->pPoolAndBusyFlag &= 0xFFFFFFFE;
#pragma no-reordering // no such directive appears to exist
pNext = plog2sizeChunk->pNext;
I have experimentally verified I can get this effect in this ugly way:
plog2sizeChunk->pPoolAndBusyFlag &= 0xFFFFFFFE;
__asm { xor eax, eax } // compiler won't optimize past asm block
pNext = plog2sizeChunk->pNext;
which gives:
0040130F 83 22 FE and dword ptr [edx],0FFFFFFFEh
00401312 33 C0 xor eax,eax
00401314 8B 5A 08 mov ebx,dword ptr [edx+8]
I note that the x86 hardware may reorder these particular instructions anyway since they don't refer to the same memory location, and reads may pass writes; to really fix this example, I'd need some type of memory barrier. Back to my earlier remark, if they were both writes, the x86 will not reorder them, and the write order will be seen in that order by other threads. So in that case I don't think I need a memory barrier, just a forced ordering.
I have not seen the compiler re-order two writes (yet) but I haven't been looking very hard (yet); I just tripped over this. And of course with optimizations just because you don't see it in this compilation doesn't mean it won't appear in the next.
So, how do I force the compiler to order these?
I understand I can declare the memory slots in the struct to be volatile. They are still independent storage locations, so I don't see how this prevents an optimization. Maybe I'm mis-interpreting what volatile means?
EDIT (Oct 20): Thanks to all the responders. My current implementation uses volatile (used as the initial solution), _ReadWriteBarrier (to mark the code where reordering shouldn't occur by the compiler), and a few MemoryBarriers (where reads and writes occur), and that seems to have solved the problem.
EDIT (Nov 2): To be clean, I ended up defining sets of macros for ReadBarrier, WriteBarrier, and ReadWriteBarrier. There are sets for pre- and post-locking, pre- and post-unlocking, and general usage. Some of these are empty; others contain _ReadWriteBarrier and MemoryBarrier, as appropriate for the x86 and typical spin locks based on XCHG (XCHG includes an implicit MemoryBarrier, which obviates that need in the lock pre-/post- sets). I then placed these in the code at the appropriate points, documenting the essential (non)reordering requirements.
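The post does not show the macro definitions themselves, so the following is only a rough sketch of what such a set might look like. All macro and function names here (COMPILER_BARRIER, HARDWARE_BARRIER, READ_WRITE_BARRIER, set_busy_then_get_next) are hypothetical, the struct is cut down to the two fields involved, and the non-MSVC branch using C11 fences is an assumption added so the sketch builds elsewhere.

```c
#include <stddef.h>

/* Hypothetical names; the original post does not show its definitions. */
#if defined(_MSC_VER)
  #include <windows.h>   /* MemoryBarrier(): hardware fence */
  #include <intrin.h>    /* _ReadWriteBarrier(): compiler-only barrier */
  #define COMPILER_BARRIER()  _ReadWriteBarrier()
  #define HARDWARE_BARRIER()  MemoryBarrier()
#else
  #include <stdatomic.h> /* assumption: C11 fences as stand-ins */
  #define COMPILER_BARRIER()  atomic_signal_fence(memory_order_seq_cst)
  #define HARDWARE_BARRIER()  atomic_thread_fence(memory_order_seq_cst)
#endif

/* General-use barrier: stop the compiler and the CPU from moving
   memory accesses across this point. */
#define READ_WRITE_BARRIER()  do { COMPILER_BARRIER(); HARDWARE_BARRIER(); } while (0)

/* Reduced version of the chunk struct from the question. */
typedef struct chunk_tag {
    unsigned pPoolAndBusyFlag;
    struct chunk_tag *pNext;
} ChunkT;

/* Update the flag word, then read pNext, with the order forced.
   This is a store followed by a load, the one case x86 may reorder,
   so a hardware fence is used and not just a compiler barrier. */
static ChunkT *set_busy_then_get_next(ChunkT *chunk)
{
    chunk->pPoolAndBusyFlag &= 0xFFFFFFFEu;
    READ_WRITE_BARRIER();
    return chunk->pNext;
}
```

For two consecutive writes on x86, COMPILER_BARRIER() alone would match the reasoning above, since the hardware keeps stores in order. Strictly portable C11 code would also declare the shared fields as atomic types; plain fields are kept here only to mirror the original struct.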
Recommended answer
So, as I understand it, the pNext = plog2sizeChunk->pNext assignment publishes the block so that it can be seen by other threads, and you have to make sure they see the correct busy flag.
That means you need a uni-directional memory barrier before publishing it (and one before reading it in another thread, although if your code runs on x86 you get those for free) to make sure that threads actually see the change. You also need one before the write, to avoid writes being reordered after it. No, just inserting assembly or using a standard-compliant volatile is not enough (MSVC's volatile gives extra guarantees, though, that make a difference here): yes, this stops the compiler from shifting reads and writes around, but the CPU is not bound by it and can do the same reordering internally.
Both MSVC and gcc have intrinsics/macros to create memory barriers. MSVC also gives stronger guarantees to volatiles that are good enough for your problem. Finally, C++11 atomics would work as well, but I'm not sure whether C itself has any portable way to guarantee memory barriers.
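On that last point: C11 added <stdatomic.h>, which provides the same acquire/release model as C++11 atomics, though not every compiler version supports it. The following is a minimal sketch of the publish pattern described above, not the asker's actual code. The names Node, published, publish and consume are hypothetical, and bit 0 of flags stands in for the busy flag.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for this sketch. */
typedef struct node {
    atomic_uint flags;   /* bit 0 plays the role of the busy flag */
    struct node *next;
} Node;

/* Hypothetical shared slot through which a node is made visible to
   other threads. Static storage, so it starts out as a null pointer. */
static _Atomic(Node *) published;

/* Writer: update the flag word, then publish the node. The release
   store guarantees that the flag update is visible to any thread
   that observes the published pointer with an acquire load. */
static void publish(Node *n)
{
    atomic_fetch_and_explicit(&n->flags, ~1u, memory_order_relaxed);
    atomic_store_explicit(&published, n, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so
   everything written before the publish is visible afterwards. */
static Node *consume(void)
{
    return atomic_load_explicit(&published, memory_order_acquire);
}
```

On x86 the release store and acquire load typically compile to plain moves, which fits the remark that you get those barriers for free there; the atomics mainly keep the compiler from reordering and keep the code correct on weaker architectures.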