Question
One OpenMP directive I have never used, and don't know when to use, is `flush` (with and without a list).
I have two questions:
1.) When is an explicit `omp flush` or `omp flush(var1, ...)` necessary?
2.) Is it sometimes not necessary but helpful (i.e. can it make the code faster)?
The main reason I can't understand when to use an explicit flush is that flushes are done implicitly after many directives that synchronize the threads (e.g. `barrier`, `single`, ...). I can't, for example, see how using a flush without synchronizing (e.g. with `nowait`) would be helpful.
I understand that different compilers may implement `omp flush` in different ways. Some may interpret a flush with a list as one without a list, i.e. flush all shared objects (see "OpenMP flush vs flush(list)"). But I only care about what the specification requires. In other words, I want to know where an explicit `flush` may in principle be necessary or helpful.
I think I need to clarify my second question. Let me give an example. I would like to know if there are cases where removing an implicit flush (e.g. with `nowait`) and using an explicit flush on only certain shared variables instead would be faster (and still give the correct result). Something like the following:
    float a, b;
    #pragma omp parallel
    {
        #pragma omp for nowait   // No barrier. Do not flush on exit.
        // code which uses only shared variable a
        #pragma omp flush(a)     // Flush only variable a rather than all shared variables.
        #pragma omp for
        // code which uses both shared variables a and b
    }
I think that code still needs a barrier after the first for loop, but all barriers have an implicit flush, so that defeats the purpose. Is it possible to have a barrier which does not do a flush?
Recommended answer
The `flush` directive tells the OpenMP compiler to generate code that makes the thread's private view of the shared memory consistent again. OpenMP usually handles this pretty well and does the right thing for typical programs. Hence, there is normally no need for an explicit `flush`.
However, there are cases where the OpenMP compiler needs some help. One of these cases is when you try to implement your own spin lock. In these cases, you need a combination of flushes to make things work, since otherwise the spin variables will not be updated. Getting the sequence of flushes correct is tough and very, very error prone.
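As a rough illustration of the hand-rolled spin-wait case, here is a minimal sketch of a flag handoff between two threads. The function name `handoff` and the variables `data`, `flag`, and `received` are made up for this example and do not come from the question. It assumes an OpenMP 3.1 or later compiler (for `atomic read` / `atomic write`) and uses list-less flushes only, in line with the advice to avoid `flush` with a list.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical example: one thread publishes `data`, another spins on `flag`. */
int handoff(void)
{
    int data = 0;       /* payload written by the producer               */
    int flag = 0;       /* spin variable: 0 = not ready, 1 = ready       */
    int received = -1;  /* what the consumer saw; read after the region  */

    #pragma omp parallel num_threads(2) shared(data, flag, received)
    {
        int me = omp_get_thread_num();
        /* The consumer is the last thread of the team. If the runtime only
           grants one thread, thread 0 plays both roles, so nothing deadlocks. */
        int consumer = omp_get_num_threads() - 1;

        if (me == 0) {
            data = 42;
            /* Make the write to data visible before the flag is raised. */
            #pragma omp flush
            #pragma omp atomic write
            flag = 1;
            /* Push the flag itself out to memory. */
            #pragma omp flush
        }

        if (me == consumer) {
            int seen = 0;
            while (!seen) {
                /* Atomic read so the spin variable is re-read each iteration. */
                #pragma omp atomic read
                seen = flag;
            }
            /* Drop any stale view of data before reading it. */
            #pragma omp flush
            received = data;
        }
    }
    /* The implicit barrier at the end of the region makes received visible. */
    return received;
}
```

Dropping either flush, or swapping the order of the flush and the flag update, is the kind of mistake that makes this pattern so error prone. In most programs a lock, `critical`, or a barrier does the same job with the flushes implied.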
The general recommendation is that flushes should not be used. If they are used at all, programmers should by all means avoid flush with a list (`flush(var, ...)`). Some folks are actually talking about deprecating it in a future version of OpenMP.
Performance-wise, the impact of flush should be more negative than positive. Since it causes the compiler to generate memory fences and additional load/store operations, I would expect it to slow things down.
For your second question, the answer is no. OpenMP makes sure that each thread has a consistent view of the shared memory when it needs to. If threads do not synchronize, they do not need to update their view of the shared memory, because they do not see any "interesting" change there. That means that no read a thread makes touches data that has been changed by some other thread. If it did, you would have a race condition and a potential bug in your program. To avoid the race, you need to synchronize (which then implies a flush to make each participating thread's view consistent again). A similar argument applies to barriers. You use barriers to start a new epoch in the computation of a parallel region. Since you're keeping the threads in lock-step, you will very likely also have some shared state between the threads that was computed in the previous epoch.
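To make the "epoch" argument concrete, here is a small sketch in which the second loop reads array elements written by other threads in the first loop. The function name `two_phase_sum`, the array `a`, and the size `N` are invented for illustration. The implicit barrier at the end of the first `omp for` already implies a flush, so no explicit `flush` appears anywhere.

```c
#define N 1000

/* Hypothetical two-epoch computation: fill a[], then read neighbours. */
long two_phase_sum(void)
{
    static int a[N];   /* static storage, so shared by all threads */
    long sum = 0;

    #pragma omp parallel
    {
        /* Epoch 1: each thread writes its own chunk of a[]. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = i;
        /* Implicit barrier here. It also implies a flush, so every thread
           now sees every element written in epoch 1. */

        /* Epoch 2: element (i + 1) % N may have been written by another thread. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[(i + 1) % N];
    }
    return sum;
}
```

Adding `nowait` to the first loop would remove the barrier and its flush, and the second loop could then read elements that another thread has not written yet. That is a race, and a `flush(a)` alone would not fix it, because a flush does not make a thread wait for the others.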
BTW, OpenMP may keep private data for a thread, but it does not have to. An OpenMP compiler is therefore likely to keep variables in registers for a while, which leaves them out of sync with the shared memory. However, updates to array elements are typically reflected in the shared memory fairly soon, since the amount of private storage for a thread is usually small (register sets, caches, scratch memory, etc.). OpenMP only gives you weak guarantees about what you can expect. An actual OpenMP implementation (or the hardware) may be as strict as it wishes to be (e.g., write back any change immediately and do flushes all the time).