Writing a (spinning) thread barrier using C++11 atomics

Problem description

I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self-improvement than due to any real need). My class basically looks as follows:
class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    //constructors...
    bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count. I have implemented the wait function as
int idx  = cur_idx.load();
if(lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while(lock[idx].load() == 1);
return false;
However, when trying to use it with two threads (thread_count is 2), the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However, when I use the gcc atomic intrinsics, with volatile int instead of std::atomic<int>, and write wait as follows:
int idx = cur_idx;
if(lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while(lock[idx] == 1);
return false;
it works just fine. From my understanding there shouldn't be any fundamental differences between the two versions (if anything, the second should be less likely to work). So which of the following scenarios applies?
1. I got lucky with the second implementation and my algorithm is crap
2. I didn't fully understand std::atomic and there is a problem with the first variant (but not the second)
3. It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped
For the record, I'm using 32-bit mingw with gcc 4.6.1.
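To make the claimed equivalence concrete, here is a minimal sketch that puts the gcc builtins next to their std::atomic counterparts. It assumes a GCC-compatible compiler (the __sync builtins are compiler extensions), and the function name builtins_match_std_atomic is made up for this illustration. It only compares single-threaded results. It does not show that a plain volatile read in a spin loop is a well-defined atomic load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: returns true when the gcc __sync builtins and the
// std::atomic member functions give the same results on a single thread.
bool builtins_match_std_atomic()
{
    volatile int raw = 0;          // as in the intrinsic version of wait()
    std::atomic<int> wrapped(0);   // as in the std::atomic version of wait()

    // Fetch-and-add: both calls return the value held before the increment.
    int raw_old = __sync_fetch_and_add(&raw, 1);
    int wrapped_old = wrapped.fetch_add(1);   // defaults to memory_order_seq_cst
    bool same = (raw_old == wrapped_old) && (raw == wrapped.load());

    // Compare-and-swap from 1 to 5: the builtin returns the value it saw,
    // compare_exchange_strong returns whether the swap happened.
    int seen = __sync_val_compare_and_swap(&raw, 1, 5);
    int expected = 1;
    bool swapped = wrapped.compare_exchange_strong(expected, 5);
    same = same && (seen == 1) && swapped && (raw == wrapped.load());

    // Full memory barrier in both spellings.
    __sync_synchronize();
    std::atomic_thread_fence(std::memory_order_seq_cst);

    return same;
}
```

The __sync builtins are documented as full barriers, and the std::atomic operations default to sequentially consistent ordering, so the read-modify-write steps of the two wait() versions should behave alike.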
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]()->void
{
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();
Since mingw doesn't have the <thread> header yet, I use a self-written version which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping). Any insights would be appreciated.
edit: Explanation of the algorithm to make it clearer:
thread_count is the number of threads which shall wait for the barrier (so once thread_count threads are in the barrier, all of them can leave it).
lock is set to one when the first (or any) thread enters the barrier.
counter counts how many threads are inside the barrier and is atomically incremented once for each thread.
If counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero.
Otherwise the thread waits for the lock to become zero.
On the next use of the barrier, different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier was lifted).
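The steps above can be put together into a complete class for experimenting. This is only a sketch: the constructor is an assumption (the question elides it), the class is named spin_barrier to match the calling code, and count_releases is a made-up harness that relies on std::thread being available. The body of wait() is the std::atomic version from the question, unchanged.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class spin_barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    // Assumed constructor. A default-constructed std::atomic<int> holds no
    // defined value before C++20, so every member is zeroed explicitly.
    explicit spin_barrier(int n)
        : counter{{0}, {0}}, lock{{0}, {0}}, cur_idx(0), thread_count(n) {}

    bool wait()
    {
        int idx = cur_idx.load();
        if (lock[idx].load() == 0)
        {
            lock[idx].store(1);            // mark this barrier slot as closed
        }
        int val = counter[idx].fetch_add(1);
        if (val >= thread_count - 1)
        {
            counter[idx].store(0);         // last thread: reset the slot,
            cur_idx.fetch_xor(1);          // switch to the other slot,
            lock[idx].store(0);            // and release the waiters
            return true;
        }
        while (lock[idx].load() == 1)
            ;                              // spin until released
        return false;
    }
};

// Hypothetical harness: runs `rounds` barrier phases on `n` threads and
// returns how many times wait() reported "last thread to arrive".
int count_releases(int n, int rounds)
{
    spin_barrier b(n);
    std::atomic<int> releases(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < n; ++t)
    {
        threads.emplace_back([&b, &releases, rounds]()
        {
            for (int r = 0; r < rounds; ++r)
            {
                if (b.wait())
                {
                    releases.fetch_add(1);
                }
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return releases.load();
}
```

If the algorithm behaves as described, exactly one thread per round sees true, so count_releases(2, 100) should return 100 rather than hang.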
edit2: I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. That seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic intrinsic, meaning there really shouldn't be a difference between the two versions.
Solution
It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it :)):
#include <atomic>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();

        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come.  */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl.  */
            while (step_.load () == step)
                ;
            return false;
        }
    }

protected:
    /* Number of synchronized threads. */
    const unsigned int n_;

    /* Number of threads currently spinning.  */
    std::atomic<unsigned int> nwait_;

    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap.  */
    std::atomic<unsigned int> step_;
};
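On the "XXX" comment about relaxed ordering: below is one possible way to spell out explicit memory orders, offered as a sketch and not a verified implementation. The class name spinning_barrier_mo and the harness count_stale_reads are made up for this illustration. The idea is that the reset of nwait_ can be relaxed because it is published by the release increment of step_ that follows it, and waiters pick it up through their acquire load of step_.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class spinning_barrier_mo
{
public:
    explicit spinning_barrier_mo(unsigned int n) : n_(n), nwait_(0), step_(0) {}

    bool wait()
    {
        unsigned int step = step_.load(std::memory_order_acquire);

        // acq_rel: publishes this thread's earlier writes and, for the last
        // thread, picks up the writes of everyone who arrived before it.
        if (nwait_.fetch_add(1, std::memory_order_acq_rel) == n_ - 1)
        {
            // Relaxed reset: it is ordered before the release increment below.
            nwait_.store(0, std::memory_order_relaxed);
            step_.fetch_add(1, std::memory_order_release);
            return true;
        }
        while (step_.load(std::memory_order_acquire) == step)
            ;   // spin until the last thread bumps step_
        return false;
    }

private:
    const unsigned int n_;              // number of synchronized threads
    std::atomic<unsigned int> nwait_;   // threads that have arrived this round
    std::atomic<unsigned int> step_;    // completed rounds, free to wrap
};

// Hypothetical harness: every thread writes a plain int before the barrier
// and reads all slots after it. Returns the number of stale values seen.
int count_stale_reads(unsigned int n, int rounds)
{
    spinning_barrier_mo b(n);
    std::vector<int> slots(n, 0);
    std::atomic<int> stale(0);
    std::vector<std::thread> threads;
    for (unsigned int t = 0; t < n; ++t)
    {
        threads.emplace_back([&b, &slots, &stale, n, rounds, t]()
        {
            for (int r = 1; r <= rounds; ++r)
            {
                slots[t] = r;    // plain write before the barrier
                b.wait();
                for (unsigned int i = 0; i < n; ++i)
                {
                    if (slots[i] != r)
                    {
                        stale.fetch_add(1);
                    }
                }
                b.wait();        // readers finish before the next round's writes
            }
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
    return stale.load();
}
```

A result of zero from count_stale_reads is consistent with the barrier ordering the plain writes, though a passing run on one machine is not a proof. The default sequentially consistent version above remains the safer choice when in doubt.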
EDIT: @Grizzy, I can't find any errors in your first (C++11) version, and I've also run it for about a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3: the library (or rather, its port to win32) is not mature enough.