This question is based on:
When is it safe to destroy a pthread barrier?
and the recent glibc bug report:
http://sourceware.org/bugzilla/show_bug.cgi?id=12674
I'm not sure about the semaphore issue reported against glibc, but presumably, per the question linked above, it is supposed to be safe to destroy a barrier as soon as pthread_barrier_wait returns. (Normally, the thread that got PTHREAD_BARRIER_SERIAL_THREAD, or a "special" thread that already considered itself "responsible" for the barrier object, would be the one to destroy it.) The main use case I can think of is a barrier used to synchronize a new thread's use of data on the creating thread's stack, preventing the creating thread from returning before the new thread has started using the data; other barriers probably have a lifetime equal to that of the whole program, or one controlled by some other synchronization object.

In any case, how can an implementation ensure that destroying the barrier (and perhaps even unmapping the memory it resides in) is safe as soon as pthread_barrier_wait returns in any thread? It seems that the other threads that have not yet returned would need to examine at least some part of the barrier object to finish their work and return, much like how, in the glibc bug report cited above, sem_post has to examine the waiter count after adjusting the semaphore value.
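For concreteness, here is a minimal sketch of the stack-based use case described above; the names start_args, worker, and spawn_worker are made up for illustration, and error checking is omitted. The point is only that the creating thread destroys the barrier, which lives on its own stack, as soon as its own pthread_barrier_wait() call returns.

#include <pthread.h>
#include <stdio.h>

struct start_args {
    pthread_barrier_t barrier;
    int value;              /* data that lives on the creating thread's stack */
};

static void *worker(void *arg)
{
    struct start_args *args = arg;
    int v = args->value;    /* copy the stack-based data before the creator returns */

    /* after this wait returns, the creator may destroy the barrier and let
     * `args` go out of scope, so `args` must not be touched again */
    pthread_barrier_wait(&args->barrier);

    printf("worker got %d\n", v);
    return NULL;
}

static void spawn_worker(pthread_t *tid)
{
    struct start_args args;          /* lives on this thread's stack */
    args.value = 42;
    pthread_barrier_init(&args.barrier, NULL, 2);

    pthread_create(tid, NULL, worker, &args);

    /* wait until the worker has copied what it needs... */
    pthread_barrier_wait(&args.barrier);

    /* ...then (per the premise being questioned) destroy the barrier and
     * return, invalidating `args` */
    pthread_barrier_destroy(&args.barrier);
}

int main(void)
{
    pthread_t tid;
    spawn_worker(&tid);
    pthread_join(tid, NULL);
    return 0;
}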
Best Answer
I'm going to take a crack at this with an example implementation of pthread_barrier_wait() that uses mutex and condition variable functionality such as a pthreads implementation might provide. Note that this example does not try to deal with performance considerations (specifically, when the waiting threads are unblocked, they are all re-serialized on their way out of the wait). I think that using something like Linux futex objects could help with the performance issues, but futexes are still largely outside my experience.

Also, I doubt that this example handles signals or errors correctly (if at all, in the case of signals). But I think proper support for those things can be added as an exercise for the reader.

My main fear is that the example has a race condition or a deadlock (the mutex handling is more complex than I would like). Also note that this is an example that has not even been compiled; treat it as pseudo-code. Keep in mind as well that my experience is mainly on Windows, so I'm tackling this as an educational opportunity more than anything else. The quality of the pseudo-code may therefore be fairly low.

However, disclaimers aside, I think it may give an idea of how the problem posed in the question could be handled (i.e., how the pthread_barrier_wait() function can allow the pthread_barrier_t object it uses to be destroyed by any of the released threads, without danger of the barrier object still being used by one or more threads on their way out).

Here goes:
/*
 * Since this is part of the implementation of the pthread API, it uses
 * reserved names that start with "__" for internal structures and functions.
 *
 * Functions such as __mutex_lock() and __cond_wait() perform the same function
 * as the corresponding pthread API.
 */

// struct __barrier_waitdata is intended to hold all the data
//  that `pthread_barrier_wait()` will need after releasing
//  waiting threads. This will allow the function to avoid
//  touching the passed-in pthread_barrier_t object after
//  the wait is satisfied (since any of the released threads
//  can destroy it)
struct __barrier_waitdata {
    struct __mutex cond_mutex;
    struct __cond cond;

    unsigned waiter_count;
    int wait_complete;
};

struct __barrier {
    unsigned count;

    struct __mutex waitdata_mutex;
    struct __barrier_waitdata* pwaitdata;
};

typedef struct __barrier pthread_barrier_t;
int __barrier_waitdata_init( struct __barrier_waitdata* pwaitdata)
{
    int rc;

    pwaitdata->waiter_count = 0;
    pwaitdata->wait_complete = 0;

    rc = __mutex_init( &pwaitdata->cond_mutex, NULL);
    if (rc) {
        return rc;
    }

    rc = __cond_init( &pwaitdata->cond, NULL);
    if (rc) {
        __mutex_destroy( &pwaitdata->cond_mutex);
        return rc;
    }

    return 0;
}
int pthread_barrier_init(pthread_barrier_t *barrier, const pthread_barrierattr_t *attr, unsigned int count)
{
    int rc;

    rc = __mutex_init( &barrier->waitdata_mutex, NULL);
    if (rc) return rc;

    barrier->pwaitdata = NULL;
    barrier->count = count;

    //TODO: deal with attr

    return 0;
}
int pthread_barrier_wait(pthread_barrier_t *barrier)
{
    int rc;
    struct __barrier_waitdata* pwaitdata;
    unsigned target_count;

    // potential waitdata block (only one thread's will actually be used)
    struct __barrier_waitdata waitdata;

    // nothing to do if we only need to wait for one thread...
    if (barrier->count == 1) return PTHREAD_BARRIER_SERIAL_THREAD;

    rc = __mutex_lock( &barrier->waitdata_mutex);
    if (rc) return rc;

    if (!barrier->pwaitdata) {
        // no other thread has claimed the waitdata block yet -
        //  we'll use this thread's

        rc = __barrier_waitdata_init( &waitdata);
        if (rc) {
            __mutex_unlock( &barrier->waitdata_mutex);
            return rc;
        }

        barrier->pwaitdata = &waitdata;
    }

    pwaitdata = barrier->pwaitdata;
    target_count = barrier->count;

    // all data necessary for handling the return from a wait is pointed to
    //  by `pwaitdata`, and `pwaitdata` points to a block of data on the stack of
    //  one of the waiting threads. We have to make sure that the thread that owns
    //  that block waits until all others have finished with the information
    //  pointed to by `pwaitdata` before it returns. However, after the 'big' wait
    //  is completed, the `pthread_barrier_t` object that's passed into this
    //  function isn't used. The last operation done to `*barrier` is to set
    //  `barrier->pwaitdata = NULL` to satisfy the requirement that this function
    //  leaves `*barrier` in a state as if `pthread_barrier_init()` had been called -
    //  and that operation is done by the thread that signals the wait condition
    //  completion before the completion is signaled.

    // note: we're still holding `barrier->waitdata_mutex`;

    rc = __mutex_lock( &pwaitdata->cond_mutex);
    pwaitdata->waiter_count += 1;

    if (pwaitdata->waiter_count < target_count) {
        // need to wait for other threads

        __mutex_unlock( &barrier->waitdata_mutex);
        do {
            // TODO: handle the return code from `__cond_wait()` to break out of
            //  this loop if a signal makes that necessary
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        } while (!pwaitdata->wait_complete);
    }
    else {
        // this thread satisfies the wait - unblock all the other waiters
        pwaitdata->wait_complete = 1;

        // 'release' our use of the passed in pthread_barrier_t object
        barrier->pwaitdata = NULL;

        // unlock the barrier's waitdata_mutex - the barrier is
        //  ready for use by another set of threads
        __mutex_unlock( &barrier->waitdata_mutex);

        // finally, unblock the waiting threads
        __cond_broadcast( &pwaitdata->cond);
    }

    // at this point, barrier->waitdata_mutex is unlocked, the
    //  barrier->pwaitdata pointer has been cleared, and no further
    //  use of `*barrier` is permitted...

    // however, each thread still has a valid `pwaitdata` pointer - the
    //  thread that owns that block needs to wait until all others have
    //  dropped the pwaitdata->waiter_count

    // also, at this point the `pwaitdata->cond_mutex` is locked, so
    //  we're in a critical section

    rc = 0;
    pwaitdata->waiter_count--;
    if (pwaitdata == &waitdata) {
        // this thread owns the waitdata block - it needs to hang around until
        //  all other threads are done

        // as a convenience, this thread will be the one that returns
        //  PTHREAD_BARRIER_SERIAL_THREAD
        rc = PTHREAD_BARRIER_SERIAL_THREAD;

        while (pwaitdata->waiter_count != 0) {
            __cond_wait( &pwaitdata->cond, &pwaitdata->cond_mutex);
        }

        __mutex_unlock( &pwaitdata->cond_mutex);
        __cond_destroy( &pwaitdata->cond);
        __mutex_destroy( &pwaitdata->cond_mutex);
    }
    else {
        if (pwaitdata->waiter_count == 0) {
            // last non-owner thread out - wake the owner so it can clean up
            __cond_signal( &pwaitdata->cond);
        }

        // release the cond_mutex so the other released threads (and,
        //  eventually, the owner) can make progress
        __mutex_unlock( &pwaitdata->cond_mutex);
    }

    return rc;
}
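For completeness, a pthread_barrier_destroy() consistent with this sketch could be as simple as the following (purely illustrative, using the same fictional __mutex_* helpers), since pthread_barrier_wait() always leaves *barrier looking freshly initialized:

// Hypothetical companion to the sketch above. Because pthread_barrier_wait()
//  leaves `*barrier` with pwaitdata == NULL, the only thing left to tear down
//  is the waitdata_mutex. As with the real API, calling this while a wait is
//  in progress is undefined behavior.
int pthread_barrier_destroy(pthread_barrier_t *barrier)
{
    return __mutex_destroy( &barrier->waitdata_mutex);
}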
Update (17 July 2011), in response to a comment/question about process-shared barriers:

I completely forgot about the situation of barriers that are shared between processes. As you mention, the idea I outlined would fail horribly in that case. I don't really have experience with POSIX shared memory, so any suggestion I make should be treated with skepticism.

To summarize (for my benefit, if no one else's):

When any thread gets control after pthread_barrier_wait() returns, the barrier object needs to be in the 'init' state (whatever the most recent pthread_barrier_init() on that object set it to). The API also implies that once any thread returns, one or more of the following could occur:

- another call to pthread_barrier_wait() to start a new round of waiting on the barrier object
- pthread_barrier_destroy() on the barrier object
These things mean that before the pthread_barrier_wait() call allows any thread to return, it pretty much needs to ensure that all waiting threads are no longer using the barrier object in the context of that call. My first answer addressed this by creating a 'local' set of synchronization objects (a mutex and an associated condition variable) outside of the barrier object, on which all the threads block. These local synchronization objects are allocated on the stack of the thread that happened to call pthread_barrier_wait() first.

I think something similar would need to be done for process-shared barriers. However, in that case simply allocating those synchronization objects on a thread's stack isn't adequate (since the other processes would have no access to them). For a process-shared barrier, those objects would have to be allocated in process-shared memory. I think the technique listed above could be applied similarly:

- the waitdata_mutex would already be in process-shared memory, since it lives in the barrier struct itself. Of course, when the barrier is set to PTHREAD_PROCESS_SHARED, that attribute would also need to be applied to the waitdata_mutex
- when __barrier_waitdata_init() initializes the local mutex and condition variable, it would have to allocate those objects in shared memory instead of simply using the stack-based waitdata variable
- when the mutex and condition variable in the waitdata block are destroyed (by the thread that owns the block), the process-shared memory allocation for the block would also need to be cleaned up

I think these changes would allow the scheme to operate with process-shared barriers. The last bullet point above is a key item to figure out. Another is how to construct a name for the shared memory object that will hold the 'local' process-shared waitdata. There are certain attributes you would want that name to have:

- the storage for the name should reside in the struct pthread_barrier_t structure itself, so that all processes have access to it; that implies a known limit on the length of the name
- the name should be unique to each 'instance' of a set of calls to pthread_barrier_wait(), because a second round of waiting might begin before all threads have gotten all the way out of the first round (so the process-shared memory block set up for the waitdata might not have been freed yet). The name would therefore probably have to be based on things like the process ID, thread ID, the address of the barrier object, and an atomic counter.
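To make the naming idea concrete, here is a rough, purely illustrative sketch of how such a process-shared waitdata block might be allocated and named. The helper __waitdata_alloc_shared, the name format, and the counter handling are assumptions, and error handling is minimal; on Linux, older glibc versions require linking with -lrt for shm_open().

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static struct __barrier_waitdata* __waitdata_alloc_shared(struct __barrier* barrier,
                                                          char* name, size_t name_len)
{
    /* per-process counter: together with the pid and the barrier's address it
     * keeps a second round of waiting from colliding with a block from the
     * first round that hasn't been unlinked yet (would need to be atomic) */
    static unsigned long counter;

    /* the generated name would be stored in a fixed-size field of the
     * (process-shared) barrier struct, so other processes can shm_open()
     * and mmap() the same block */
    snprintf(name, name_len, "/__barrier_%ld_%p_%lu",
             (long)getpid(), (void*)barrier, counter++);

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, sizeof(struct __barrier_waitdata)) != 0) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }

    struct __barrier_waitdata* p = mmap(NULL, sizeof *p, PROT_READ | PROT_WRITE,
                                        MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return (p == MAP_FAILED) ? NULL : p;
}

Under these assumptions, the 'owner' thread's cleanup would munmap() the block and shm_unlink() the name, rather than simply letting a stack variable go out of scope.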
Regarding "c - How can the barrier be destroyed as soon as pthread_barrier_wait() returns?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5886614/