Is Twisted incompatible with multiprocessing Events and Queues?

Question

I am trying to simulate a network of applications that run using Twisted. As part of my simulation I would like to synchronize certain events and be able to feed each process large amounts of data. I decided to use multiprocessing Events and Queues. However, my processes hang.

I wrote the example code below to illustrate the problem. Specifically, about 95% of the time on my Sandy Bridge machine, the 'run_in_thread' function finishes, but the 'print_done' callback is not called until after I press Ctrl-C.

Additionally, I can change several things in the example code to make this work more reliably such as: reducing the number of spawned processes, calling self.ready.set from reactor_ready, or changing the delay of deferLater.

I am guessing there is a race condition somewhere between the Twisted reactor and blocking multiprocessing calls such as Queue.get() or Event.wait().

What exactly is the problem I am running into? Is there a bug in my code that I am missing? Can I fix this, or is Twisted incompatible with multiprocessing Events and Queues?

Secondly, would something like spawnProcess or Ampoule be the recommended alternative (as suggested in "Mix Python Twisted with multiprocessing?")?

Edit (as requested):

I've run into problems with all the reactors I've tried: glib2reactor, selectreactor, pollreactor, and epollreactor. The epollreactor seems to give the best results and works fine for the example given below, but it still gives me the same (or a similar) problem in my application. I will continue investigating.

I'm running Gentoo Linux with kernels 3.3 and 3.4 and Python 2.7, and I've tried Twisted 10.2.0, 11.0.0, 11.1.0, 12.0.0, and 12.1.0.

In addition to my Sandy Bridge machine, I see the same issue on my dual-core AMD machine.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from twisted.internet import reactor
from twisted.internet import threads
from twisted.internet import task

from multiprocessing import Process
from multiprocessing import Event

class TestA(Process):
    def __init__(self):
        super(TestA, self).__init__()
        self.ready = Event()
        self.ready.clear()
        self.start()

    def run(self):
        reactor.callWhenRunning(self.reactor_ready)
        reactor.run()

    def reactor_ready(self, *args):
        task.deferLater(reactor, 1, self.node_ready)
        return args

    def node_ready(self, *args):
        print 'node_ready'
        self.ready.set()
        return args

def reactor_running():
    print 'reactor_running'
    df = threads.deferToThread(run_in_thread)
    df.addCallback(print_done)

def run_in_thread():
    print 'run_in_thread'
    for n in processes:
        n.ready.wait()

def print_done(dfResult=None):
    print 'print_done'
    reactor.stop()

if __name__ == '__main__':
    processes = [TestA() for i in range(8)]
    reactor.callWhenRunning(reactor_running)
    reactor.run()

Recommended answer

The short answer is yes, Twisted and multiprocessing are not compatible with each other, and you cannot reliably use them as you are attempting to.

On all POSIX platforms, child process management is closely tied to SIGCHLD handling. POSIX signal handlers are process-global, and there can be only one per signal type.

Twisted and stdlib multiprocessing cannot both have a SIGCHLD handler installed. Only one of them can. That means only one of them can reliably manage child processes. Your example application doesn't control which of them will win that ability, so I would expect there to be some non-determinism in its behavior arising from that fact.
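
The "only one handler per signal type" point can be seen with the standard signal module alone. The following is a minimal sketch in Python 3 syntax, not code from Twisted or multiprocessing: first_handler, second_handler and install_both are hypothetical names that stand in for whatever two libraries might install.

```python
import signal


def first_handler(signum, frame):
    """Hypothetical stand-in for the SIGCHLD handler one library installs."""


def second_handler(signum, frame):
    """Hypothetical stand-in for the SIGCHLD handler another library installs."""


def install_both():
    """Install two SIGCHLD handlers in turn and report which one survives.

    Must be called from the main thread, because signal.signal() only
    works there.
    """
    original = signal.getsignal(signal.SIGCHLD)
    try:
        signal.signal(signal.SIGCHLD, first_handler)
        # signal.signal() returns the handler it has just displaced.
        displaced = signal.signal(signal.SIGCHLD, second_handler)
        active = signal.getsignal(signal.SIGCHLD)
        return displaced, active
    finally:
        # Put back whatever was installed before the demonstration
        # (None means the old handler was not installed from Python).
        if original is not None:
            signal.signal(signal.SIGCHLD, original)
```

Calling install_both() returns first_handler as the displaced handler and second_handler as the active one: the second installation silently replaces the first, and nothing chains the two together.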

However, the more immediate problem with your example is that you load Twisted in the parent process and then use multiprocessing to fork and not exec all of the child processes. Twisted does not support being used like this. If you fork and then exec, there's no problem. However, the lack of an exec of a new process (perhaps a Python process using Twisted) leads to all kinds of extra shared state which Twisted does not account for. In your particular case, the shared state that causes this problem is the internal "waker fd" which is used to implement deferToThread. With the fd shared between the parent and all the children, when the parent tries to wake up the main thread to deliver the result of the deferToThread call, it most likely wakes up one of the child processes instead. The child process has nothing useful to do, so that's just a waste of time. Meanwhile the main thread in the parent never wakes up and never notices your threaded task is done.
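
The effect of a shared file descriptor can be reproduced without Twisted. The sketch below (Python 3 syntax, POSIX only) is an illustration of the mechanism and not Twisted's actual waker code: it assumes a plain pipe as the wake-up channel, and it is deliberately arranged so that the child always wins, whereas in the example from the question the outcome is a race between the parent and eight children. The function name wakeup_stolen_by_child is hypothetical.

```python
import os


def wakeup_stolen_by_child():
    """Return True if a forked child consumes the byte the parent wrote for itself."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it inherited both ends of the pipe across fork().
        status = 1
        try:
            # Block until a byte arrives on the shared read end.
            if os.read(read_fd, 1) == b"x":
                status = 0
        finally:
            # Leave immediately without running the parent's cleanup code.
            os._exit(status)

    # Parent: write a "wake-up" byte that it intends to read itself later.
    os.write(write_fd, b"x")
    _, wait_status = os.waitpid(pid, 0)
    child_got_byte = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0

    # Now look for the byte in the parent without blocking.
    os.set_blocking(read_fd, False)
    try:
        os.read(read_fd, 1)
        parent_got_byte = True
    except BlockingIOError:
        # The pipe is empty: the child already consumed the wake-up.
        parent_got_byte = False

    os.close(read_fd)
    os.close(write_fd)
    return child_got_byte and not parent_got_byte
```

Because both processes read from the same pipe, a byte written by the parent is delivered to exactly one reader. When that reader is a child, the parent's event loop has nothing to wake up on, which matches the hang described in the question.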

It's possible you can avoid this issue by not loading any of Twisted until you've already created the child processes. This would turn your usage into a single-process use case as far as Twisted is concerned (in each process, it would be initially loaded, and then that process would not go on to fork at all, so there's no question of how fork and Twisted interact anymore). This means not even importing Twisted until after you've created the child processes.
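
One way this could look is sketched below, in Python 3 syntax and under several assumptions: the parent does not use Twisted at all and simply blocks on the Events, each child stops its reactor once it has reported ready, and the names twisted_loaded, child_main and start_children are hypothetical. It is untested against the original application and only shows where the import would move.

```python
import sys
from multiprocessing import Event, Process


def twisted_loaded():
    """Return True if any part of Twisted is already imported in this process."""
    return any(
        name == "twisted" or name.startswith("twisted.") for name in sys.modules
    )


def child_main(ready):
    # Twisted is imported only here, inside the child, so the reactor and
    # its internal file descriptors are created after the fork.
    from twisted.internet import reactor, task

    def node_ready():
        print("node_ready")
        ready.set()
        reactor.stop()

    task.deferLater(reactor, 1, node_ready)
    reactor.run()


def start_children(count, timeout=30):
    """Start `count` children and wait for each to signal readiness."""
    if twisted_loaded():
        raise RuntimeError("Twisted was imported before the children were created")
    events = [Event() for _ in range(count)]
    children = [Process(target=child_main, args=(event,)) for event in events]
    for child in children:
        child.start()
    # Event.wait() returns False if the timeout expires first.
    results = [event.wait(timeout) for event in events]
    for child in children:
        child.join(timeout)
    return results
```

A script would call start_children(8) from under an `if __name__ == '__main__':` guard. The check in twisted_loaded() is only a safety net for the ordering requirement; it does nothing for other libraries with per-process state.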

Of course, this only helps you out as far as Twisted goes. Any other libraries you use could run into similar trouble (you mentioned glib2, which is a great example of another library that will totally choke if you try to use it like this).

I highly recommend not using the multiprocessing module at all. Instead, use any multi-process approach that involves fork and exec, not fork alone. Ampoule falls into that category.
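
To show what starting a fresh program image means in practice, here is a minimal sketch in Python 3 syntax that uses the standard subprocess module as a stand-in. It is not Ampoule and not reactor.spawnProcess, and run_fresh_child is a hypothetical helper name. The child is a new Python interpreter, so it inherits none of the parent's imported modules or reactor state.

```python
import subprocess
import sys


def run_fresh_child(code):
    """Run `code` in a brand-new Python interpreter and return its stdout.

    On POSIX, subprocess creates the child and then replaces its program
    image, so nothing the parent imported is carried over.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, run_fresh_child("print('node_ready')") returns the child's output as a string. Inside a Twisted application the equivalent step would go through spawnProcess or Ampoule, as the answer suggests, so that the reactor manages the child.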

