Problem description
In py2.6+, the multiprocessing module offers a Pool class, so one can do:

class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool.imap(...)
However, with the standard Python implementation at 2.7.2, this approach soon leads to "IOError: [Errno 24] Too many open files". Apparently the pool object never gets garbage collected, so its processes never terminate, accumulating whatever descriptors are opened internally. I think this because the following works:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        result = pool.map(...)
        pool.terminate()
        return result
I would like to keep the "lazy" iterator approach of imap; how does the garbage collector work in that case? How can I fix the code?
In the end, I ended up passing the pool reference around and terminating it manually once the pool.imap iterator was finished:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool, pool.imap(...)

    def call_stuff(self):
        pool, results = self.do_stuff()
        for result in results:
            # lazy evaluation of the imap
            ...
        pool.terminate()
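A self-contained sketch of this pattern; the module-level `square` worker and the input range are illustrative placeholders for the arguments elided above:

```python
import multiprocessing

def square(x):
    # placeholder worker function; must be module-level so it can be pickled
    return x * x

class Volatile(object):
    def do_stuff(self, items):
        pool = multiprocessing.Pool()
        # hand the pool back alongside the iterator, so the caller
        # can terminate it once iteration is finished
        return pool, pool.imap(square, items)

    def call_stuff(self, items):
        pool, results = self.do_stuff(items)
        out = [r for r in results]  # lazy evaluation of the imap
        pool.terminate()            # release the worker processes explicitly
        return out

if __name__ == "__main__":
    print(Volatile().call_stuff(range(5)))
```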
In case anyone stumbles upon this solution in the future: the chunksize parameter is very important in Pool.imap (as opposed to plain Pool.map, where it didn't matter). I set it manually so that each process receives 1 + len(input) / len(pool) jobs. Leaving it at the default chunksize=1 gave me the same performance as if I didn't use parallel processing at all... bad.
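A sketch of that manual chunksize computation; the `slow_op` worker, the input size, and deriving the pool size from `cpu_count()` are illustrative assumptions, not details from the original post:

```python
import multiprocessing

def slow_op(x):
    # stand-in for real per-item work
    return x + 1

if __name__ == "__main__":
    inputs = list(range(1000))
    n_workers = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(n_workers)
    # one large chunk per worker instead of the default chunksize=1,
    # so per-item IPC overhead does not dominate the computation
    chunk = 1 + len(inputs) // n_workers
    results = list(pool.imap(slow_op, inputs, chunksize=chunk))
    pool.terminate()
```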
I guess there's no real benefit to using ordered imap over ordered map; I just personally like iterators better.