The proper way to use multiprocessing.Pool in a nested loop

Question

I am using multiprocessing.Pool to speed up an "embarrassingly parallel" loop. I actually have a nested loop, and am using the pool to speed up the inner loop. For example, without parallelizing the loop, my code would be as follows:

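outer_array = [random_array1]  # placeholder: values for the outer loop
inner_array = [random_array2]  # placeholder: values for the inner loop
output = [empty_array]         # placeholder: result grid, one row per inner value

for col, i in enumerate(outer_array):
    for row, j in enumerate(inner_array):
        output[row][col] = full_func(j, i)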

With parallelization:

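import multiprocessing
from functools import partial

outer_array = [random_array1]  # placeholder: values for the outer loop
inner_array = [random_array2]  # placeholder: values for the inner loop
output = [empty_array]         # placeholder: result grid, one row per inner value

for col, i in enumerate(outer_array):
    # "arg" stands for the name of full_func's second parameter
    partial_func = partial(full_func, arg=i)
    pool = multiprocessing.Pool()  # a new pool is created on every outer iteration
    column = pool.map(partial_func, inner_array)
    for row, value in enumerate(column):
        output[row][col] = value
    pool.close()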

My main question is whether this is correct: should I create the multiprocessing.Pool() inside the loop, or should I instead create the pool outside the loop, i.e.:

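pool = multiprocessing.Pool()  # created once, before the loop
for col, i in enumerate(outer_array):
    partial_func = partial(full_func, arg=i)
    column = pool.map(partial_func, inner_array)
    for row, value in enumerate(column):
        output[row][col] = value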

Also, I am not sure whether I should include the line "pool.close()" at the end of each loop iteration in the second example above; what would be the benefit of doing so?

Thanks!

Recommended answer

Ideally, you should call the Pool() constructor exactly once - not over and over again. There are substantial overheads when creating worker processes, and you pay those costs every time you invoke Pool(). The processes created by a single Pool() call stay around! When they finish the work you've given them in one part of the program, they remain alive, waiting for more work to do.
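A minimal sketch of the "create the pool once" advice, under stated assumptions: full_func here is a hypothetical stand-in for the question's function, its second parameter is assumed to be named i, and fill_grid and processes=2 are illustrative choices that do not come from the original post.

```python
import multiprocessing
from functools import partial


def full_func(j, i):
    """Hypothetical stand-in for the question's full_func."""
    return j * 10 + i


def fill_grid(outer_values, inner_values, processes=2):
    """Return grid[row][col] = full_func(inner_values[row], outer_values[col])."""
    grid = [[None] * len(outer_values) for _ in inner_values]
    # Create the pool once; the same workers serve every outer iteration.
    pool = multiprocessing.Pool(processes=processes)
    try:
        for col, i in enumerate(outer_values):
            # Bind the outer value by keyword so map() supplies only j.
            column = pool.map(partial(full_func, i=i), inner_values)
            for row, value in enumerate(column):
                grid[row][col] = value
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the workers to exit
    return grid


if __name__ == "__main__":
    print(fill_grid([1, 2, 3], [4, 5]))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module; without it, each worker would try to run the demo again.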

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. The worker processes then terminate once all work already assigned has completed.
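To illustrate why close() belongs after the last submission rather than inside the loop, here is a small sketch. The square function and demo_close helper are hypothetical names; the behavior shown (a ValueError when work is submitted to a closed pool) is what recent Python 3 releases do, so treat it as an assumption on older interpreters.

```python
import multiprocessing


def square(x):
    """Hypothetical workload used only for this demonstration."""
    return x * x


def demo_close(processes=2):
    pool = multiprocessing.Pool(processes=processes)
    first = pool.map(square, [1, 2, 3])  # pool is open
    second = pool.map(square, [4, 5])    # the same workers are reused
    pool.close()                         # declare that no more work is coming
    try:
        pool.map(square, [6])            # submitting after close() is rejected
        rejected = False
    except ValueError:
        rejected = True
    pool.join()                          # wait for the workers to exit
    return first, second, rejected


if __name__ == "__main__":
    print(demo_close())
```

Calling close() at the end of every outer iteration would therefore force a fresh Pool() on the next iteration, which is exactly the overhead the answer advises avoiding.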

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes and that you'd otherwise never see.
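One way to use close() and join() as a synchronization point for error reporting is sketched below. It assumes tasks are submitted with apply_async, in which case a worker's exception is re-raised in the parent only when get() is called on the result. The risky function and run_all helper are hypothetical and not part of the original answer.

```python
import multiprocessing


def risky(x):
    """Hypothetical task that fails for one particular input."""
    if x == 2:
        raise ValueError("bad input: 2")
    return x * x


def run_all(values, processes=2):
    pool = multiprocessing.Pool(processes=processes)
    pending = [pool.apply_async(risky, (v,)) for v in values]
    pool.close()  # nothing more will be submitted
    pool.join()   # every task has finished once this returns
    results, errors = [], []
    for value, async_result in zip(values, pending):
        try:
            # get() re-raises in the parent any exception raised in the worker.
            results.append(async_result.get())
        except ValueError as exc:
            errors.append((value, str(exc)))
    return results, errors


if __name__ == "__main__":
    print(run_all([1, 2, 3]))
```

Without the get() calls, the failure for input 2 would pass silently, which is the kind of unseen exception the answer warns about.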

Have fun :-)

08-22 17:20