The default multiprocessing.Pool code includes a counter that tracks how many tasks a worker has completed:

    completed += 1
logging.debug('worker exiting after %d tasks' % completed)
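
The message above is emitted through multiprocessing's own logger, so one way to see it without editing pool.py is to route that logger to stderr. A minimal sketch, assuming the message is logged at DEBUG level; `enable_worker_logging` is a hypothetical helper name, not part of the library:

```python
import logging
import multiprocessing as mp


def enable_worker_logging(level=logging.DEBUG):
    # Hypothetical helper (not a multiprocessing API).
    # mp.log_to_stderr attaches a stderr handler to multiprocessing's
    # logger, sets its level when one is given, and returns the logger.
    return mp.log_to_stderr(level)


# Call this before creating the pool so workers log at the same level.
logger = enable_worker_logging()
```

With this in place before the pool is created, each worker should report its "worker exiting after N tasks" line on stderr, prefixed with the log level and process name.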


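However, increasing the iterable passed to pool.map from range(12) to range(20) makes the counter look wrong (and this does not seem to be related to workers being created). I am not sure what causes this.

For example: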

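import multiprocessing as mp


def ret_x(x):
    # Trivial task: return the input unchanged.
    return x


def inform():
    # Pool initializer: runs once in every newly started worker process.
    print('made a worker!')


if __name__ == '__main__':
    # Two workers; each one is replaced after completing two tasks.
    pool = mp.Pool(2, maxtasksperchild=2, initializer=inform)
    res = pool.map(ret_x, range(8))
    print(res)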


works as expected and prints:

made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
[0, 1, 2, 3, 4, 5, 6, 7]


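However, changing the range to 20 shows neither any additional workers being created nor a total of 20 completed tasks, even though the full range comes back in the expected list.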

made a worker!
made a worker!
worker exiting after 2 tasks
worker exiting after 2 tasks
made a worker!
worker exiting after 2 tasks
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
worker exiting after 1 tasks

Best answer

It works this way because you did not explicitly set chunksize in pool.map:

map(func, iterable[, chunksize])



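  This method chops the iterable into a number of chunks which it
  submits to the process pool as separate tasks. The (approximate) size
  of these chunks can be specified by setting chunksize to a positive
  integer.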


Source: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool

For 8 items with len(pool) = 2, chunksize is 1 (from divmod(8, 2 * 4)), so you see (8 / 1) / 2 = 4 workers.

workers = (len of items / chunksize) /  tasks per process


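For 20 items with len(pool) = 2, chunksize is 3 (divmod(20, 2 * 4) gives (2, 4), and the nonzero remainder rounds the quotient up), so you see roughly (20 / 3) / 2 = 3.3 workers: the 20 items form 7 chunks, so three workers run 2 tasks each and a fourth runs only 1.

For 40 items, chunksize = 5, so workers = (40 / 5) / 2 = 4 workers.

If you want, you can set chunksize = 1: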
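
The arithmetic above can be reproduced with a short sketch. It assumes the default rule from the pool.py source linked in this answer (divmod over four times the pool size, rounded up on a remainder); the function names are hypothetical, and the worker count is an idealized estimate, since the pool may start extra replacement workers depending on timing.

```python
import math


def default_chunksize(n_items, pool_size):
    # Assumed default rule: spread the items over 4 * pool_size chunks,
    # rounding the chunk size up when the division leaves a remainder.
    chunksize, extra = divmod(n_items, pool_size * 4)
    if extra:
        chunksize += 1
    return chunksize


def estimate_workers(n_items, pool_size, maxtasksperchild):
    # Each chunk is one task; a worker exits after maxtasksperchild tasks.
    # Requires n_items > 0, otherwise chunksize would be 0.
    chunksize = default_chunksize(n_items, pool_size)
    n_chunks = math.ceil(n_items / chunksize)
    n_workers = math.ceil(n_chunks / maxtasksperchild)
    return chunksize, n_chunks, n_workers


for n in (8, 20, 40):
    print(n, estimate_workers(n, pool_size=2, maxtasksperchild=2))
# 8 (1, 8, 4)
# 20 (3, 7, 4)
# 40 (5, 8, 4)
```

For 20 items the 7 chunks explain the question's output: three workers exit after 2 tasks and the fourth after 1.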

res = pool.map(ret_x, range(20), 1)


You will then see (20 / 1) / 2 = 10 workers:

python mppp.py
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
made a worker!
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


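So chunksize is, roughly speaking, the unit of work handed to a process: each chunk counts as one task toward maxtasksperchild.

How chunksize is calculated: https://hg.python.org/cpython/file/1c54def5947c/Lib/multiprocessing/pool.py#l305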
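
To make the "unit of work" idea concrete, here is a small sketch of how an iterable can be cut into chunks of a given size. It illustrates the grouping only and is not the library's own code; `split_into_chunks` is a hypothetical name.

```python
from itertools import islice


def split_into_chunks(iterable, chunksize):
    # Yield consecutive tuples of at most `chunksize` items each.
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


chunks = list(split_into_chunks(range(20), 3))
print(len(chunks))   # 7
print(chunks[0])     # (0, 1, 2)
print(chunks[-1])    # (18, 19)
```

With a chunk size of 3, the 20 items become 7 tasks, which is why the per-worker counters add up to 7 rather than 20.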

Original Stack Overflow question, "Python: multiprocess workers, tracking tasks completed (missing completions)": https://stackoverflow.com/questions/28101232/
