Problem description
I have this very simple Python code that I want to speed up by parallelizing it. However, no matter what I do, multiprocessing.Pool.map doesn't gain anything over the standard map.
I've read other threads where people use this with very small functions that don't parallelize well and lead to excessive overhead, but I would think that shouldn't be the case here.
Am I doing something wrong?
Here is the example:
#!/usr/bin/python
import numpy, time

def AddNoise(sample):
    #time.sleep(0.001)
    return sample + numpy.random.randint(0,9,sample.shape)
    #return sample + numpy.ones(sample.shape)

n=100
m=10000
start = time.time()
A = list([ numpy.random.randint(0,9,(n,n)) for i in range(m) ])
print("creating %d numpy arrays of %d x %d took %.2f seconds"%(m,n,n,time.time()-start))

for i in range(3):
    start = time.time()
    A = list(map(AddNoise, A))
    print("adding numpy arrays took %.2f seconds"%(time.time()-start))

for i in range(3):
    import multiprocessing
    start = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        A = list(pool.map(AddNoise, A, chunksize=100))
    print("adding numpy arrays with multiprocessing Pool took %.2f seconds"%(time.time()-start))

for i in range(3):
    import concurrent.futures
    start = time.time()
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        A = list(executor.map(AddNoise, A))
    print("adding numpy arrays with concurrent.futures.ProcessPoolExecutor took %.2f seconds"%(time.time()-start))
This produces the following output on my 4-core/8-thread laptop, which is otherwise idle:
$ python test-pool.py
creating 10000 numpy arrays of 100 x 100 took 1.54 seconds
adding numpy arrays took 1.65 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays with multiprocessing Pool took 1.99 seconds
adding numpy arrays with multiprocessing Pool took 1.98 seconds
adding numpy arrays with multiprocessing Pool took 1.94 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.32 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.17 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.25 seconds
Recommended answer
The problem is in the result transfer. With multiprocessing, the arrays you create inside the child processes need to be transferred back to the main process, and that transfer is overhead.
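To get a rough feel for that overhead without starting any worker processes, the sketch below times the computation against a pickle round trip of one result array. multiprocessing pickles arguments and results to move them between processes, so the round trip is a stand-in for the transfer cost, not an exact measurement: a real pool also pushes the bytes through a pipe. The names add_noise and measure are mine, not from the question. Assuming 8-byte integers, each 100 x 100 array is 80,000 bytes, so 10,000 arrays amount to about 800 MB in each direction per pass.

```python
import pickle
import time

import numpy


def add_noise(sample):
    # Same computation as AddNoise in the question.
    return sample + numpy.random.randint(0, 9, sample.shape)


def measure(n=100, repeats=200):
    """Return (compute seconds, pickle round-trip seconds, pickled bytes) per array."""
    sample = numpy.random.randint(0, 9, (n, n))

    # Time the computation alone.
    start = time.perf_counter()
    for _ in range(repeats):
        result = add_noise(sample)
    compute = (time.perf_counter() - start) / repeats

    # Time a pickle round trip, approximating what a pool does to move
    # one argument or one result between processes.
    start = time.perf_counter()
    for _ in range(repeats):
        payload = pickle.dumps(result, protocol=pickle.HIGHEST_PROTOCOL)
        restored = pickle.loads(payload)
    transfer = (time.perf_counter() - start) / repeats

    # The round trip must reproduce the array exactly.
    assert numpy.array_equal(result, restored)
    return compute, transfer, len(payload)


if __name__ == "__main__":
    compute, transfer, size = measure()
    print("compute per array:  %.1f us" % (compute * 1e6))
    print("pickle round trip:  %.1f us" % (transfer * 1e6))
    print("pickled size:       %d bytes" % size)
```

If the round trip comes out in the same range as the computation, the pool has little to gain: every array is serialized once on the way to a worker and once on the way back, on top of the work itself.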
I checked this by modifying the AddNoise function as follows, which preserves the computation time but discards the time spent transferring the result:
def AddNoise(sample):
    sample + numpy.random.randint(0,9,sample.shape)
    return None