Question
I wanted to try different ways of using multiprocessing, starting with this example:
$ cat multi_bad.py
import multiprocessing as mp
from time import sleep
from random import randint

def f(l, t):
    # sleep(30)
    return sum(x < t for x in l)

if __name__ == '__main__':
    l = [randint(1, 1000) for _ in range(25000)]
    t = [randint(1, 1000) for _ in range(4)]
    # sleep(15)
    pool = mp.Pool(processes=4)
    result = pool.starmap_async(f, [(l, x) for x in t])
    print(result.get())
Here, l is a list that gets copied 4 times when 4 processes are spawned. To avoid that, the documentation page offers using queues, shared arrays or proxy objects created using multiprocessing.Manager. For the last one, I changed the definition of l:
$ diff multi_bad.py multi_good.py
10c10,11
<     l = [randint(1, 1000) for _ in range(25000)]
---
>     man = mp.Manager()
>     l = man.list([randint(1, 1000) for _ in range(25000)])
The results still look correct, but the execution time has increased so dramatically that I think I'm doing something wrong:
$ time python multi_bad.py
[17867, 11103, 2021, 17918]
real 0m0.247s
user 0m0.183s
sys 0m0.010s
$ time python multi_good.py
[3609, 20277, 7799, 24262]
real 0m15.108s
user 0m28.092s
sys 0m6.320s
The docs do say that this way is slower than shared arrays, but this just feels wrong. I'm also not sure how I can profile this to get more information on what's going on. Am I missing something?
P.S. With shared arrays I get times below 0.25s.
P.P.S. This is on Linux and Python 3.3.
Recommended answer
Linux uses copy-on-write when subprocesses are created with os.fork. To demonstrate:
import multiprocessing as mp
import numpy as np
import logging
import os

logger = mp.log_to_stderr(logging.WARNING)

def free_memory():
    # Sum the MemFree, Buffers and Cached fields of /proc/meminfo (in kB)
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u = unit))
                total += amount
    return total

def worker(i):
    x = data[i,:].sum()    # Exercise access to data
    logger.warning('Free memory: {m}'.format(m = free_memory()))

def main():
    procs = [mp.Process(target = worker, args = (i, )) for i in range(4)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

# data is allocated at module level, before any worker process is started
logger.warning('Initial free: {m}'.format(m = free_memory()))
N = 15000
data = np.ones((N,N))
logger.warning('After allocating data: {m}'.format(m = free_memory()))

if __name__ == '__main__':
    main()
which yields:
[WARNING/MainProcess] Initial free: 2522340
[WARNING/MainProcess] After allocating data: 763248
[WARNING/Process-1] Free memory: 760852
[WARNING/Process-2] Free memory: 757652
[WARNING/Process-3] Free memory: 757264
[WARNING/Process-4] Free memory: 756760
This shows that initially there was roughly 2.5GB of free memory. After allocating a 15000x15000 array of float64s, there was 763248 KB free. This roughly makes sense, since 15000**2*8 bytes = 1.8GB and the drop in memory, 2.5GB - 0.763248GB, is also roughly 1.8GB.
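The arithmetic can be checked directly from the logged numbers. The following is only a sanity-check sketch; it assumes that the kB unit in /proc/meminfo means 1024 bytes, and the variable names are made up for illustration:

```python
# Expected size of a 15000x15000 float64 array (8 bytes per element)
N = 15000
BYTES_PER_FLOAT64 = 8
array_bytes = N * N * BYTES_PER_FLOAT64

# Values copied from the log output above (in kB)
initial_free_kb = 2522340
after_alloc_kb = 763248

# Assumption: one /proc/meminfo "kB" is 1024 bytes
drop_kb = initial_free_kb - after_alloc_kb
drop_bytes = drop_kb * 1024

print('array size  : {:.2f} GB'.format(array_bytes / 1e9))
print('memory drop : {:.2f} GB'.format(drop_bytes / 1e9))
```

Both values come out at about 1.80 GB, so the drop in free memory is accounted for by the single array in the parent process.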
Now after each process is spawned, the free memory is again reported to be ~750MB. There is no significant decrease in free memory, so I conclude the system must be using copy-on-write.
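The other half of copy-on-write is that a write in a child never reaches the parent. Here is a minimal sketch of that behaviour, assuming a POSIX system where the "fork" start method is available; the names shared, child and run_demo are hypothetical and not part of the code above:

```python
import multiprocessing as mp

# Module-level global, inherited by forked children
shared = [0] * 5

def child(conn):
    # This write only affects the child's private copy of the page
    shared[0] = 99
    conn.send(shared[0])
    conn.close()

def run_demo():
    # Assumption: the "fork" start method exists (Linux and other POSIX systems)
    ctx = mp.get_context('fork')
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=child, args=(child_conn,))
    proc.start()
    seen_in_child = parent_conn.recv()
    proc.join()
    # The parent's list is untouched by the child's write
    return seen_in_child, shared[0]

if __name__ == '__main__':
    print(run_demo())  # (99, 0)
```

One caveat: in CPython even reading an object updates its reference count, so pages holding ordinary Python objects can still end up being copied. The buffer of a NumPy array does not have that problem, which is probably why the measurement above stays so flat.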
Conclusion: If you do not need to modify the data, defining it at the global level of the __main__ module is a convenient and (at least on Linux) memory-friendly way to share it among subprocesses.
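Applied to the example in the question, that could look roughly like the sketch below. It is not a tested replacement for multi_good.py: it assumes the "fork" start method, uses deterministic data instead of randint so the result is reproducible, and the names values, count_below and count_all are invented for illustration:

```python
import multiprocessing as mp

# Read-only data defined at module level, so forked workers inherit it
# instead of receiving a pickled copy with every task.
# Hypothetical deterministic stand-in for the random list in the question.
values = list(range(1, 1001)) * 25  # 25000 integers

def count_below(threshold):
    # Reads the inherited global; only the threshold is sent to the worker
    return sum(x < threshold for x in values)

def count_all(thresholds, processes=4):
    # Assumption: the "fork" start method exists (Linux and other POSIX systems)
    ctx = mp.get_context('fork')
    with ctx.Pool(processes=processes) as pool:
        return pool.map(count_below, thresholds)

if __name__ == '__main__':
    print(count_all([1, 500, 1001]))  # [0, 12475, 25000]
```

With the "spawn" start method the module would be imported again in every worker, so the list would be rebuilt rather than inherited and the memory benefit would be lost.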