Problem description
I need to run an embarrassingly parallel for loop. After a quick search, I found the joblib package for Python. I ran a simple test based on the example posted on the package's website. Here is the test:
from math import sqrt
from joblib import Parallel, delayed
import multiprocessing
%timeit [sqrt(i ** 2) for i in range(10)]
result: 3.89 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
num_cores = multiprocessing.cpu_count()
%timeit Parallel(n_jobs=num_cores)(delayed(sqrt)(i ** 2) for i in range(10))
result: 600 ms ± 40 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I understand the results correctly, using joblib not only fails to increase the speed but actually makes it slower? Did I miss something here? Thank you.
Recommended answer
Joblib creates new processes to run the functions you want to execute in parallel. However, creating processes can take some time (around 500 ms), especially now that joblib uses spawn to create new processes (and not fork).
Because the function you want to run in parallel is very fast, the result of %timeit here mostly shows the overhead of process creation. If you choose a function whose run time is not negligible compared to the time required to start new processes, you will see some improvement in performance.
Here is a sample you can run to test this:
import time
import joblib
from joblib import Parallel, delayed

def f(x):
    time.sleep(1)
    return x

def bench_joblib(n_jobs):
    start_time = time.time()
    Parallel(n_jobs=n_jobs)(delayed(f)(x) for x in range(4))
    print('running 4 times f using n_jobs = {} : {:.2f}s'.format(
        n_jobs, time.time()-start_time))

if __name__ == "__main__":
    bench_joblib(1)
    bench_joblib(4)
Using Python 3.7 and joblib 0.12.5, I got:
running 4 times f using n_jobs = 1 : 4.01s
running 4 times f using n_jobs = 4 : 1.34s
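As a rough sanity check on these numbers, the reasoning above can be turned into a back-of-the-envelope cost model: a fixed startup cost plus one "wave" of tasks per group of n_jobs tasks. This is only a sketch, not how joblib schedules work internally. The function name estimated_parallel_time is made up for illustration, the 0.5 s default is the "around 500 ms" figure mentioned above, the 0.4 µs per sqrt call is derived from the 3.89 µs measurement for 10 calls, and treating n_jobs=1 as having no startup cost is an assumption.

```python
import math


def estimated_parallel_time(n_tasks, task_seconds, n_jobs, startup_seconds=0.5):
    """Rough wall-clock estimate (hypothetical helper, not part of joblib)."""
    if n_jobs == 1:
        # Assumption: with a single job everything runs in the current
        # process, so no worker startup cost is paid.
        return n_tasks * task_seconds
    # Tasks run in "waves" of at most n_jobs tasks at a time.
    waves = math.ceil(n_tasks / n_jobs)
    return startup_seconds + waves * task_seconds


# Case 1: the sleep benchmark (4 tasks of 1 s each).
sleep_serial = estimated_parallel_time(4, 1.0, n_jobs=1)
sleep_parallel = estimated_parallel_time(4, 1.0, n_jobs=4)

# Case 2: the sqrt test (10 tasks of roughly 0.4 microseconds each).
sqrt_serial = estimated_parallel_time(10, 0.4e-6, n_jobs=1)
sqrt_parallel = estimated_parallel_time(10, 0.4e-6, n_jobs=4)

print("sleep: serial {:.2f}s, parallel {:.2f}s".format(sleep_serial, sleep_parallel))
print("sqrt:  serial {:.6f}s, parallel {:.6f}s".format(sqrt_serial, sqrt_parallel))
```

Under these assumptions the model gives the same picture as the measurements: for the 1 s tasks the startup cost is small next to the work saved, while for the sqrt loop the startup cost is several orders of magnitude larger than the whole serial computation, so the parallel version cannot win.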