Problem description
I have a piece of code below where the joblib.Parallel() returns a list.
import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    return np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))
Instead of a list, how do I return a generator using joblib.Parallel()?
I have updated the code as suggested by @user3666197 in comments below.
import numpy as np
from joblib import Parallel, delayed

lst = [[0.0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr = np.array(lst)
w, v = np.linalg.eigh(arr)

def proj_func(i):
    yield np.dot(v[:,i].reshape(-1, 1), v[:,i].reshape(1, -1))

proj = Parallel(n_jobs=-1)(delayed(proj_func)(i) for i in range(len(w)))
But I get this error:
TypeError: can't pickle generator objects
Am I missing something? How do I fix this? My main goal here is to reduce memory use, since proj can get very large, so I would like to consume each generator in the list one at a time.
Recommended answer
joblib is designed to distribute units of code execution across a set of spawned, independent processes (the motivation being the performance gained by escaping the central GIL lock, which otherwise re-[SERIAL]-ises execution one GIL step after another). Given that purpose and implementation, the most that the syntactic constructor joblib.Parallel(...)( delayed()(...) ) can do, as far as my admittedly limited imagination tells me, is to have the "remotely" executed processes return the requested generator(s) back to main, where joblib assembles them (outside of your control) into a list.
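The TypeError quoted in the question arises on this return trip: whatever a worker process hands back has to be serialized first, and a generator object cannot be pickled. Below is a minimal sketch that reproduces the failure with the standard pickle module alone, used here as a stand-in for joblib's own serialization path (an assumption; joblib's backends may use a different pickler). The names proj_gen and try_pickle are hypothetical helpers introduced only for illustration.

```python
import pickle

def proj_gen(i):
    # Hypothetical stand-in for the question's proj_func: it yields instead of returning.
    yield i * i

def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the name of the exception raised."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return type(exc).__name__

# A generator object cannot be serialized, so it cannot cross a process boundary.
gen_result = try_pickle(proj_gen(3))

# The values it produces can be, once they are materialized into a list.
list_result = try_pickle(list(proj_gen(3)))

print(gen_result, list_result)
```

The worker function therefore needs to return concrete, picklable values; any laziness has to live on the calling side.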
So, given the above initial conditions and a function fun() that is injected via delayed( fun )(...) into the joblib.Parallel( n_jobs = ... )-many "remote" processes, the achievable maximum is to receive a list of generators, not any form of deferred execution wrapped on return as a single generator.
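If the underlying goal is to bound memory rather than to obtain a generator from joblib itself, one possible workaround is to keep the laziness in the main process: submit the indices in small batches and yield the results of each batch before starting the next. The sketch below is not part of the original answer. The helper names (iter_in_batches, make_joblib_runner, make_serial_runner, square) are hypothetical, and the serial runner exists only so the example runs without joblib installed.

```python
from itertools import islice

def iter_in_batches(indices, batch_size, run_batch):
    """Lazily yield results, keeping at most one batch of results alive at a time."""
    it = iter(indices)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # run_batch returns a plain list for this batch only.
        yield from run_batch(batch)

def make_joblib_runner(func, n_jobs=-1):
    """Build a run_batch callable that fans one batch out through joblib."""
    from joblib import Parallel, delayed  # imported lazily; requires joblib

    def run_batch(batch):
        return Parallel(n_jobs=n_jobs)(delayed(func)(i) for i in batch)

    return run_batch

def make_serial_runner(func):
    """Same interface without parallelism, used here for the demonstration."""
    def run_batch(batch):
        return [func(i) for i in batch]

    return run_batch

def square(i):
    # Hypothetical placeholder for an expensive function such as proj_func.
    return i * i

serial_results = list(iter_in_batches(range(5), 2, make_serial_runner(square)))
print(serial_results)
```

With the question's code, the loop would read `for p in iter_in_batches(range(len(w)), 2, make_joblib_runner(proj_func)): ...`, where proj_func returns its array rather than yielding it. The batch size trades peak memory against per-batch scheduling overhead.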