Problem Description
Is there a simple way to track the overall progress of a joblib.Parallel execution?
I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. However, to do that, whenever Parallel finishes a task I need it to execute a callback that reports how many jobs are left.
I've accomplished a similar task before with Python's stdlib multiprocessing.Pool, by launching a thread that records the number of pending jobs in Pool's job list.
Looking at the code, Parallel inherits Pool, so I thought I could pull off the same trick, but it doesn't seem to use that list, and I haven't been able to figure out any other way to "read" its internal state.
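One way to get this kind of per-task callback without reading Parallel's internals is to have each task report its own completion through a shared, Manager-backed counter. The following is only a minimal sketch, assuming the multiprocessing backend (so the Manager proxies can be shipped to the workers); report_remaining is a hypothetical placeholder for the database write.

from multiprocessing import Manager

from joblib import Parallel, delayed


def report_remaining(done, total):
    # hypothetical: replace with the actual database logging
    print(f"{total - done} jobs remaining")


def myfun(x, counter, lock, total):
    result = x ** 2
    with lock:                      # serialize the read-modify-write on the proxy
        counter.value += 1
        done = counter.value
    report_remaining(done, total)
    return result


if __name__ == "__main__":
    total = 1000
    with Manager() as manager:
        counter = manager.Value("i", 0)
        lock = manager.Lock()
        results = Parallel(n_jobs=8, backend="multiprocessing")(
            delayed(myfun)(i, counter, lock, total) for i in range(total)
        )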
Why can't you simply use tqdm? The following worked for me:
from joblib import Parallel, delayed
from datetime import datetime
from tqdm import tqdm
def myfun(x):
    return x**2

results = Parallel(n_jobs=8)(delayed(myfun)(i) for i in tqdm(range(1000)))
100%|██████████| 1000/1000 [00:00<00:00, 10563.37it/s]
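One caveat with this approach: tqdm here wraps the input iterable, so the bar advances as tasks are dispatched, not as they finish. With the default pre_dispatch it only runs a couple of batches ahead, but for slow tasks the bar can reach 100% well before the results are in. A minimal sketch of the alternative, assuming a joblib release that supports the return_as="generator" option (added around joblib 1.3, as far as I know), which lets tqdm wrap the output and count actual completions:

from joblib import Parallel, delayed
from tqdm import tqdm


def myfun(x):
    return x**2


# return_as="generator" makes Parallel yield results as they become ready,
# so tqdm counts finished tasks instead of dispatched ones.
output = Parallel(n_jobs=8, return_as="generator")(
    delayed(myfun)(i) for i in range(1000)
)
results = list(tqdm(output, total=1000))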