Problem Description
Is there a simple way to track the overall progress of a joblib.Parallel execution?
I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. However, to do that, whenever Parallel finishes a task I need it to execute a callback that reports how many jobs are left.
I've accomplished a similar task before with Python's stdlib multiprocessing.Pool, by launching a thread that records the number of pending jobs in Pool's job list.
Looking at the code, Parallel inherits Pool, so I thought I could pull off the same trick, but it doesn't seem to use that list, and I haven't been able to figure out any other way to "read" its internal state.
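One way to get this kind of per-task callback without reading Parallel's internals is to have each task report its own completion through a shared, Manager-backed counter. The following is only a minimal sketch, assuming the multiprocessing backend (so the Manager proxies can be shipped to the workers); report_remaining is a hypothetical placeholder for the database write.

from multiprocessing import Manager

from joblib import Parallel, delayed


def report_remaining(done, total):
    # hypothetical: replace with the actual database logging
    print(f"{total - done} jobs remaining")


def myfun(x, counter, lock, total):
    result = x ** 2
    with lock:                      # serialize the read-modify-write on the proxy
        counter.value += 1
        done = counter.value
    report_remaining(done, total)
    return result


if __name__ == "__main__":
    total = 1000
    with Manager() as manager:
        counter = manager.Value("i", 0)
        lock = manager.Lock()
        results = Parallel(n_jobs=8, backend="multiprocessing")(
            delayed(myfun)(i, counter, lock, total) for i in range(total)
        )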
Why can't you simply use tqdm? The following worked for me:
from joblib import Parallel, delayed
from datetime import datetime
from tqdm import tqdm
def myfun(x):
    return x**2

results = Parallel(n_jobs=8)(delayed(myfun)(i) for i in tqdm(range(1000)))
100%|██████████| 1000/1000 [00:00<00:00, 10563.37it/s]
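One caveat with this approach: tqdm here wraps the input iterable, so the bar advances as tasks are dispatched, not as they finish. With the default pre_dispatch it only runs a couple of batches ahead, but for slow tasks the bar can reach 100% well before the results are in. A minimal sketch of the alternative, assuming a joblib release that supports the return_as="generator" option (added around joblib 1.3, as far as I know), which lets tqdm wrap the output and count actual completions:

from joblib import Parallel, delayed
from tqdm import tqdm


def myfun(x):
    return x**2


# return_as="generator" makes Parallel yield results as they become ready,
# so tqdm counts finished tasks instead of dispatched ones.
output = Parallel(n_jobs=8, return_as="generator")(
    delayed(myfun)(i) for i in range(1000)
)
results = list(tqdm(output, total=1000))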