Multiprocessing with chunks does not work with predict_proba

Problem description

When I run predict_proba on a dataframe without multiprocessing, I get the expected behavior. The code is as follows:

probabilities_data = classname.perform_model_prob_predictions_nc(prediction_model, vectorized_data)

where perform_model_prob_predictions_nc is:

def perform_model_prob_predictions_nc(model, dataFrame):
    try:
        return model.predict_proba(dataFrame)
    except AttributeError:
        logging.error("AttributeError occurred",exc_info=True)

But when I try to run the same function using chunks and multiprocessing:

probabilities_data = classname.perform_model_prob_predictions(prediction_model, chunks, cores)

where perform_model_prob_predictions is:

def perform_model_prob_predictions(model, dataFrame, cores=4):
    try:
        with Pool(processes=cores) as pool:
            result = pool.map(model.predict_proba, dataFrame)
            return result
    except Exception:
        logging.error("Error occurred", exc_info=True)

I get the following error:

PicklingError: Can't pickle <function OneVsRestClassifier.predict_proba at 0x...>: it's not the same object as sklearn.multiclass.OneVsRestClassifier.predict_proba

For reference:

cores = 4
vectorized_data = pd.DataFrame(...)
chunk_size = len(vectorized_data) // cores + cores
chunks = [df_chunk for g, df_chunk in vectorized_data.groupby(np.arange(len(vectorized_data)) // chunk_size)]

Recommended answer

Pool uses a Queue internally, and anything that goes through it must be pickled. The error tells you that the function OneVsRestClassifier.predict_proba cannot be pickled.
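To see the pickling constraint in isolation, a small check like the one below can help. It is only an illustrative sketch: is_picklable is a hypothetical helper name, not part of multiprocessing or scikit-learn, and it simply tries the standard pickle module on whatever object you pass in.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialise obj.

    Hypothetical helper for illustration only.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # These are the usual failures for objects pickle cannot handle.
        return False
```

Calling is_picklable(model.predict_proba) before handing the method to pool.map would presumably return False for the model in the question, while plain data such as a list of numbers passes.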

You have several options; some are described in this SO post. Another option is to use joblib with the loky backend. The latter uses cloudpickle, which can serialise constructs that the default pickle does not support.

The code will look more or less like this:

from joblib import Parallel, delayed

# Pass each chunk (not the whole frame) to predict_proba; the result is a list with one entry per chunk.
Parallel(n_jobs=4, backend='loky')(delayed(model.predict_proba)(chunk) for chunk in chunks)

Mind that pickling methods of objects like this with the classic pickle is generally not a good idea. dill could work well here.
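If you would rather stay with multiprocessing.Pool, one commonly used workaround is to map a module-level function over the chunks instead of the method itself. The sketch below is not from the original answer: predict_chunk is a hypothetical helper, and it assumes the model instance itself can be pickled (only the decorated method caused the error).

```python
from functools import partial
from multiprocessing import Pool


def predict_chunk(model, chunk):
    """Module-level helper (hypothetical name): pickled by reference to its name."""
    return model.predict_proba(chunk)


def perform_model_prob_predictions(model, chunks, cores=4):
    # partial(predict_chunk, model) carries the helper plus the model instance,
    # so the predict_proba method object itself is never pickled.
    with Pool(processes=cores) as pool:
        return pool.map(partial(predict_chunk, model), chunks)
```

The function returns a list with one result per chunk, in the same order as chunks. On platforms that start worker processes with spawn, call it from under an if __name__ == "__main__": guard.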

08-23 06:02