


I have lengthy computations which I repeat many times. Therefore, I would like to use memoization (packages such as jug and joblib) together with pandas. The question is whether these packages can memoize well when pandas DataFrames are passed as function arguments.


Has anyone tried it? Is there any other recommended package/way to do this?



Author of jug here: jug works fine. I just tried the following and it works:

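from jug import TaskGenerator
import pandas as pd
import numpy as np

# Each decorated function becomes a jug Task whose result is stored and reused.
@TaskGenerator
def gendata():
    return pd.DataFrame(np.arange(343440).reshape((10,-1)))

@TaskGenerator
def compute(x):
    return x.mean()

# This only builds the task graph; run it with `jug execute <this file>`.
y = compute(gendata())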


It is not as efficient as it could be as it just uses pickle internally for the DataFrame (although it compresses it on the fly, so it is not horrible in terms of memory use; just slower than it could be).
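To give a rough feel for what "pickle plus on-the-fly compression" costs for a DataFrame of this size, here is a minimal sketch. It is not jug's actual code path: the use of `zlib` and the variable names are assumptions made for illustration only.

```python
import pickle
import zlib

import numpy as np
import pandas as pd

# Same DataFrame as in the jug example above.
df = pd.DataFrame(np.arange(343440).reshape((10, -1)))

# Generic route: pickle the whole object, then compress the bytes.
# (zlib is an assumption here; the point is only the pickle-then-compress shape.)
raw = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)
packed = zlib.compress(raw)

# Loading reverses both steps and yields an equal DataFrame.
restored = pickle.loads(zlib.decompress(packed))

print(len(raw), len(packed))
```

The round trip is lossless, but every store and load pays for a full pickle pass and a compression pass, which is where the extra time goes.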


I would be open to a change which saves these as a special case as jug currently does for numpy arrays: https://github.com/luispedro/jug/blob/master/jug/backends/file_store.py#L102
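For the joblib half of the question, which the answer above does not cover, the following is a minimal sketch assuming joblib's `Memory.cache` decorator. The temporary cache directory and the `call_log` counter are illustrative choices, not part of the original answer.

```python
import tempfile

import numpy as np
import pandas as pd
from joblib import Memory

# Throwaway cache directory for the demo; a real project would use a fixed path.
memory = Memory(tempfile.mkdtemp(), verbose=0)

call_log = []  # grows only when the function body really runs

@memory.cache
def compute(x):
    call_log.append(1)
    return x.mean()

df = pd.DataFrame(np.arange(343440).reshape((10, -1)))

first = compute(df)   # computed, then written to the on-disk cache
second = compute(df)  # same argument hash, so loaded from the cache
```

joblib has to hash the DataFrame argument on every call to look up the cached result, so for very large frames the lookup itself is not free.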

