This article walks through a PicklingError that dill.dump_session raises on CentOS because of pandas.algos._return_false.

Problem description

I have a code framework that involves dumping sessions with dill. This used to work just fine until I started using pandas. The following code raises a PicklingError on CentOS release 6.5:

import pandas
import dill
dill.dump_session('x.dat')

The problem seems to stem from pandas.algos. In fact, it's enough to run this to reproduce the error:

import pandas.algos
import dill
dill.dump_session('x.dat')  # or: dill.dumps(pandas.algos)

The error is: pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1.

The thing is, this error is not raised on my PC. Both machines have the same versions of pandas (0.14.1), dill (0.2.1), and Python (2.7.6).

Looking at the bad objects, I get:

>>> dill.detect.badobjects(pandas.algos, depth = 1)
{'__builtins__': <module '__builtin__' (built-in)>, 
'_return_true': <cyfunction lambda2 at 0x1484d70>, 
'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>, 
'_return_false': <cyfunction lambda1 at 0x1484cc8>, 
'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}
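dill.detect.badobjects roughly reports the attributes of an object that fail to serialize. A rough stand-in for the depth=1 case, written against Python 3's standard pickle only, can help when comparing two machines. It is stricter than dill (plain pickle rejects modules, which dill stores by reference), and find_unpicklable plus the fake namespace are hypothetical names for illustration, not dill's implementation:

```python
import pickle
import types


def find_unpicklable(namespace):
    """Hypothetical helper: map each name in `namespace` whose value plain pickle
    refuses to serialize to the error it raised (depth 1 only)."""
    bad = {}
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # PicklingError, TypeError, ...
            bad[name] = repr(exc)
    return bad


# Stand-in namespace; not the real contents of pandas.algos.
fake_namespace = {
    "answer": 42,
    "np_like": types.ModuleType("np_like"),      # plain pickle cannot pickle modules
    "_return_false": lambda self, other: False,  # no module-level name "<lambda>"
}

print(sorted(find_unpicklable(fake_namespace)))  # ['_return_false', 'np_like']
```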

This seems to be due to the two OSes handling pandas.algos differently (perhaps different compilers?). On my PC, where dump_session runs without errors, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it's <cyfunction lambda1 at 0x1df3050>. Why is it handled differently?
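The error text hints at the mechanism: when plain pickle stores a function by reference, it records the function's module and name, and at dump time checks that looking that name up in the module returns the same object. The sketch below assumes Python 3 and uses a hypothetical throwaway module fake_algos as a stand-in for pandas.algos; it only illustrates why a function reported as lambda1 fails that lookup, not how dill or Cython behave internally:

```python
import pickle
import sys
import types

# Hypothetical stand-in for pandas.algos, registered so pickle can import it.
fake = types.ModuleType("fake_algos")
sys.modules["fake_algos"] = fake

# Same shape as the .pyx source line: _return_false = lambda self, other: False
fake._return_false = lambda self, other: False
fake._return_false.__module__ = "fake_algos"


def pickles_by_reference(fn):
    """True if plain pickle can store `fn` as a module.name reference and load it back."""
    try:
        return pickle.loads(pickle.dumps(fn)) is fn
    except pickle.PicklingError:
        return False


# Name as reported on CentOS: pickle looks up fake_algos.lambda1, which does not exist.
fake._return_false.__name__ = fake._return_false.__qualname__ = "lambda1"
print(pickles_by_reference(fake._return_false))  # False

# Name matching the attribute the object is bound to: the lookup succeeds.
fake._return_false.__name__ = fake._return_false.__qualname__ = "_return_false"
print(pickles_by_reference(fake._return_false))  # True
```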

Recommended answer

I'm not seeing what you are seeing on a Mac. Here's what I see using the same version of pandas. I do notice that you are using a different version of dill; I'm using the version from GitHub. I'll check whether there was a tweak to how dill saves modules or globals that might have had that impact on some distros.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import dill
>>> dill.detect.trace(True)
>>> dill.dump_session('x.pkl')
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x1069ff140>
D2: <dict object at 0x106a0b280>
M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/__init__.pyc'>
M2: <module 'pandas' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/__init__.pyc'>

Here's what I get for pandas.algos:

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.dumps(pandas.algos)
'\x80\x02cdill.dill\n_import_module\nq\x00U\x0cpandas.algosq\x01\x85q\x02Rq\x03.'
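The output shows dill storing the module by reference: the pickle only names dill.dill._import_module and the string 'pandas.algos', so the module is re-imported on load. The same idea can be sketched with Python 3's standard library by giving a Pickler its own dispatch_table entry for module objects. reduce_module and dumps_with_module_refs are hypothetical helpers, and importlib.import_module stands in for dill's internal _import_module:

```python
import copyreg
import importlib
import io
import json
import pickle
import types


def reduce_module(module):
    # Store only the dotted module name; it is re-imported when unpickling.
    return importlib.import_module, (module.__name__,)


def dumps_with_module_refs(obj):
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    # Private copy of the registry, so the global copyreg table is left alone.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.ModuleType] = reduce_module
    pickler.dump(obj)
    return buf.getvalue()


data = dumps_with_module_refs(json)   # json stands in for pandas.algos
print(b"import_module" in data)       # True: only a reference was stored
print(pickle.loads(data) is json)     # True: loading re-imports the same module
```

Because nothing inside the module is serialized this way, the by-reference module pickle succeeds even though one of its attributes (here, _return_false) cannot be pickled on its own.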

And here's what I get for pandas.algos._return_false:

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import pandas.algos
>>> dill.dumps(pandas.algos._return_false)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 180, in dumps
    dump(obj, file, protocol, byref, file_mode, safeio)
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 173, in dump
    pik.dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in save
    self.save_global(obj, rv)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x10d403cc8>: it's not found as pandas.algos.lambda1

So, I can now reproduce your error.

Based on how it's built, this looks like an unpicklable object. However, it should be possible to pickle it inside the module, as it is for me. You seem to have pinpointed the difference: it lies in the object that pandas builds on CentOS.
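One reading of "it should be possible to pickle it inside the module" is that the object is still reachable as an attribute of its module even when its own name does not resolve. A sketch of a pickler that falls back to that route follows. It assumes Python 3.8+ (for Pickler.reducer_override), and ModuleAttrPickler, dumps, and the fake_algos module are hypothetical, not part of dill or pandas:

```python
import importlib
import io
import pickle
import sys
import types


class ModuleAttrPickler(pickle.Pickler):
    """Hypothetical pickler (not part of dill): modules are stored by name, and a
    module-level callable whose __qualname__ does not resolve in its module is
    stored as getattr(<module>, <attribute name it is actually bound to>)."""

    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # Rebuilt on load as importlib.import_module("<dotted name>").
            return importlib.import_module, (obj.__name__,)
        if callable(obj) and not isinstance(obj, type):
            module = sys.modules.get(getattr(obj, "__module__", None))
            name = getattr(obj, "__qualname__", None)
            if module is not None and isinstance(name, str):
                if getattr(module, name, None) is obj:
                    return NotImplemented  # ordinary by-reference pickling works
                for attr, value in vars(module).items():
                    if value is obj:
                        # `module` is itself reduced by the ModuleType branch above.
                        return getattr, (module, attr)
        return NotImplemented  # fall back to pickle's default behaviour


def dumps(obj):
    buf = io.BytesIO()
    ModuleAttrPickler(buf).dump(obj)
    return buf.getvalue()


# Hypothetical stand-in for pandas.algos, with the lambda reporting "lambda1".
fake = types.ModuleType("fake_algos")
sys.modules["fake_algos"] = fake
fake._return_false = lambda self, other: False
fake._return_false.__module__ = "fake_algos"
fake._return_false.__qualname__ = "lambda1"

restored = pickle.loads(dumps(fake._return_false))
print(restored is fake._return_false)  # True
```

This stores the function as getattr(import_module("fake_algos"), "_return_false"), so it only loads where that module can be imported and still binds the same attribute name.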

Looking at the pandas codebase, pandas.algos is a .pyx file, so it's Cython. Here's the code:

_return_false = lambda self, other: False

Were that in a .py file, I know it would serialize. I have no idea how dill handles Cython-generated lambdas (i.e., a lambda cyfunction).

It looks like there was a commit (https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff) in which _return_false was moved out of a class and into module scope. Do you see that change on both CentOS and your PC? It may be that v0.14.1 for different distros was cut from slightly different git versions, depending on how you installed pandas.

So apparently, when I try to get the source of the object, I pick up a lambda1. For a lambda whose source dill can't get, dill falls back to grabbing it by name, and apparently this one is named lambda1, even though that name doesn't appear anywhere in the .pyx file. Maybe it's due to how Cython builds lambdas.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.source.importable(pandas.algos._return_false)
'from pandas import lambda1\n'

The difference might be coming from Cython, since the code is generated from a .pyx file in pandas. What version of Cython are you using? Mine is 0.20.2.

10-30 18:35