

I have a code framework which involves dumping sessions with dill. This used to work just fine, until I started to use pandas. The following code raises a PicklingError on CentOS release 6.5:

import pandas
import dill
dill.dump_session('x.dat')


The problem seems to stem from pandas.algos. In fact, it's enough to run this to reproduce the error:

import pandas.algos
import dill
dill.dump_session('x.dat')  # or: dill.dumps(pandas.algos)

The error is pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1.


The thing is, this error is not raised on my PC. Both machines have the same versions of pandas (0.14.1), dill (0.2.1), and python (2.7.6).


>>> dill.detect.badobjects(pandas.algos, depth = 1)
{'__builtins__': <module '__builtin__' (built-in)>, 
'_return_true': <cyfunction lambda2 at 0x1484d70>, 
'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>, 
'_return_false': <cyfunction lambda1 at 0x1484cc8>, 
'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}

This seems to be due to different handling of pandas.algos on the two OSes (perhaps different compilers?). On my PC, where dump_session runs without errors, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it's <cyfunction lambda1 at 0x1df3050>. Why is it handled differently?



I'm not seeing what you are seeing on a mac. Here's what I see, using the same version of pandas. I do see that you are using a different version of dill. I'm using the version from github. I'll check if there was a tweak to saving modules or globals in dill that might have had that impact on some distros.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> import dill
>>> dill.detect.trace(True)
>>> dill.dump_session('x.pkl')
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x1069ff140>
D2: <dict object at 0x106a0b280>
M2: <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/__init__.pyc'>
M2: <module 'pandas' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/__init__.pyc'>


Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.dumps(pandas.algos)


Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> import pandas.algos
>>> dill.dumps(pandas.algos._return_false)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 180, in dumps
    dump(obj, file, protocol, byref, file_mode, safeio)
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.py", line 173, in dump
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 317, in save
    self.save_global(obj, rv)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x10d403cc8>: it's not found as pandas.algos.lambda1


So, I can now reproduce your error.
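
The save_global failure in the traceback can be imitated with the standard library alone. The following is a minimal sketch, not the pandas or dill code: fake_algos is a hypothetical stand-in module, and the function is deliberately given the name lambda1 while being bound only as _return_false, which mirrors what the error message describes. It is written for Python 3, where pickle follows the same by-name lookup.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a compiled module such as pandas.algos.
fake_algos = types.ModuleType("fake_algos")
sys.modules["fake_algos"] = fake_algos


def _make_function():
    # The function's own name is "lambda1" ...
    def lambda1(self, other):
        return False
    return lambda1


func = _make_function()
func.__module__ = "fake_algos"
func.__qualname__ = "lambda1"

# ... but the module only exposes it under a different attribute name.
fake_algos._return_false = func


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# pickle stores functions by reference (module name + function name),
# so it looks for fake_algos.lambda1 and does not find it.
before = can_pickle(func)

# Binding the function under its own name makes the lookup succeed.
fake_algos.lambda1 = func
after = can_pickle(func)

print(before, after)
```

If this picture is right, the mismatch between the function's internal name and the attribute it is bound to is what trips the by-name lookup, regardless of which library does the pickling.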

This looks like an unpicklable object, based on how it's built. However, it should still be picklable as part of the module, as it is for me. You seem to have pinpointed the difference in the object that pandas builds on CentOS.


Looking at the pandas codebase, pandas.algos is a pyx file, so that's cython. And here's the code:

_return_false = lambda self, other: False

Were that in a .py file, I know it would serialize. I have no idea how dill works for cython generated lambdas… (e.g. a lambda cyfunction).

It looks like there was a commit (https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff) in which _return_false was moved out of a class and into the module scope. Do you see that on both CentOS and your PC? It may be that the v0.14.1 builds for different distros were cut from slightly different git revisions, depending on how you installed pandas.

So apparently, I can pick up the name lambda1 by trying to get the source of the object. For a lambda, if dill can't get the source, it will grab the object by name, and apparently it's named lambda1, even though that name doesn't show up in the .pyx file. Maybe it's due to how cython builds the lambdas.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas.algos
>>> import dill
>>> dill.source.importable(pandas.algos._return_false)
'from pandas import lambda1\n'


The difference might be coming from cython, since the code is generated from a .pyx in pandas. What's your version of cython? Mine is 0.20.2.
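
To compare the two machines, something like the following could be run on each. It is only a sketch: describe_environment and function_name are hypothetical helpers, and the Cython version reported is the one installed now, which is not necessarily the one that was used when the pandas binary was built.

```python
import importlib
import platform


def describe_environment(module_names=("pandas", "dill", "Cython")):
    """Collect interpreter, platform, and package versions into a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


def function_name(module_name, attribute):
    """Return the __name__ of module_name.attribute, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    obj = getattr(module, attribute, None)
    return getattr(obj, "__name__", None)


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print(key, value)
    # The name that pickle and dill would look up in the module.
    print(function_name("pandas.algos", "_return_false"))
```

If the last line prints a different name on each machine, that would support the idea that the two pandas builds differ in how the lambda was compiled.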


