


For some reason I cannot get cPickle.load to work on the file-like object returned by ZipFile.open(). If I first call read() on that object, though, I can use cPickle.loads on the resulting string.


import zipfile
import cPickle

# the data we want to store
some_data = {1: 'one', 2: 'two', 3: 'three'}

# create a zipped pickle file (close it so the archive is completely written)
zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', cPickle.dumps(some_data))
zf.close()

# cPickle.loads works
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd1 = cPickle.loads(zf.open('data.pkl').read())

# cPickle.load doesn't work
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd2 = cPickle.load(zf.open('data.pkl'))


Note: I don't want to zip just the pickle file, but many files of other types as well; this is just an example.



It's due to an imperfection in the pseudofile object implemented by the zipfile module (for the .open method of the ZipFile class introduced in Python 2.6). Consider:

>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
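For comparison, a well-behaved file object keeps returning data after the same read(1)/readline()/read(1) sequence. This sketch uses io.BytesIO as a stand-in for a correct pseudofile (an assumption of the illustration, not part of the session above) and falls back from cPickle to pickle so it runs on Python 2 or Python 3; the exact bytes of the line can differ between versions, but the third read is non-empty either way.

```python
import io

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: cPickle was folded into pickle

some_data = {1: 'one', 2: 'two', 3: 'three'}

# A protocol-0 pickle held in a well-behaved in-memory file
f = io.BytesIO(pickle.dumps(some_data, 0))

first = f.read(1)    # the MARK opcode, b'('
line = f.readline()  # the rest of the first line, ending in b'\n'
third = f.read(1)    # a correct file object keeps going: not empty

print('%r %r %r' % (first, line, third))
```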


The sequence of .read(1) followed by .readline() is what .load does internally (on a protocol-0 pickle, the default in Python 2, which is what you're using here). Unfortunately, zipfile's imperfection means this particular sequence doesn't work: it produces a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
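To see that load really mixes the two calls on a protocol-0 pickle, one can hand it a small wrapper that logs every read() and readline() call. CallLog is a hypothetical name for this sketch; it wraps io.BytesIO and, like the question, uses cPickle on Python 2 (falling back to pickle on Python 3).

```python
import io

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: cPickle was folded into pickle


class CallLog(object):
    """Hypothetical wrapper around io.BytesIO that records, in order,
    which file methods the unpickler calls."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.calls = []

    def read(self, size=-1):
        self.calls.append('read')
        return self._f.read(size)

    def readline(self, size=-1):
        self.calls.append('readline')
        return self._f.readline(size)


some_data = {1: 'one', 2: 'two', 3: 'three'}
f = CallLog(pickle.dumps(some_data, 0))  # protocol 0, the Python 2 default
assert pickle.load(f) == some_data

# Show the calls up to just past the first readline(): reads and
# readlines are interleaved, which is what the buggy pseudofile trips on
first_readline = f.calls.index('readline')
print(f.calls[:first_readline + 2])
```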


Not sure offhand if this bug in Python's standard library is fixed in Python 2.7 -- I'm going to check.


Edit: just checked -- the bug is fixed in Python 2.7 rc1 (the release candidate that's currently the latest 2.7 version). I don't yet know whether it's fixed in the latest bug-fix release of 2.6 as well.


Edit again: the bug is still there in Python 2.6.5, the latest bug-fix release of Python 2.6 -- so if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
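If upgrading isn't an option and you only need to unpickle an archive member, one workaround, consistent with the question's observation that read() plus loads works, is to buffer the member in memory and give load a real file object instead of the ZipFile.open() pseudofile. The helper name load_pickle_member is hypothetical, and the sketch assumes the member fits comfortably in memory, just as the read()/loads approach does; the archive here is in memory (io.BytesIO) only to keep the example self-contained.

```python
import io
import zipfile

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: cPickle was folded into pickle


def load_pickle_member(zf, name):
    """Hypothetical helper: unpickle archive member `name` from an
    in-memory copy. ZipFile.read() returns the member's bytes, and
    io.BytesIO gives load a file object whose read() and readline()
    calls can be mixed freely, whatever the pickle protocol."""
    return pickle.load(io.BytesIO(zf.read(name)))


some_data = {1: 'one', 2: 'two', 3: 'three'}

buf = io.BytesIO()  # in-memory archive instead of zipped_pickle.zip
zf = zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', pickle.dumps(some_data, 0))  # protocol 0, as in the question
zf.close()

zf = zipfile.ZipFile(buf, 'r')
sd2 = load_pickle_member(zf, 'data.pkl')
zf.close()
```

The trade-off is the same as with read() and loads: the whole member is held in memory once, in exchange for not depending on the pseudofile's behavior at all.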


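Note that it's not certain you actually need better-behaving pseudofile objects: if you control the dump calls and can use the latest and greatest protocol (passing -1 as the protocol selects the highest one available, i.e. cPickle.HIGHEST_PROTOCOL), everything will be fine: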

>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> zf.close()
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
>>> sd2 = cPickle.load(zf.open('data.pkl'))
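A self-contained version of the session above might look like the sketch below. It assumes an in-memory archive (io.BytesIO) in place of zipped_pickle.zip and the same cPickle/pickle import fallback so it runs on Python 2 or Python 3; unlike the workaround of buffering the member, it loads straight from the ZipFile.open() object.

```python
import io
import zipfile

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: cPickle was folded into pickle

some_data = {1: 'one', 2: 'two', 3: 'three'}

buf = io.BytesIO()  # in-memory archive instead of zipped_pickle.zip
zf = zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED)
# -1 and pickle.HIGHEST_PROTOCOL both select the highest available protocol
zf.writestr('data.pkl', pickle.dumps(some_data, pickle.HIGHEST_PROTOCOL))
zf.close()

zf = zipfile.ZipFile(buf, 'r')
# Load directly from the pseudofile returned by ZipFile.open()
sd2 = pickle.load(zf.open('data.pkl'))
zf.close()
```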


It's only the old, crufty, backwards-compatible protocol 0 (the default) that requires proper pseudofile behavior when load mixes read and readline calls. Protocol 0 is also slower and produces larger pickles, so it's definitely not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles it produces, is a mandatory constraint in your application.
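The difference between protocols can be checked with a logging wrapper like the one in this sketch (CallLog is a hypothetical name; it wraps io.BytesIO). For a plain list of ints, it records whether load calls readline() and compares pickle sizes for protocol 0 and the highest protocol. The result only reflects data of this shape, not every object a binary protocol can pickle.

```python
import io

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: cPickle was folded into pickle


class CallLog(object):
    """Hypothetical wrapper around io.BytesIO that records which file
    methods the unpickler calls."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.calls = []

    def read(self, size=-1):
        self.calls.append('read')
        return self._f.read(size)

    def readline(self, size=-1):
        self.calls.append('readline')
        return self._f.readline(size)


data = list(range(1000))
results = {}  # protocol -> (pickle size in bytes, whether readline() was used)
for proto in (0, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(data, proto)
    f = CallLog(blob)
    assert pickle.load(f) == data
    results[proto] = (len(blob), 'readline' in f.calls)

print(results)
```

For this data, protocol 0 goes through readline() and yields the larger pickle, while the highest protocol only ever calls read().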


08-20 12:02