Loading a pickle file from a zipfile


Question

For some reason I cannot get cPickle.load to work on the file-like object returned by ZipFile.open(). If I call read() on that object, however, I can pass the result to cPickle.loads.

Example:

import zipfile
import cPickle

# the data we want to store
some_data = {1: 'one', 2: 'two', 3: 'three'}

#
# create a zipped pickle file
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', cPickle.dumps(some_data))
zf.close()

#
# cPickle.loads works
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd1 = cPickle.loads(zf.open('data.pkl').read())
zf.close()

#
# cPickle.load doesn't work
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd2 = cPickle.load(zf.open('data.pkl'))
zf.close()

Note: I don't want to zip just the pickle file; the archive will hold many files of other types. This is just an example.

Recommended answer

It's due to an imperfection in the pseudofile object implemented by the zipfile module (the object returned by the .open method of the ZipFile class, which was introduced in Python 2.6). Consider:

>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
>>>

The sequence of .read(1) followed by .readline() is what cPickle.load does internally (on a protocol-0 pickle, the default in Python 2, which is what you're using here). Unfortunately, zipfile's imperfection means this particular sequence doesn't work: it produces a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
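To make the read/readline mixing visible, here is a minimal sketch that records which methods the unpickler calls on its input. It is written for Python 3, where `pickle` takes the place of `cPickle`; that choice, the `TracingFile` wrapper and the `methods_used` helper are all illustrative assumptions, not part of the original thread or of the zipfile module.

```python
import io
import pickle


class TracingFile:
    """Hypothetical wrapper that records which read methods the unpickler calls."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = []

    def read(self, n=-1):
        self.calls.append("read")
        return self._buf.read(n)

    def readline(self):
        self.calls.append("readline")
        return self._buf.readline()


def methods_used(protocol):
    """Pickle a small dict with the given protocol and trace how it is loaded."""
    data = {1: "one", 2: "two", 3: "three"}
    traced = TracingFile(pickle.dumps(data, protocol))
    assert pickle.load(traced) == data
    return set(traced.calls)


# Protocol 0 is the text protocol; the highest protocol is binary.
text_calls = methods_used(0)
binary_calls = methods_used(pickle.HIGHEST_PROTOCOL)
print("protocol 0:", sorted(text_calls))
print("highest protocol:", sorted(binary_calls))
```

Under these assumptions, loading the protocol-0 pickle should show both `read` and `readline` being used, which is exactly the combination the Python 2.6 pseudofile object handled badly.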

I'm not sure offhand whether this bug in Python's standard library is fixed in Python 2.7; I'm going to check.

Edit: just checked. The bug is fixed in Python 2.7 rc1 (the release candidate that is currently the latest 2.7 version). I don't yet know whether it's also fixed in the latest bug-fix release of 2.6.

Edit again: the bug is still there in Python 2.6.5, the latest bug-fix release of Python 2.6. So if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
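If the pickle is small enough to hold in memory, the question's own observation (that `loads` works on the bytes from `read()`) suggests a way to avoid depending on the pseudofile object at all. The sketch below is a Python 3 rendering of that idea, not the backport mentioned above; the helper name `load_pickle_member`, the in-memory archive and the `notes.txt` member are assumptions made for illustration.

```python
import io
import pickle
import zipfile


def load_pickle_member(archive, member):
    """Hypothetical helper: read the whole member, then unpickle from memory."""
    raw = archive.read(member)          # full decompressed bytes of the member
    return pickle.load(io.BytesIO(raw))  # BytesIO handles read/readline correctly


some_data = {1: "one", 2: "two", 3: "three"}

# Build a small in-memory archive holding a protocol-0 pickle plus another file.
container = io.BytesIO()
with zipfile.ZipFile(container, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.pkl", pickle.dumps(some_data, 0))
    zf.writestr("notes.txt", "other files can live alongside the pickle")

with zipfile.ZipFile(container, "r") as zf:
    restored = load_pickle_member(zf, "data.pkl")

print(restored)
```

The trade-off is memory: the whole member is decompressed before unpickling starts, which may not suit very large pickles.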

Note that it's not certain you do need better-behaving pseudofile objects. If you control the dump calls and can use the latest-and-greatest protocol, everything will be fine:

>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> zf.close()
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
>>> sd2 = cPickle.load(zf.open('data.pkl'))
>>>

It's only the old, crufty, backwards-compatible "protocol 0" (the default) that requires proper pseudofile object behavior when mixing read and readline calls in the load. Protocol 0 is also slower and results in larger pickles, so it's definitely not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles that protocol 0 produces, is a mandatory constraint in your application.
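For readers on Python 3, a rough equivalent of the binary-protocol approach might look like the sketch below. It assumes Python 3's `pickle` module in place of `cPickle` and uses an in-memory `io.BytesIO` container instead of a file on disk; both are choices made for this illustration. It also checks the ASCII-only property of protocol 0 mentioned above.

```python
import io
import pickle
import zipfile

some_data = {1: "one", 2: "two", 3: "three"}

# Write the pickle with a binary protocol into an in-memory zip archive.
container = io.BytesIO()
with zipfile.ZipFile(container, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.pkl", pickle.dumps(some_data, pickle.HIGHEST_PROTOCOL))

# Load directly from the file-like object returned by ZipFile.open().
with zipfile.ZipFile(container, "r") as zf:
    with zf.open("data.pkl") as member:
        sd2 = pickle.load(member)

# Protocol 0 output for this data contains only ASCII bytes.
ascii_pickle = pickle.dumps(some_data, 0)
is_ascii = all(byte < 128 for byte in ascii_pickle)

print(sd2, is_ascii)
```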
