Question
For some reason I cannot get cPickle.load to work on the file-like object returned by ZipFile.open(). If I call read() on that object and pass the result to cPickle.loads, it works.
Example:
import zipfile
import cPickle
# the data we want to store
some_data = {1: 'one', 2: 'two', 3: 'three'}
#
# create a zipped pickle file
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', cPickle.dumps(some_data))
zf.close()
#
# cPickle.loads works
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd1 = cPickle.loads(zf.open('data.pkl').read())
zf.close()
#
# cPickle.load doesn't work
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd2 = cPickle.load(zf.open('data.pkl'))
zf.close()
Note: I don't want to zip just the pickle file; the archive will contain many files of other types. This is just an example.
Recommended answer
It's due to an imperfection in the pseudofile object implemented by the zipfile module (specifically, the object returned by the .open method of the ZipFile class, which was introduced in Python 2.6). Consider:
>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
>>>
The sequence .read(1) followed by .readline() is what .load does internally on a protocol-0 pickle (the default in Python 2, and what you're using here). Unfortunately, zipfile's imperfection means this particular sequence doesn't work: it produces a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
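To see why a loader would mix single-byte reads with readline calls, it can help to look at how a protocol-0 pickle is laid out. The sketch below is an illustration rather than part of the original answer: it uses the standard pickletools module to list the opcodes, and falls back to the plain pickle module where cPickle is unavailable (an assumption made so the snippet also runs on Python 3).

```python
import pickletools

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # assumption: Python 3 fallback so the sketch still runs

some_data = {1: 'one', 2: 'two', 3: 'three'}

# Protocol 0 is the text protocol: one-character opcodes, with any
# arguments written as newline-terminated text.
payload = pickle.dumps(some_data, 0)

# Collect the opcode names in the order an unpickler would meet them.
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]

print(repr(payload[:5]))   # the first few bytes of the pickle
print(opcode_names[:3])    # the first few opcodes
```

The newline-terminated arguments are what make readline a natural fit for the loader, and the one-character opcodes are what make read(1) a natural fit.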
I'm not sure offhand whether this bug in Python's standard library is fixed in Python 2.7; I'm going to check.
Edit: just checked. The bug is fixed in Python 2.7 rc1 (the release candidate that's currently the latest 2.7 version). I don't yet know whether it's also fixed in the latest bug-fix release of 2.6.
Edit again: the bug is still present in Python 2.6.5, the latest bug-fix release of Python 2.6. So if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
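When the archive member fits comfortably in memory, the pseudofile object can also be sidestepped rather than fixed. This is a minimal sketch, not part of the original answer: it reads the member's bytes and wraps them in io.BytesIO, so that load sees an ordinary in-memory file. The helper name load_pickle_from_zip and the temporary archive path are hypothetical, and the pickle fallback import is an assumption so the snippet also runs on Python 3.

```python
import io
import os
import tempfile
import zipfile

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # assumption: Python 3 fallback so the sketch still runs


def load_pickle_from_zip(archive_path, member_name):
    """Hypothetical helper: unpickle one member of a zip archive.

    The member is read fully into memory first, so pickle.load works on
    a BytesIO object instead of the object returned by ZipFile.open().
    """
    zf = zipfile.ZipFile(archive_path, 'r')
    try:
        raw = zf.read(member_name)
    finally:
        zf.close()
    return pickle.load(io.BytesIO(raw))


some_data = {1: 'one', 2: 'two', 3: 'three'}

# Build a small archive holding a protocol-0 pickle, as in the question.
archive_path = os.path.join(tempfile.mkdtemp(), 'zipped_pickle.zip')
zf = zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', pickle.dumps(some_data, 0))
zf.close()

restored = load_pickle_from_zip(archive_path, 'data.pkl')
print(restored == some_data)
```

The trade-off is memory: the whole member is held as a byte string before unpickling, which is the same cost as the read()-plus-loads approach from the question.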
Note that it's not certain you need better-behaving pseudofile objects at all. If you control the dump calls and can use the latest-and-greatest protocol, everything will be fine:
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> zf.close()
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
>>> sd2 = cPickle.load(zf.open('data.pkl'))
>>> zf.close()
>>>
Only the old, crufty, backwards-compatible "protocol 0" (the default) requires proper pseudofile object behavior when mixing read and readline calls during the load. Protocol 0 is also slower and results in larger pickles, so it's definitely not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles that protocol 0 produces, is a mandatory constraint in your application.
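The size and ASCII-only points can be checked with a quick comparison. This is a rough sketch rather than a benchmark: the sample data (a list of 1000 integers) and the helper is_ascii are illustrative assumptions, and the pickle fallback import is there so the snippet also runs on Python 3.

```python
try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # assumption: Python 3 fallback so the sketch still runs

# Illustrative sample data, not taken from the question.
sample = list(range(1000))

text_pickle = pickle.dumps(sample, 0)     # protocol 0: text opcodes
binary_pickle = pickle.dumps(sample, -1)  # -1: highest protocol available


def is_ascii(data):
    """Hypothetical helper: True if every byte is in the 7-bit ASCII range."""
    return all(byte < 128 for byte in bytearray(data))


print("protocol 0 size: %d bytes" % len(text_pickle))
print("highest protocol size: %d bytes" % len(binary_pickle))
print("protocol 0 is ASCII-only: %s" % is_ascii(text_pickle))
print("highest protocol is ASCII-only: %s" % is_ascii(binary_pickle))
```

Exact sizes depend on the data and the Python version, so the numbers are best read as a relative comparison.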