Question
I'm implementing a program that needs to serialize and deserialize large objects, so I was making some tests with the pickle, cPickle and marshal modules to choose the best one. Along the way I found something very interesting:

I'm using dumps and then loads (for each module) on a list of dicts, tuples, ints, floats and strings.
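For reference, here is a minimal sketch of that kind of benchmark, written for Python 3 (where cPickle no longer exists as a separate module, because pickle already uses the C implementation). The timing helper and the test data below are my own illustration, not the asker's original script:

```python
import marshal
import pickle
import time

def bench(name, dumps, loads, data):
    """Time a dumps/loads round trip and report the payload size."""
    t0 = time.perf_counter()
    blob = dumps(data)
    t1 = time.perf_counter()
    restored = loads(blob)
    t2 = time.perf_counter()
    assert restored == data
    print(f"{name}: dump {t1 - t0:.3f}s, load {t2 - t1:.3f}s, "
          f"serialized length {len(blob)}")
    return blob

# A list of dicts, tuples, ints, floats and strings, as in the question.
data = [({"k": i}, (i, i + 1), i, float(i), str(i)) for i in range(50_000)]

bench("pickle", pickle.dumps, pickle.loads, data)
bench("marshal", marshal.dumps, marshal.loads, data)
```

Both modules expose the same `dumps`/`loads` pair, so the same helper can time each one.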
This is the output of my benchmark:
DUMPING a list of length 7340032
----------------------------------------------------------------------
pickle => 14.675 seconds
length of pickle serialized string: 31457430
cPickle => 2.619 seconds
length of cPickle serialized string: 31457457
marshal => 0.991 seconds
length of marshal serialized string: 117440540
LOADING a list of length: 7340032
----------------------------------------------------------------------
pickle => 13.768 seconds
(same length?) 7340032 == 7340032
cPickle => 2.038 seconds
(same length?) 7340032 == 7340032
marshal => 6.378 seconds
(same length?) 7340032 == 7340032
So, from these results we can see that marshal was extremely fast in the dumping part of the benchmark: 14.8x faster than pickle and 2.6x faster than cPickle.

But, to my big surprise, marshal was by far slower than cPickle in the loading part: 2.2x faster than pickle, but 3.1x slower than cPickle.
And as for RAM, marshal was also very inefficient while loading.
I'm guessing the reason loading with marshal is so slow is somehow related to the length of its serialized string (much longer than those of pickle and cPickle).
- Why does marshal dump faster but load slower?
- Why is the marshal serialized string so long?
- Why is marshal's loading so inefficient in RAM?
- Is there a way to improve marshal's loading performance?
- Is there a way to combine marshal's fast dumping with cPickle's fast loading?
Answer
cPickle has a smarter algorithm than marshal and is able to do tricks to reduce the space used by large objects. That means it will be slower to decode but faster to encode, as the resulting output is smaller. marshal is simplistic and serializes the object straight as-is, without analyzing it any further. That also answers why marshal's loading is so inefficient: it simply has to do more work, such as reading more data from disk, to achieve the same result as cPickle.
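A quick way to see the size difference this describes (a Python 3 sketch; exact byte counts depend on the interpreter version): marshal writes small integers as fixed-width records, while pickle chooses compact opcodes for small values, so the same list usually serializes to fewer bytes with pickle.

```python
import marshal
import pickle

# The same list of smallish ints, serialized by both modules.
data = list(range(100_000))
m = marshal.dumps(data)
p = pickle.dumps(data)

print(f"marshal: {len(m)} bytes, pickle: {len(p)} bytes")

# Both round-trip faithfully; they just trade encoding effort for size.
assert marshal.loads(m) == data
assert pickle.loads(p) == data
```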
marshal and cPickle are really different things in the end: you can't get both fast saving and fast loading, since fast saving implies analyzing the data structures less, which implies saving a lot more data to disk.
Regarding the fact that marshal might be incompatible across Python versions, you should generally use cPickle:
"这不是一个通用的持久化"模块.对于Python对象通过RPC调用的通用持久化和传输,请参阅模块pickle和shelve.marshal模块的存在主要是为了支持读取和编写伪编译"代码对于 .pyc 文件的 Python 模块.因此,如果需要,Python 维护者保留以向后不兼容的方式修改 marshal 格式的权利.如果您正在序列化和反序列化 Python 对象,请改用 pickle 模块 – 性能具有可比性,版本独立性得到保证,pickle 支持的对象范围比 marshal 大得多."(关于元帅的python文档)