Problem description
If I have several binary strings containing zlib-compressed data, is there a way to efficiently combine them into a single compressed string without decompressing everything?
Example of what I have to do now:
import zlib

# zlib.compress() needs bytes, hence the b"..." literals
c1 = zlib.compress(b"The quick brown fox jumped over the lazy dog. ")
c2 = zlib.compress(b"We ride at dawn! ")
c = zlib.compress(zlib.decompress(c1) + zlib.decompress(c2))  # Warning: Inefficient!
d1 = zlib.decompress(c1)
d2 = zlib.decompress(c2)
d = zlib.decompress(c)
assert d1 + d2 == d  # This will pass!
Example of what I want:
import zlib

c1 = zlib.compress(b"The quick brown fox jumped over the lazy dog. ")
c2 = zlib.compress(b"We ride at dawn! ")
c = magic_zlib_add(c1+c2)  # Magical (hypothetical) method of combining compressed streams
d1 = zlib.decompress(c1)
d2 = zlib.decompress(c2)
d = zlib.decompress(c)
assert d1 + d2 == d  # This should pass!
I don't know much about zlib or the DEFLATE algorithm, so this may be entirely impossible from a theoretical point of view. Also, I must use zlib, so I can't wrap it and come up with my own protocol that transparently handles concatenated streams.
NOTE: I don't really mind if the solution is not trivial in Python. I'm willing to write some C code and call it from Python with ctypes.
Recommended answer
Since you don't mind venturing into C, you can start by looking at the code for gzjoin (examples/gzjoin.c in the zlib source distribution).
Note that the gzjoin code has to decompress the input to find the parts that must change when streams are merged, but it doesn't have to recompress anything. That's not too bad, because decompression is typically faster than compression.
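If you also control how the pieces are compressed, a simpler related trick may be enough in pure Python. The sketch below is not what gzjoin does and it does not work on streams already produced by zlib.compress(). It assumes each piece is written as a raw deflate fragment ended with Z_SYNC_FLUSH, so the fragment stops on a byte boundary and has no final block. The helper names compress_joinable and join_to_zlib are made up for illustration. As in gzjoin, the join step decompresses (here only to update the Adler-32 checksum) but never recompresses.

```python
import struct
import zlib


def compress_joinable(data: bytes, level: int = 6) -> bytes:
    """Hypothetical helper: compress data to a joinable raw deflate fragment.

    Z_SYNC_FLUSH (instead of finishing the stream) pads to a byte boundary
    and leaves the last block non-final, so fragments can be concatenated.
    """
    # wbits=-15 selects raw deflate: no zlib header, no Adler-32 trailer.
    co = zlib.compressobj(level, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush(zlib.Z_SYNC_FLUSH)


def join_to_zlib(fragments) -> bytes:
    """Hypothetical helper: wrap fragments into one zlib stream, no recompression."""
    adler = 1  # Adler-32 starting value
    for frag in fragments:
        # Decompress only to update the running checksum of the plain data.
        plain = zlib.decompressobj(-15).decompress(frag)
        adler = zlib.adler32(plain, adler)
    header = b"\x78\x9c"       # zlib header: deflate, 32 KiB window
    final_block = b"\x03\x00"  # empty deflate block with the final-block bit set
    trailer = struct.pack(">I", adler)  # Adler-32 of all plain data, big-endian
    return header + b"".join(fragments) + final_block + trailer


f1 = compress_joinable(b"The quick brown fox jumped over the lazy dog. ")
f2 = compress_joinable(b"We ride at dawn! ")
c = join_to_zlib([f1, f2])
d = zlib.decompress(c)  # an ordinary zlib stream as far as the reader is concerned
assert d == b"The quick brown fox jumped over the lazy dog. We ride at dawn! "
```

The fragments are slightly larger than normal zlib output because each sync flush adds an empty stored block, and back-references never cross fragment boundaries. For streams you did not create this way, the gzjoin approach of locating and clearing the final-block bit is still needed.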