Problem description
According to the gzip specification, the uncompressed file size is saved in the last 4 bytes of a .gz file.
I created 2 files:
dd if=/dev/urandom of=500M bs=1024 count=500000
dd if=/dev/urandom of=5G bs=1024 count=5000000
I gzipped them:
gzip 500M 5G
I checked the last 4 bytes with:
tail -c4 500M.gz | od -I (returns 512000000, as expected)
tail -c4 5G.gz | od -I (returns 825032704, not as expected)
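For reference, the same check can be done in Python. RFC 1952 stores ISIZE as a little-endian 32-bit integer. `od -I` uses the host's byte order, so it only prints the right value on a little-endian machine. The sketch below decodes the byte order explicitly. `read_isize` is a hypothetical helper name, and it assumes a single-member .gz file. It reads exactly the same field as `tail -c4`, so it is still subject to the modulo 2^32 limit.

```python
import gzip
import io
import struct
from typing import BinaryIO


def read_isize(f: BinaryIO) -> int:
    """Return the ISIZE field of a gzip file: its last 4 bytes, read as a
    little-endian unsigned 32-bit integer (RFC 1952). For a single-member
    file this is the uncompressed size modulo 2**32."""
    f.seek(-4, io.SEEK_END)  # the gzip trailer ends with the 4-byte ISIZE
    trailer = f.read(4)
    if len(trailer) != 4:
        raise ValueError("file too short to contain a gzip trailer")
    (isize,) = struct.unpack("<I", trailer)  # "<I" = little-endian uint32
    return isize


if __name__ == "__main__":
    # Self-contained demo on an in-memory gzip stream.
    sample = gzip.compress(b"x" * 1000)
    print(read_isize(io.BytesIO(sample)))  # 1000
```

On a real file you would call it as `with open("5G.gz", "rb") as f: read_isize(f)`. For the 5G file it would report the same 825032704 as `od`.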
It seems that hitting the invisible 32-bit barrier makes the value written into ISIZE complete nonsense, which is more annoying than if they had used some error bit instead.
Does anyone know of a way to get the uncompressed size of a .gz file without extracting it?
Thanks
Specification: http://www.gzip.org/zlib/rfc-gzip.html
Edit: if anyone wants to try it out, you can use /dev/zero instead of /dev/urandom.
Recommended answer
There isn't one.
The only way to get the exact uncompressed size of a compressed stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).
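From the shell, `gzip -dc 5G.gz | wc -c` does exactly this. The sketch below does the same in Python with the standard `gzip` module. It decompresses the stream one chunk at a time and counts the output bytes, discarding each chunk, so nothing is written to disk. `uncompressed_size` and the 1 MiB `chunk_size` are illustrative choices, not from the original answer. The cost is still a full decompression pass. As a side benefit, Python's `GzipFile` reads concatenated (multi-member) gzip files, whose sizes the last-4-bytes trick cannot report at all.

```python
import gzip
import io
from typing import BinaryIO


def uncompressed_size(f: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Decompress a gzip stream chunk by chunk and count the output bytes.
    Each chunk is discarded after counting, so memory use stays bounded.
    The result is exact at any size (no modulo 2**32), and concatenated
    gzip members are summed."""
    total = 0
    with gzip.GzipFile(fileobj=f, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:  # b"" signals the end of the decompressed data
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    data = b"abc" * 100_000
    print(uncompressed_size(io.BytesIO(gzip.compress(data))))  # 300000
```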
It's worth noting that ISIZE is defined as
ISIZE (Input SIZE)
This contains the size of the original (uncompressed) input data modulo 2^32.
in the gzip RFC, so it isn't actually breaking at the 32-bit barrier; what you're seeing is the expected behavior.
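The value in the question matches the RFC exactly. The two files were created with `bs=1024` and `count=500000` / `count=5000000`, so only the second one exceeds 2^32 bytes:

```latex
\begin{aligned}
\text{500M:}\quad & 500\,000 \times 1024 = 512\,000\,000 < 2^{32}
  \;\Rightarrow\; \mathrm{ISIZE} = 512\,000\,000 \\
\text{5G:}\quad & 5\,000\,000 \times 1024 = 5\,120\,000\,000 \\
& 5\,120\,000\,000 \bmod 2^{32} = 5\,120\,000\,000 - 4\,294\,967\,296 = 825\,032\,704
\end{aligned}
```

So the true size is ISIZE + k·2^32 for some unknown k ≥ 0. ISIZE alone cannot tell you k, which is why decompressing is the only exact method.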