Getting the uncompressed size of a very large .gz file on a 64-bit platform

Question

According to the gzip specification, the uncompressed file size is stored in the last 4 bytes of a .gz file.

I created two files:

dd if=/dev/urandom of=500M bs=1024 count=500000
dd if=/dev/urandom of=5G bs=1024 count=5000000

I gzipped them:

gzip 500M 5G

I checked the last 4 bytes with:

tail -c4 500M.gz | od -I      (returns 512000000, as expected)
tail -c4 5G.gz | od -I        (returns 825032704, not as expected)

It seems that hitting the invisible 32-bit barrier makes the value written into ISIZE complete nonsense, which is more annoying than if they had used some error bit instead.

Does anyone know of a way to get the uncompressed size of a .gz file without extracting it?

Thanks

Specification: http://www.gzip.org/zlib/rfc-gzip.html

Edit: if anyone wants to try it out, you could use /dev/zero instead of /dev/urandom.

Recommended answer

There isn't one.

The only way to get the exact uncompressed size of a compressed stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).
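
A minimal sketch of the decompress-and-count approach, assuming a POSIX shell with gzip, wc and tr available. The function name gz_uncompressed_size is made up for illustration and is not part of gzip:

```shell
# Hypothetical helper: print the exact uncompressed size of a .gz file in bytes.
# The data is decompressed into a pipe and counted; nothing is written to disk.
gz_uncompressed_size() {
    # tr strips the padding spaces that some wc implementations print
    gzip -dc -- "$1" | wc -c | tr -d ' '
}
```

For example, gz_uncompressed_size 5G.gz reads the whole archive, so it takes roughly as long as decompressing it, but the count is not limited to 32 bits.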

It's worth noting that ISIZE is defined as

ISIZE (Input SIZE)
              This contains the size of the original (uncompressed) input
              data modulo 2^32.

in the gzip RFC, so it isn't actually breaking at the 32-bit barrier; what you're seeing is the expected behavior.
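
The numbers in the question are consistent with this definition. A quick check with shell arithmetic, assuming a shell whose integers are 64 bits wide (bash, or sh on a typical 64-bit platform):

```shell
# Size of the 5G test file from the question: 5000000 blocks of 1024 bytes.
size=$((5000000 * 1024))
# ISIZE keeps only the low 32 bits, i.e. the size modulo 2^32 (4294967296).
isize=$((size % 4294967296))
echo "$size $isize"   # prints: 5120000000 825032704
```

The second number is exactly the value read from the last 4 bytes of the 5G archive, while the 500M file (512000000 bytes) is below 2^32 and is therefore stored unchanged.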


09-05 20:50