Bzip2 block header: 1AY&SY

Problem description

This is a question about the bzip2 archive format. Any bzip2 archive consists of a file header, one or more blocks, and a tail structure. Every block should start with "1AY&SY", the 6 bytes 0x314159265359, which are the BCD-encoded digits of pi. According to the bzip2 source:

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/

The question is: is it true that in all bzip2 archives the start of every block is aligned to a byte boundary? I mean all archives created by the reference implementation of bzip2, the bzip2-1.0.5+ utility.

I think bzip2 may parse the stream not as a byte stream but as a bit stream (the block itself is Huffman-encoded, and Huffman output is not byte-aligned by design).

So, in other words: is the result of grep -c 1AY&SY greater than or equal to the number of bzip2 blocks in the file (greater because Huffman coding may produce 1AY&SY inside a block)?

Recommended answer

BZIP2 looks at a bit stream.

From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:

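Anyway, the important bit is that a BZIP2 file contains one or more "streams", which are byte aligned, each containing one (zero?) or more "blocks", which are not byte aligned, followed by an end-of-stream marker (the six bytes 0x177245385090, which is the square root of pi as binary-coded decimal (BCD), a four-byte checksum, and empty bits for byte alignment).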

The bzip2 Wikipedia article also alludes to bit-level block alignment (see the File Format section), which seems to be in line with what I remember from school (had to implement the algorithm...).
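Since blocks can start at any bit position, a byte-oriented search such as grep can miss block headers instead of over-counting them. As a rough illustration (not code from bzip2 itself), the sketch below slides a 48-bit window over the file one bit at a time and records every bit offset where the marker appears. The function name `find_magic_bit_offsets` and the constant names are made up for this example; it assumes bits are packed most-significant-bit first, as bzip2 does.

```python
import bz2
from typing import List

BLOCK_MAGIC = 0x314159265359  # "1AY&SY", BCD digits of pi
EOS_MAGIC = 0x177245385090    # end-of-stream marker, BCD digits of sqrt(pi)


def find_magic_bit_offsets(data: bytes, magic: int = BLOCK_MAGIC) -> List[int]:
    """Return every bit offset at which the 48-bit `magic` value occurs.

    Hypothetical helper for illustration: it scans bit by bit, so it also
    finds markers that do not start on a byte boundary.
    """
    mask = (1 << 48) - 1
    window = 0      # the most recent 48 bits seen so far
    bits_seen = 0
    offsets = []
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen += 1
            if bits_seen >= 48 and window == magic:
                offsets.append(bits_seen - 48)
    return offsets


if __name__ == "__main__":
    compressed = bz2.compress(b"hello, bzip2")
    for offset in find_magic_bit_offsets(compressed):
        # offset % 8 != 0 means the block header is not byte-aligned
        print(offset, "byte-aligned" if offset % 8 == 0 else "not byte-aligned")
```

The first block directly follows the 4-byte stream header ("BZh" plus the level digit), so it always sits at bit offset 32; later blocks land wherever the previous block's bits happen to end. A match found this way can still be a chance occurrence inside compressed data, with the probabilities given in the source comment quoted in the question.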
