Bzip2 block header: 1AY&SY

Problem description

This is a question about the bzip2 archive format. Every bzip2 archive consists of a file header, one or more blocks, and a trailer. Every block should start with "1AY&SY", the 6-byte value 0x314159265359, which is the BCD-encoded digits of pi. According to the bzip2 source:

/*--
  A 6-byte block header, the value chosen arbitrarily
  as 0x314159265359 :-).  A 32 bit value does not really
  give a strong enough guarantee that the value will not
  appear by chance in the compressed datastream.  Worst-case
  probability of this event, for a 900k block, is about
  2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
  For a compressed file of size 100Gb -- about 100000 blocks --
  only a 48-bit marker will do.  NB: normal compression/
  decompression do *not* rely on these statistical properties.
  They are only important when trying to recover blocks from
  damaged files.
--*/
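
The probabilities quoted in the comment can be sanity-checked with a rough estimate. The sketch below assumes the compressed data behaves like uniformly random bits and that a block holds 900,000 bytes; both are simplifying assumptions, and the function name is made up for illustration. It bounds the chance that a marker of a given width shows up somewhere in the block by (number of bit positions) / 2^(marker width), which gives values of the same order of magnitude as the comment's figures rather than an exact reproduction.

```python
# Assumed block size in bytes (the "900k" of the comment, taken here as 900,000).
BLOCK_BYTES = 900_000


def chance_marker_probability(marker_bits: int, stream_bytes: int = BLOCK_BYTES) -> float:
    """Upper bound on the chance that a specific marker_bits-wide pattern
    appears somewhere in stream_bytes of uniformly random data."""
    bit_positions = stream_bytes * 8  # the marker may start at any bit offset
    return bit_positions / 2 ** marker_bits


if __name__ == "__main__":
    for width in (32, 40, 48):
        print(f"{width}-bit marker: about {chance_marker_probability(width):.1e}")
```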

The question is: is it true that every block in a bzip2 archive starts on a byte boundary? I mean all archives created by the reference implementation of bzip2, the bzip2-1.0.5+ utility.

I think that bzip2 may parse the stream as a bit stream rather than a byte stream (the block contents are Huffman-coded, and Huffman coding is not byte-aligned by design).

So, in other words: is the count reported by grep -c '1AY&SY' greater than or equal to the number of bzip2 blocks in the file (greater because Huffman coding may produce 1AY&SY inside a block)?
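
One way to check this empirically is to count the marker at every bit offset and compare that with the byte-aligned count. The following is a minimal sketch, not part of bzip2 itself: the function names are made up for illustration, and it assumes the stream packs bits most-significant-bit first. Note also that grep -c counts matching lines rather than occurrences, so a byte-level count is used here instead.

```python
import bz2

MAGIC = 0x314159265359          # block-header marker, the bytes b"1AY&SY"
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_magic_bit_offsets(data: bytes) -> list[int]:
    """Return the bit offset of every occurrence of MAGIC, at any bit alignment."""
    offsets = []
    window = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            # Shift in one bit at a time, most significant bit of each byte first.
            window = ((window << 1) | ((byte >> (7 - bit)) & 1)) & MASK
            consumed = i * 8 + bit + 1  # bits read so far
            if consumed >= MAGIC_BITS and window == MAGIC:
                offsets.append(consumed - MAGIC_BITS)
    return offsets


def count_byte_aligned(data: bytes) -> int:
    """Count occurrences of the marker that start on a byte boundary."""
    return data.count(b"1AY&SY")


if __name__ == "__main__":
    # Hypothetical demo input: a small buffer compressed with Python's bz2 module.
    sample = bz2.compress(b"example data " * 1000)
    offsets = find_magic_bit_offsets(sample)
    print("bit offsets of marker:", offsets)
    print("offsets on a byte boundary:", [o for o in offsets if o % 8 == 0])
    print("byte-aligned count:", count_byte_aligned(sample))
```

If any reported offset is not a multiple of 8, that marker does not start on a byte boundary and a byte-oriented search would miss it. The sketch cannot tell a real block header from a chance occurrence inside compressed data.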

Recommended answer
