
Problem description

Say I have a bzip2 file (over 5GB), and I want to decompress only block #x, because that is where my data is (the block is different every time). How would I do this?

I thought about making an index of where all the blocks are, then cutting the block I need out of the file and applying bzip2recover to it.

I also thought about compressing say 1MB at a time, then appending this to a file (and recording the location), and simply grabbing the file when I need it, but I'd rather keep the original bzip2 file intact.

My preferred language is Ruby, but any language's solution is fine by me (as long as I understand the principle).

Solution

There is a tool for this, seek-bzip2: http://bitbucket.org/james_taylor/seek-bzip2

Grab the source, compile it.

Run with

./seek-bzip2  32 < bzip_compressed.bz2 

to test.

The only parameter is the bit offset of the header of the block you want. Correction (the earlier text here was incorrect): a block start may not be aligned to a byte boundary, so you have to search for the hex string "31 41 59 26 53 59" at every possible bit shift, as bzip2recover does - http://www.bzip.org/1.0.3/html/recovering.html
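To illustrate the bit-level search, here is a minimal Python sketch (not taken from seek-bzip2 or bzip2recover). The function name `find_block_bit_offsets` and the `chunk_size` parameter are made up for this example. It slides a 48-bit window over the stream one bit at a time and records every position where the window equals the block magic. Two caveats: the same 48 bits could in principle also appear by chance inside compressed data, and a pure-Python bit loop would be far too slow for a 5GB file, so treat it as a way to understand the principle only.

```python
import bz2
import io

BLOCK_MAGIC = 0x314159265359  # the 48-bit block header magic "31 41 59 26 53 59"
MASK48 = (1 << 48) - 1


def find_block_bit_offsets(stream, chunk_size=1 << 20):
    """Return the bit offset of every occurrence of the block magic in a binary stream.

    Hypothetical helper for illustration; reads the stream in chunks so the
    whole file never has to be in memory.
    """
    offsets = []
    window = 0      # the most recent 48 bits seen
    bits_read = 0
    while chunk := stream.read(chunk_size):
        for byte in chunk:
            # bzip2 packs bits most-significant bit first
            for shift in range(7, -1, -1):
                window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
                bits_read += 1
                if bits_read >= 48 and window == BLOCK_MAGIC:
                    offsets.append(bits_read - 48)
    return offsets


if __name__ == "__main__":
    compressed = bz2.compress(b"hello world\n" * 1000)
    print(find_block_bit_offsets(io.BytesIO(compressed)))
```

The first offset reported for a regular bzip2 file should be 32, because the first block follows the 4-byte file header directly. Offsets collected this way are the values that a tool taking a bit offset as its parameter would expect.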

32 is the bit size of the "BZh1" header, where the 1 can be any digit from "1" to "9" (in classic bzip2); it is the (uncompressed) block size in hundreds of kB (not exact).
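For readers who want to see the principle without compiling anything, the following Python sketch shows roughly what bzip2recover does for a single block. It copies the bits of one block, puts a fresh "BZh" header with the original block-size digit in front, and appends the end-of-stream marker plus the block's CRC so that a normal decompressor accepts the result. The function `decompress_block` and its parameters are hypothetical names for this example, and the bit offsets are assumed to have been found beforehand (for instance by a bit-level search for the block magic). It relies on the stream CRC of a single-block stream being equal to that block's CRC.

```python
import bz2

END_MAGIC = 0x177245385090  # 48-bit end-of-stream marker


def decompress_block(data, start_bit, end_bit, level_digit):
    """Decompress the single block occupying bits [start_bit, end_bit) of data.

    start_bit is the bit offset of the block's magic; end_bit is the bit offset
    of the next block magic (or of the end-of-stream marker for the last block).
    level_digit is the header digit as bytes, e.g. b"9".
    """
    first = start_bit // 8
    last = (end_bit + 7) // 8
    # Only the bytes that cover the block are turned into an integer.
    chunk = int.from_bytes(data[first:last], "big")
    chunk_bits = (last - first) * 8
    n = end_bit - start_bit
    # Drop the bits after end_bit, then mask off the bits before start_bit.
    block = (chunk >> (chunk_bits - (end_bit - first * 8))) & ((1 << n) - 1)
    # The 32-bit block CRC directly follows the 48-bit block magic.
    crc = (block >> (n - 80)) & 0xFFFFFFFF
    # New stream body: block bits, end marker, stream CRC (same as block CRC here).
    bits = (((block << 48) | END_MAGIC) << 32) | crc
    nbits = n + 80
    pad = -nbits % 8  # zero padding up to a byte boundary
    body = (bits << pad).to_bytes((nbits + pad) // 8, "big")
    return bz2.decompress(b"BZh" + level_digit + body)
```

With an index of bit offsets stored next to the original file, only the bytes between two neighbouring offsets need to be read from disk, so the original bzip2 file can stay intact.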


09-24 23:13