Problem description
According to this Cloudera post, Snappy IS splittable.
But according to Hadoop: The Definitive Guide, Snappy is NOT splittable.
There is also some conflicting information on the web. Some say it's splittable, some say it's not.
Both are correct, but at different levels.
According to the Cloudera blog: http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/
This means that if a whole text file is compressed with Snappy, the file is NOT splittable. But if each record inside the file is compressed with Snappy, the file can be splittable, for example in Sequence Files with block compression.
To make this clearer, the following two layouts are not the same:
<START-FILE>
<START-SNAPPY-BLOCK>
FULL CONTENT
<END-SNAPPY-BLOCK>
<END-FILE>
versus
<START-FILE>
<START-SNAPPY-BLOCK1>
RECORD1
<END-SNAPPY-BLOCK1>
<START-SNAPPY-BLOCK2>
RECORD2
<END-SNAPPY-BLOCK2>
<START-SNAPPY-BLOCK3>
RECORD3
<END-SNAPPY-BLOCK3>
<END-FILE>
Snappy blocks are NOT splittable, but files made up of Snappy blocks are splittable.
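The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Hadoop implements it: zlib stands in for Snappy (Snappy is not in the Python standard library), and the `SYNC` marker, the length prefix, and the helper names `write_whole`, `write_blocks`, and `read_split` are hypothetical. Container formats such as Sequence Files use a per-file sync marker for the same purpose; a fixed marker is used here for brevity.

```python
import struct
import zlib  # stand-in codec (assumption): Snappy is not in the standard library

SYNC = b"\x00SYNC-MARKER-16B"  # hypothetical fixed 16-byte sync marker


def write_whole(records):
    """Layout 1: a single compressed block around the full content."""
    return zlib.compress(b"\n".join(records))


def write_blocks(records):
    """Layout 2: one compressed block per record, each preceded by a sync marker."""
    out = bytearray()
    for rec in records:
        block = zlib.compress(rec)
        out += SYNC + struct.pack(">I", len(block)) + block
    return bytes(out)


def read_split(data, start, end):
    """Decode every block whose sync marker begins in [start, end)."""
    records = []
    pos = data.find(SYNC, start)  # skip forward to the next block boundary
    while pos != -1 and pos < end:
        header = pos + len(SYNC)
        (size,) = struct.unpack(">I", data[header:header + 4])
        body = header + 4
        records.append(zlib.decompress(data[body:body + size]))
        pos = data.find(SYNC, body + size)
    return records


records = [b"RECORD1", b"RECORD2", b"RECORD3"]

# Layout 2: two readers each take half of the byte range, independently.
blocked = write_blocks(records)
mid = len(blocked) // 2
first_half = read_split(blocked, 0, mid)
second_half = read_split(blocked, mid, len(blocked))

# Layout 1: a reader starting in the middle of the stream cannot decode it.
whole = write_whole(records)
try:
    zlib.decompress(whole[len(whole) // 2:])
    mid_stream_ok = True
except zlib.error:
    mid_stream_ok = False
```

Each reader in the second layout only needs to find the next sync marker after its start offset, so the two halves together yield every record exactly once. In the first layout there is no such boundary inside the compressed stream, which is why the whole file has to go to a single reader.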