Problem description
I have a MapReduce Java program in which I want to compress only the mapper output, not the reducer output. I thought this would be possible by setting the properties listed below on the Configuration instance. However, when I run the job, the output generated by the reducer is still compressed: the generated file is part-r-00000.gz. Has anyone successfully compressed just the mapper data but not the reducer output? Is that even possible?
//Compress mapper output
conf.setBoolean("mapred.output.compress", true);
conf.set("mapred.output.compression.type", CompressionType.BLOCK.toString());
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
Recommended answer
With MR2, set these properties instead:
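conf.setBoolean("mapreduce.map.output.compress", true);                 // compress the intermediate map output
conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);   // leave the final (reducer) output uncompressed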
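Why the question's settings compressed the reducer output: the mapred.output.* keys are the old (MR1) names for job output compression, which Hadoop 2 treats as deprecated aliases of mapreduce.output.fileoutputformat.compress*. The MR1 name for map-output compression was mapred.compress.map.output. The sketch below illustrates this with a hypothetical helper class, CompressionKeys, which is not part of Hadoop. It hard-codes a hand-picked subset of the deprecated-to-current name mapping for compression settings; Hadoop maintains its own, much larger table internally.

```java
import java.util.Map;

// Hypothetical illustration only: NOT a Hadoop API. Encodes a hand-picked subset of
// Hadoop's deprecated (MR1) -> current (MR2) property names for compression settings.
public class CompressionKeys {

    // Deprecated MR1 name -> current MR2 name (Map.of requires Java 9+).
    static final Map<String, String> DEPRECATED_TO_CURRENT = Map.of(
        "mapred.compress.map.output", "mapreduce.map.output.compress",
        "mapred.map.output.compression.codec", "mapreduce.map.output.compress.codec",
        "mapred.output.compress", "mapreduce.output.fileoutputformat.compress",
        "mapred.output.compression.type", "mapreduce.output.fileoutputformat.compress.type",
        "mapred.output.compression.codec", "mapreduce.output.fileoutputformat.compress.codec");

    // Returns the current name for a key, or the key unchanged if it is not in the table.
    static String currentName(String key) {
        return DEPRECATED_TO_CURRENT.getOrDefault(key, key);
    }

    // True if the key (old or new name) controls the intermediate map output,
    // false if it controls something else, such as the files written by the reducers.
    static boolean affectsMapOutput(String key) {
        return currentName(key).startsWith("mapreduce.map.output.");
    }

    public static void main(String[] args) {
        // The three keys set in the question.
        String[] questionKeys = {
            "mapred.output.compress",
            "mapred.output.compression.type",
            "mapred.output.compression.codec"
        };
        for (String key : questionKeys) {
            System.out.println(key + " -> " + currentName(key)
                + (affectsMapOutput(key) ? " (map output)" : " (job/reducer output)"));
        }
    }
}
```

Running main shows that all three keys from the question resolve to mapreduce.output.fileoutputformat.* names. Those names govern the reducer's output files, which is why part-r-00000.gz was produced.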
For more details, refer to: http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml