Hadoop: how to compress mapper output but not the reducer output



Problem description


I have a MapReduce Java program in which I am trying to compress only the mapper output, not the reducer output. I thought this would be possible by setting the properties listed below on the Configuration instance. However, when I run the job, the reducer output is still compressed: the generated file is part-r-00000.gz. Has anyone successfully compressed just the mapper output but not the reducer output? Is that even possible?


//Compress mapper output

conf.setBoolean("mapred.output.compress", true);
conf.set("mapred.output.compression.type", CompressionType.BLOCK.toString());
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);


Recommended answer



With MR2, we should now set:

// Configuration.set() only accepts String values, so use setBoolean() here.
conf.setBoolean("mapreduce.map.output.compress", true);
conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);
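To show where these two properties fit into a job setup, here is a minimal sketch rather than a definitive implementation. It assumes the Hadoop 2.x (MR2) client libraries are on the classpath. The class name MapOnlyCompressionExample, the method buildJob, and the job name are hypothetical, and the choice of GzipCodec simply mirrors the codec used in the question; the codec property mapreduce.map.output.compress.codec is optional and is set here only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example class; not part of the original question or answer.
public class MapOnlyCompressionExample {

    // Builds a Job whose intermediate (map) output is compressed
    // while the final (reducer) output is written uncompressed.
    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();

        // Compress the intermediate data that mappers hand to the shuffle.
        conf.setBoolean("mapreduce.map.output.compress", true);
        // Assumption: reuse the gzip codec from the question for the map output.
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);

        // Leave the final job output (part-r-* files) uncompressed.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);

        Job job = Job.getInstance(conf, "map-only-compression");

        // Same effect as the property above, expressed through the API.
        FileOutputFormat.setCompressOutput(job, false);
        return job;
    }
}
```

With a configuration like this, the reducer output should show up as part-r-00000 rather than part-r-00000.gz. The mapred.output.compress property used in the question is the older name for the job output setting, which is why it compressed the reducer files instead of the map output.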


For more details, refer to: http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

