MultipleOutputFormat in Hadoop

Problem description

I'm new to Hadoop and am trying out the WordCount program.
Now, to try out multiple output files, I'm using the MultipleOutputs class (not MultipleOutputFormat, despite the title). This link helped me do it: http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
In my driver class I had:

    MultipleOutputs.addNamedOutput(conf, "even",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);

    MultipleOutputs.addNamedOutput(conf, "odd",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);

and my Reduce class became this:

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        MultipleOutputs mos = null;

        public void configure(JobConf job) {
            mos = new MultipleOutputs(job);
        }

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            if (sum % 2 == 0) {
                mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
            } else {
                mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
            }
            // output.collect(key, new IntWritable(sum));
        }

        @Override
        public void close() throws IOException {
            // TODO Auto-generated method stub
            mos.close();
        }
    }

This worked, but I get a lot of files (one odd and one even for every map-reduce).
The question is: how can I have just 2 output files (odd and even), so that every odd output of every map-reduce gets written into that one odd file, and the same for even?
Solution
Each reducer uses an OutputFormat to write its records, which is why you are getting one set of odd and even files per reducer. This is by design, so that each reducer can perform its writes in parallel.
If you want just a single odd file and a single even file, you'll need to set mapred.reduce.tasks to 1. Performance will suffer, though, because all the mappers will be feeding into a single reducer.
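
To make the file count concrete, here is a minimal plain-Java sketch that does not use Hadoop at all. It mimics the formula of Hadoop's default HashPartitioner, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but uses `String.hashCode()` instead of `Text.hashCode()`, so the key-to-reducer assignment in a real job will differ. The class name `PerReducerFiles`, its helper methods, and the sample counts are made up for illustration, and the `name-r-NNNNN` file naming is an assumption about how named outputs show up on disk, so check your own output directory. In the old `mapred` API used in the question, the reducer count is typically set in the driver with `conf.setNumReduceTasks(1)`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

class PerReducerFiles {

    // Same formula as Hadoop's default HashPartitioner; String.hashCode() is
    // used here only for illustration (a real job hashes the Text key).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assumed naming pattern for a named output written by one reducer.
    static String fileName(String namedOutput, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", namedOutput, partition);
    }

    // Returns the files that would be created for the given final word counts.
    static TreeSet<String> outputFiles(Map<String, Integer> counts, int numReduceTasks) {
        TreeSet<String> files = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int partition = partitionFor(entry.getKey(), numReduceTasks);
            String namedOutput = (entry.getValue() % 2 == 0) ? "even" : "odd";
            // A file only appears once a reducer actually writes to that named output.
            files.add(fileName(namedOutput, partition));
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 4);
        counts.put("b", 3);
        counts.put("c", 2);
        counts.put("d", 7);
        counts.put("e", 6);
        counts.put("f", 1);

        // Three reducers: each one ends up with its own even and odd file.
        // prints [even-r-00000, even-r-00001, even-r-00002, odd-r-00000, odd-r-00001, odd-r-00002]
        System.out.println(outputFiles(counts, 3));

        // One reducer: every key lands in partition 0, so only two files remain.
        // prints [even-r-00000, odd-r-00000]
        System.out.println(outputFiles(counts, 1));
    }
}
```

With one reducer the modulo is always 0, which is exactly why the output collapses to one even and one odd file, and also why all the work is serialized through that single task.
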
Another option is to change the process that reads these files so that it accepts multiple input files, or to write a separate process that merges these files together.
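
As a rough sketch of the "separate merge process" idea, the following plain-Java program concatenates all part files that share a prefix into one file. It assumes the reducer outputs have already been copied to a local directory and are named like `odd-r-00000`; the class name `MergeNamedOutputs`, the `merge` helper, and the sample file contents are hypothetical. For files that still live on HDFS you would need Hadoop's own tooling instead (the `hadoop fs -getmerge` shell command covers a similar need).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MergeNamedOutputs {

    // Concatenates every file in dir whose name starts with prefix into target,
    // in file-name order, and returns how many part files were merged.
    // The target's own name must not start with prefix.
    static int merge(Path dir, String prefix, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        Collections.sort(parts); // directory listing order is not guaranteed
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // append this part's bytes to the merged file
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copies of three reducers' "odd" outputs.
        Path dir = Files.createTempDirectory("wordcount-out");
        Files.write(dir.resolve("odd-r-00000"), "f\t1\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("odd-r-00001"), "d\t7\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("odd-r-00002"), "b\t3\n".getBytes(StandardCharsets.UTF_8));

        Path merged = dir.resolve("odd.txt");
        int count = merge(dir, "odd-r-", merged);
        System.out.println("merged " + count + " part files into " + merged.getFileName());
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

This keeps the job itself parallel and pays the merge cost once, afterwards, which is usually the better trade-off when the single-reducer run would be too slow.
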
