Problem Description
I'm trying to get readable output from an ArrayWritable in a simple MapReduce task. I found a few questions about a similar problem, but I couldn't solve it in my own code, so I'm looking forward to your help. Thanks :)!
Input: a text file containing some sentences.
Output should be:
<Word, <length, number of occurrences of that word in the text file>>
Example: Hello 5 2
The output that I get in my Job is:
hello WordLength_V01$IntArrayWritable@221cf05
test WordLength_V01$IntArrayWritable@799e525a
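(For context: the ClassName@hexcode form above is exactly what the default Object.toString() produces when a class never overrides it. A tiny self-contained illustration, with a made-up class name:)

public class ToStringDemo {
    public static void main(String[] args) {
        Object noOverride = new Object() {}; // anonymous subclass, no toString() override
        System.out.println(noOverride);      // prints something like ToStringDemo$1@1b6d3586
    }
}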
I think the problem is in the subclass of IntArrayWritable, but I haven't found the right fix. By the way, we are on Hadoop 2.5. I use the following code to get this result:
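For completeness, the snippets below assume the standard Hadoop 2.x imports:

import java.io.DataOutput;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;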
Main Method:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word length V1");
// Set Classes
job.setJarByClass(WordLength_V01.class);
job.setMapperClass(MyMapper.class);
// job.setCombinerClass(MyReducer.class);
job.setReducerClass(MyReducer.class);
// Set Output and Input Parameters
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntArrayWritable.class);
// Number of Reducers
job.setNumReduceTasks(1);
// Set FileDestination
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Mapper:
public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
// Initialize Variables
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
// Map Method
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
// Use Tokenizer
StringTokenizer itr = new StringTokenizer(value.toString());
// Select each word
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
// Output Pair
context.write(word, one);
}
}
}
Reducer:
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntArrayWritable> {
// Initialize Variables
private IntWritable count = new IntWritable();
private IntWritable length = new IntWritable();
// Reduce Method
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
// Count Words
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
count.set(sum);
// Wordlength
length.set(key.getLength());
// Define Output
IntWritable[] temp = new IntWritable[2];
IntArrayWritable output = new IntArrayWritable(temp);
temp[0] = count;
temp[1] = length;
// Output
output.set(temp);
context.write(key, new IntArrayWritable(output.get()));
}
}
Subclass:
public static class IntArrayWritable extends ArrayWritable {
public IntArrayWritable(IntWritable[] intWritables) {
super(IntWritable.class);
}
@Override
public IntWritable[] get() {
return (IntWritable[]) super.get();
}
@Override
public void write(DataOutput arg0) throws IOException {
for(IntWritable data : get()){
data.write(arg0);
}
}
}
I used the following links to find a solution:
- Interface Writable (hadoop.apache.org)
- Class ArrayWritable (hadoop.apache.org)
- stackoverflow.com (1)
- stackoverflow.com (2)
I'm really thankful for any idea!
-------- Solution --------
New Subclass:
public static class IntArrayWritable extends ArrayWritable {
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
@Override
public IntWritable[] get() {
return (IntWritable[]) super.get();
}
@Override
public String toString() {
IntWritable[] values = get();
return values[0].toString() + ", " + values[1].toString();
}
}
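Compared with the original subclass, this fixes two things at once: the constructor now actually forwards the array via super(IntWritable.class, values) instead of silently dropping it, and toString() is overridden, which is the method TextOutputFormat calls to render each value. A quick standalone check, e.g. a temporary main placed next to the subclass (sample values are arbitrary):

public static void main(String[] args) {
    IntWritable[] vals = { new IntWritable(2), new IntWritable(5) };
    IntArrayWritable w = new IntArrayWritable(vals);
    System.out.println(w); // prints: 2, 5
}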
New Reduce Method:
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
// Count Words
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
count.set(sum);
// Wordlength
length.set(key.getLength());
// Define Output
IntWritable[] temp = new IntWritable[2];
temp[0] = count;
temp[1] = length;
context.write(key, new IntArrayWritable(temp));
}
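With these changes, a run over the sample input should produce lines like "hello 2, 5": the count first and the length second, because that is the order the reducer stores them in temp. Swap temp[0] and temp[1] if you want the "Hello 5 2" ordering from the problem statement.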
Solution: Everything looks fine. You just need to write one more method, printStrings(), in your subclass that returns a single string instead of an array. ArrayWritable's built-in helper toStrings() returns an array of strings, and the toString() inherited from Object prints only the class name and hash code; that is why your output shows an address instead of the values.
public String printStrings() {
    // Build one space-separated string from the stored values.
    String strings = "";
    String[] values = toStrings(); // ArrayWritable helper: one String per element
    for (int i = 0; i < values.length; i++) {
        strings = strings + " " + values[i];
    }
    return strings;
}
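One caveat: TextOutputFormat renders values by calling toString(), so adding printStrings() by itself will not change the job output. You would either return its result from an overridden toString(), as the first solution does, or call it explicitly in the reducer, for example context.write(key, new Text(output.printStrings())) with the output value class changed to Text.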