How does Hadoop put data into the map and reduce functions with the correct types?

Question

I'm having some difficulty understanding how, in Hadoop, data is put into the map and reduce functions. I know that we can define the input format and the output format, and then the key types for input and output. But if, for example, we want an object to be the input type, how does Hadoop handle that internally?

Thanks...

Recommended answer

You can extend Hadoop's InputFormat and OutputFormat (abstract classes in the org.apache.hadoop.mapreduce API, interfaces in the older mapred API) to create your own custom formats. For example, you could format the output of your MapReduce job as JSON, with something like this:

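import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Writes each task's (Text, IntWritable) pairs as one JSON object.
public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {
    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException,
                  InterruptedException {
        Configuration conf = context.getConfiguration();
        // Task-specific file (e.g. part-r-00000.json) in the committer's work
        // directory, so parallel reducers never overwrite each other's output.
        Path file = getDefaultWorkFile(context, ".json");
        FileSystem fs = file.getFileSystem(conf);
        FSDataOutputStream out = fs.create(file, false);
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends
          LineRecordWriter<Text, IntWritable> {
        private boolean firstRecord = true;

        JsonRecordWriter(DataOutputStream out) throws IOException {
            super(out);
            writeUtf8("{\n");                // open the object once per file
        }

        @Override
        public synchronized void write(Text key, IntWritable value)
                throws IOException {
            if (!firstRecord) {
                writeUtf8(",\n");            // separator before all but the first record
            }
            firstRecord = false;
            // Key as a JSON string, value as a JSON number.
            writeUtf8("\"" + escape(key.toString()) + "\":" + value.get());
        }

        @Override
        public synchronized void close(TaskAttemptContext context)
                throws IOException {
            writeUtf8("\n}\n");              // close the object
            super.close(context);            // closes the underlying stream
        }

        // writeChar/writeChars would emit UTF-16; JSON files should be UTF-8.
        private void writeUtf8(String s) throws IOException {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }

        // Minimal escaping of backslashes and double quotes in keys.
        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    }
}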
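The answer above covers the output side. The other half of the question is how an object becomes a key or value type, and that comes down to serialization. Hadoop's key and value classes implement org.apache.hadoop.io.Writable, whose write(DataOutput) and readFields(DataInput) methods turn the object into bytes and back. Keys additionally implement WritableComparable so they can be sorted. The InputFormat's RecordReader decides which objects are handed to map() in the first place; for example, TextInputFormat yields a LongWritable offset and a Text line.

The following is a minimal sketch of that contract using only the JDK. SimpleWritable is a hypothetical stand-in for the real Writable interface, declared so the code compiles without Hadoop jars, and PointWritable is a hypothetical value type:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Hypothetical stand-in with the same two methods as org.apache.hadoop.io.Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical custom value type: a 2-D point.
    static class PointWritable implements SimpleWritable {
        int x;
        int y;

        PointWritable() {}  // no-arg constructor: Hadoop creates instances by reflection
        PointWritable(int x, int y) { this.x = x; this.y = y; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);  // fields are written in a fixed order...
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();  // ...and read back in exactly the same order
            y = in.readInt();
        }
    }

    // Serialize, then deserialize into a fresh instance: roughly what happens
    // to a map output value on its way to the reducer.
    static PointWritable roundTrip(PointWritable p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        p.write(new DataOutputStream(bytes));
        PointWritable copy = new PointWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PointWritable copy = roundTrip(new PointWritable(3, -7));
        System.out.println(copy.x + "," + copy.y);  // prints 3,-7
    }
}
```

In a real job the class would implement org.apache.hadoop.io.Writable instead and be registered with, for example, job.setMapOutputValueClass(PointWritable.class). The framework typically reuses one instance and calls readFields on it repeatedly. For that reason readFields should overwrite every field rather than accumulate state.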

08-24 03:17