Problem Description
I'm trying to get readable output from an ArrayWritable in a simple MapReduce task. I found a few questions about a similar problem, but I couldn't solve it in my own code, so I'm looking forward to your help. Thanks :)!
Input: a text file containing some sentences.
Output should be:
<Word, <length, number of occurrences of that word in the text file>>
Example: Hello 5 2
The output that I get in my Job is:
hello WordLength_V01$IntArrayWritable@221cf05
test WordLength_V01$IntArrayWritable@799e525a
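(For context: the ClassName@hexcode form above is exactly what the default Object.toString() produces when a class never overrides it. A tiny self-contained illustration, with a made-up class name:)

public class ToStringDemo {
    public static void main(String[] args) {
        Object noOverride = new Object() {}; // anonymous subclass, no toString() override
        System.out.println(noOverride);      // prints something like ToStringDemo$1@1b6d3586
    }
}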
I think the problem is in the subclass of IntArrayWritable, but I haven't found the right fix. By the way, we are on Hadoop 2.5. I use the following code to get this result:
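For completeness, the snippets below assume the standard Hadoop 2.x imports:

import java.io.DataOutput;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;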
Main Method:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word length V1");
// Set Classes
job.setJarByClass(WordLength_V01.class);
job.setMapperClass(MyMapper.class);
// job.setCombinerClass(MyReducer.class);
job.setReducerClass(MyReducer.class);
// Set Output and Input Parameters
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntArrayWritable.class);
// Number of Reducers
job.setNumReduceTasks(1);
// Set FileDestination
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Mapper:
public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
// Initialize Variables
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
// Map Method
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
// Use Tokenizer
StringTokenizer itr = new StringTokenizer(value.toString());
// Select each word
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
// Output Pair
context.write(word, one);
}
}
}
Reducer:
public static class MyReducer extends Reducer<Text, IntWritable, Text, IntArrayWritable> {
// Initialize Variables
private IntWritable count = new IntWritable();
private IntWritable length = new IntWritable();
// Reduce Method
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
// Count Words
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
count.set(sum);
// Wordlength
length.set(key.getLength());
// Define Output
IntWritable[] temp = new IntWritable[2];
IntArrayWritable output = new IntArrayWritable(temp);
temp[0] = count;
temp[1] = length;
// Output
output.set(temp);
context.write(key, new IntArrayWritable(output.get()));
}
}
Subclass:
public static class IntArrayWritable extends ArrayWritable {
public IntArrayWritable(IntWritable[] intWritables) {
super(IntWritable.class);
}
@Override
public IntWritable[] get() {
return (IntWritable[]) super.get();
}
@Override
public void write(DataOutput arg0) throws IOException {
for(IntWritable data : get()){
data.write(arg0);
}
}
}
I used the following links to find a solution:
- Interface Writable (hadoop.apache.org)
- Class ArrayWritable (hadoop.apache.org)
- stackoverflow.com (1)
- stackoverflow.com (2)
I'm really thankful for any idea!
-------- Solution --------
New Subclass:
public static class IntArrayWritable extends ArrayWritable {
public IntArrayWritable(IntWritable[] values) {
super(IntWritable.class, values);
}
@Override
public IntWritable[] get() {
return (IntWritable[]) super.get();
}
@Override
public String toString() {
IntWritable[] values = get();
return values[0].toString() + ", " + values[1].toString();
}
}
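Compared with the original subclass, this fixes two things at once: the constructor now actually forwards the array via super(IntWritable.class, values) instead of silently dropping it, and toString() is overridden, which is the method TextOutputFormat calls to render each value. A quick standalone check, e.g. a temporary main placed next to the subclass (sample values are arbitrary):

public static void main(String[] args) {
    IntWritable[] vals = { new IntWritable(2), new IntWritable(5) };
    IntArrayWritable w = new IntArrayWritable(vals);
    System.out.println(w); // prints: 2, 5
}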
New Reduce Method:
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
// Count Words
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
count.set(sum);
// Wordlength
length.set(key.getLength());
// Define Output
IntWritable[] temp = new IntWritable[2];
temp[0] = count;
temp[1] = length;
context.write(key, new IntArrayWritable(temp));
}
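With these changes, a run over the sample input should produce lines like "hello 2, 5": the count first and the length second, because that is the order the reducer stores them in temp. Swap temp[0] and temp[1] if you want the "Hello 5 2" ordering from the problem statement.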
Solution: Everything looks fine. You just need to write one more method, printStrings(), in your subclass that returns a single string instead of an array. ArrayWritable's built-in helper toStrings() returns an array of strings, and the toString() inherited from Object prints only the class name and hash code; that is why your output shows an address instead of the values.
public String printStrings() {
    // Build one space-separated string from the stored values.
    String strings = "";
    String[] values = toStrings(); // ArrayWritable helper: one String per element
    for (int i = 0; i < values.length; i++) {
        strings = strings + " " + values[i];
    }
    return strings;
}
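One caveat: TextOutputFormat renders values by calling toString(), so adding printStrings() by itself will not change the job output. You would either return its result from an overridden toString(), as the first solution does, or call it explicitly in the reducer, for example context.write(key, new Text(output.printStrings())) with the output value class changed to Text.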