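I am learning MapReduce and wrote a program that computes the total rental duration for members and for non-members. I have set every job configuration option I believe is required, but when I run the job with the hadoop command it throws a "wrong value class" error. I searched Stack Overflow for solutions but could not debug the problem. The mapper's output types and the reducer's input types match.
Can someone help me?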
package com.onboarding.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class BixiMontrealAnalysis {

    // Emits (is_member, duration_sec) for every CSV line.
    public static class BixiMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        public void map(LongWritable offset, Text line, Context context) throws IOException, InterruptedException {
            String csvAttributes[] = line.toString().split(",");
            int isMember = 0;
            int duration = 0;
            try {
                duration = Integer.parseInt(csvAttributes[4]);
                isMember = Integer.parseInt(csvAttributes[5]);
            } catch (Exception e) {
                // Header or malformed row
                System.out.println("Will Emit 0,0");
            }
            context.write(new IntWritable(isMember), new IntWritable(duration));
        }
    }

    // Sums the durations per is_member key and writes the total as a LongWritable.
    public static class BixiReducer extends Reducer<IntWritable, IntWritable, IntWritable, LongWritable> {
        public void reduce(IntWritable isMember, Iterable<IntWritable> combinedDurationByIsMember, Context context) throws IOException, InterruptedException {
            long sum = 0L;
            for (IntWritable duration : combinedDurationByIsMember) {
                sum = sum + (long) duration.get();
            }
            context.write(isMember, new LongWritable(sum));
        }
    }

    public static void main(String args[]) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "bix-montreal-job");
        job.setJarByClass(BixiMontrealAnalysis.class);
        job.setMapperClass(BixiMapper.class);
        job.setCombinerClass(BixiReducer.class);
        job.setReducerClass(BixiReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I expect the output to be key/value pairs of the form
0, sum of duration
and
1, sum of duration
CSV content:
start_date,start_station_code,end_date,end_station_code,duration_sec,is_member
2019-07-01 00:00:03,6014,2019-07-01 00:04:26,6023,262,1
2019-07-01 00:00:07,6036,2019-07-01 00:34:54,6052,2087,0
2019-07-01 00:00:11,6018,2019-07-01 00:06:48,6148,396,1
2019-07-01 00:00:12,6202,2019-07-01 00:17:25,6280,1032,1
2019-07-01 00:00:15,6018,2019-07-01 00:06:57,6148,401,0
2019-07-01 00:00:20,6248,2019-07-01 00:15:40,6113,920,1
2019-07-01 00:00:37,6268,2019-07-01 00:15:00,6195,862,0
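As a sanity check on the expected result, the totals for the sample rows above can be worked out without Hadoop. The following is only a sketch in plain Java: the class name `ExpectedTotals` and its `totals` method are hypothetical and not part of the job. It parses each line the same way `BixiMapper` does, including the fallback to (0, 0) for the header row.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java check of the expected totals; no Hadoop classes are involved.
// ExpectedTotals is a hypothetical helper, not part of the original job.
class ExpectedTotals {
    // Header plus the seven sample rows from the question.
    static final List<String> SAMPLE = List.of(
        "start_date,start_station_code,end_date,end_station_code,duration_sec,is_member",
        "2019-07-01 00:00:03,6014,2019-07-01 00:04:26,6023,262,1",
        "2019-07-01 00:00:07,6036,2019-07-01 00:34:54,6052,2087,0",
        "2019-07-01 00:00:11,6018,2019-07-01 00:06:48,6148,396,1",
        "2019-07-01 00:00:12,6202,2019-07-01 00:17:25,6280,1032,1",
        "2019-07-01 00:00:15,6018,2019-07-01 00:06:57,6148,401,0",
        "2019-07-01 00:00:20,6248,2019-07-01 00:15:40,6113,920,1",
        "2019-07-01 00:00:37,6268,2019-07-01 00:15:00,6195,862,0"
    );

    // Returns is_member -> summed duration_sec, parsing columns like BixiMapper.
    static Map<Integer, Long> totals(List<String> lines) {
        Map<Integer, Long> sums = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            int isMember = 0;
            int duration = 0;
            try {
                duration = Integer.parseInt(cols[4]);
                isMember = Integer.parseInt(cols[5]);
            } catch (RuntimeException e) {
                // Header or malformed row: keep the defaults, as the mapper does.
            }
            sums.merge(isMember, (long) duration, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Print one key/value pair per line, separated by a tab.
        totals(SAMPLE).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

For these seven rows the non-member total (key 0) is 2087 + 401 + 862 = 3350 and the member total (key 1) is 262 + 396 + 1032 + 920 = 2610. The header row contributes (0, 0), which does not change either sum.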
Below is the stack trace:
Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:194)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1374)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1691)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at com.onboarding.hadoop.BixiMontrealAnalysis$BixiReducer.reduce(BixiMontrealAnalysis.java:43)
    at com.onboarding.hadoop.BixiMontrealAnalysis$BixiReducer.reduce(BixiMontrealAnalysis.java:37)
Best answer
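The problem is this line:
job.setCombinerClass(BixiReducer.class);
I had set the Combiner class to the same class as the Reducer. That works in the standard WordCount example, but it should not be done here. I read up on the Combiner and found that it runs on the map side to pre-aggregate intermediate records so that the load on the Reducer is smaller. Because its output goes back into the shuffle, a Combiner must emit the same key and value types as the map output, here (IntWritable, IntWritable). BixiReducer writes LongWritable values, so when it runs as the combiner the write fails with "wrong value class"; the Task$CombineOutputCollector frame in the stack trace shows that the failure happens in the combine step. In WordCount the reducer's input and output types are identical, which is why reusing it as the combiner is safe there. Removing the setCombinerClass line, or registering a separate combiner whose output types match the map output, resolves the error.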
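To illustrate the type constraint, here is a minimal plain-Java model of the map, combine, and reduce flow. It is a sketch only and does not use Hadoop classes; the class `CombinerSketch`, its methods, and the split of the sample rows into two input splits are all assumptions made for illustration. The combine step takes int values and returns int values, which corresponds to a combiner declared as `Reducer<IntWritable, IntWritable, IntWritable, IntWritable>`, while only the final reduce step widens to long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of map -> combine -> reduce; it does not use Hadoop classes.
// CombinerSketch and its method names are hypothetical.
class CombinerSketch {
    // Combine step: input and output values are both int, mirroring a combiner
    // whose output types equal the map output types (IntWritable, IntWritable).
    static Map<Integer, Integer> combine(List<int[]> mapOutput) {
        Map<Integer, Integer> partial = new TreeMap<>();
        for (int[] kv : mapOutput) {
            partial.merge(kv[0], kv[1], Integer::sum);
        }
        return partial;
    }

    // Reduce step: consumes int values and widens to long only in the final
    // output, mirroring Reducer<IntWritable, IntWritable, IntWritable, LongWritable>.
    static Map<Integer, Long> reduce(List<Map<Integer, Integer>> partials) {
        Map<Integer, Long> totals = new TreeMap<>();
        for (Map<Integer, Integer> partial : partials) {
            partial.forEach((k, v) -> totals.merge(k, v.longValue(), Long::sum));
        }
        return totals;
    }

    // Runs the flow over two assumed input splits of (is_member, duration) pairs
    // taken from the sample CSV rows in the question.
    static Map<Integer, Long> run() {
        List<int[]> split1 = List.of(
            new int[] {1, 262}, new int[] {0, 2087},
            new int[] {1, 396}, new int[] {1, 1032});
        List<int[]> split2 = List.of(
            new int[] {0, 401}, new int[] {1, 920}, new int[] {0, 862});

        List<Map<Integer, Integer>> partials = new ArrayList<>();
        partials.add(combine(split1)); // one combiner pass per map task
        partials.add(combine(split2));
        return reduce(partials);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With this split, the combine step yields the partial sums {0=2087, 1=1690} and {0=1263, 1=920}, and the reduce step merges them into {0=3350, 1=2610}, the same totals a run without a combiner would give. One caveat of keeping int in the combine step is that partial sums could overflow on large inputs. An alternative worth considering is to have the mapper emit LongWritable values, so that the reducer's input and output value types coincide and it can be reused as the combiner, as in WordCount.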