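Edit: Problem solved. I made a really stupid mistake.
I have a MapReduce pipeline consisting of a map, a reduce, a second map, and a second reduce. I use SequenceFileOutputFormat for the first reduce and SequenceFileInputFormat for the second map. I have looked at how they are used, and I appear to be using them correctly. The types being passed are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that holds IntPairWritable from Mahout). The problem is that when the second map reads an IntPairArrayWritable and I try to pull out the individual IntPairWritables, I get a ClassCastException. I am not sure whether the cause is the way I use the ArrayWritable class or the way I use SequenceFile{Input,Output}Format. I have looked at many examples here and elsewhere, and as far as I can tell I am doing both correctly, yet I still get the error. Any help?
Specifics:
Here is my first reducer class: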
public static class WalkIdReducer extends MapReduceBase implements
        Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {
    @Override
    public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
            OutputCollector<IntWritable, IntPairArrayWritable> output,
            Reporter reporter) throws IOException {
        ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
        while (values.hasNext()) {
            value_array.add(values.next());
        }
        output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
    }
}
The second mapper class:
public static class NodePairMapper extends MapReduceBase implements
        Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {
    @Override
    public void map(IntWritable key, IntPairArrayWritable value,
            OutputCollector<IntPairWritable, Text> output,
            Reporter reporter) throws IOException {
        // The following line gives a ClassCastException;
        // See IntPairArrayWritable.toArrayList(), below
        ArrayList<IntPairWritable> values = value.toArrayList();
        // other unimportant stuff
    }
}
The relevant part of the job configuration for the first MapReduce:
conf.setReducerClass(WalkIdReducer.class);
conf.setOutputKeyClass(IntWritable.class);
conf.setOutputValueClass(IntPairArrayWritable.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
And for the second MapReduce:
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setMapperClass(NodePairMapper.class);
Finally, my ArrayWritable subclass:
public static class IntPairArrayWritable extends ArrayWritable
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairArrayWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairArrayWritable.class, values);
    }
    // Some convenience methods, so I can use ArrayLists in
    // other parts of the code
    public static IntPairArrayWritable fromArrayList(
            ArrayList<IntPairWritable> array) {
        IntPairArrayWritable writable = new IntPairArrayWritable();
        IntPairWritable[] values = new IntPairWritable[array.size()];
        for (int i=0; i<array.size(); i++) {
            values[i] = array.get(i);
        }
        writable.set(values);
        return writable;
    }
    public ArrayList<IntPairWritable> toArrayList() {
        ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
        for (Writable pair : this.get()) {
            // This line is what kills it. I get a ClassCastException here.
            IntPairWritable int_pair = (IntPairWritable) pair;
            array.add(int_pair);
        }
        return array;
    }
}
The specific error I get is:
java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable
    at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231)
    at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84)
    at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
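I am confused about why ArrayWritable's get() method returns instances of WalkAnalyzer$IntPairArrayWritable. I expected get() to return the array of elements that the IntPairArrayWritable contains, as the API describes.
Edit: I found the problem. It was the way I wrote the constructors for IntPairArrayWritable. I was calling super(IntPairArrayWritable.class); when I should have been calling super(IntPairWritable.class);. The code should actually look like this:
public static class IntPairArrayWritable extends ArrayWritable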
{
    // These two methods are what people say is all you need for
    // creating an ArrayWritable subclass
    public IntPairArrayWritable() {
        super(IntPairWritable.class);
    }
    public IntPairArrayWritable(IntPairWritable[] values) {
        super(IntPairWritable.class, values);
    }
}
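I guess giving the ArrayWritable subclass a name that is harder to confuse with the element class would have been a good idea, since the error would then have been easier to spot.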
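For anyone wondering why the class passed to super(...) matters so much: ArrayWritable keeps that class around and, when it reads the array back, creates each element as a new instance of it. The sketch below imitates that behavior in plain Java with no Hadoop dependency. All the names in it (Element, Pair, ArrayHolder, BuggyPairArray, FixedPairArray) are made up for illustration, and ArrayHolder is only a simplified stand-in, not Hadoop's actual code.

```java
public class ElementClassSketch {
    /** Hypothetical stand-in for the Writable interface. */
    interface Element {}

    /** Hypothetical stand-in for the element type (IntPairWritable in the question). */
    static class Pair implements Element {
        public Pair() {}
    }

    /**
     * Simplified stand-in for ArrayWritable: it remembers the element class
     * and instantiates it reflectively when "reading" the array back.
     */
    static class ArrayHolder implements Element {
        private final Class<? extends Element> valueClass;
        private Element[] values = new Element[0];

        ArrayHolder(Class<? extends Element> valueClass) {
            this.valueClass = valueClass;
        }

        /** Imitates deserialization: creates count fresh instances of valueClass. */
        void read(int count) {
            values = new Element[count];
            try {
                for (int i = 0; i < count; i++) {
                    values[i] = valueClass.getDeclaredConstructor().newInstance();
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + valueClass, e);
            }
        }

        Element[] get() {
            return values;
        }
    }

    /** Same mistake as in the question: passes the array class itself. */
    static class BuggyPairArray extends ArrayHolder {
        public BuggyPairArray() {
            super(BuggyPairArray.class);
        }
    }

    /** Corrected version: passes the element class. */
    static class FixedPairArray extends ArrayHolder {
        public FixedPairArray() {
            super(Pair.class);
        }
    }

    /** Reads one element and reports the simple name of its runtime class. */
    static String firstElementType(ArrayHolder holder) {
        holder.read(1);
        return holder.get()[0].getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(firstElementType(new BuggyPairArray())); // prints "BuggyPairArray"
        System.out.println(firstElementType(new FixedPairArray())); // prints "Pair"
    }
}
```

With the buggy constructor the elements come back as instances of the array class itself, so casting one of them to the element type fails, which matches the ClassCastException in the stack trace above.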
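Best answer
Check your import statements for IntPairWritable. It looks like you picked up the wrong package name in the Mapper, so you are casting to a different class, even though that class is also named IntPairWritable.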
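One quick way to test a theory like this is to print the fully qualified runtime class of the object before casting it, and compare it with the class you are casting to. The following is a minimal sketch in plain Java. The nested classes PackageA.IntPair and PackageB.IntPair are hypothetical and only simulate two unrelated classes that share a simple name, and describe is a helper invented for this example.

```java
public class RuntimeClassCheck {
    // Two unrelated classes with the same simple name, simulating
    // same-named classes that live in different packages.
    static class PackageA { static class IntPair {} }
    static class PackageB { static class IntPair {} }

    /** Reports whether value can be cast to expected, naming both classes in full. */
    static String describe(Object value, Class<?> expected) {
        if (expected.isInstance(value)) {
            return "ok: " + value.getClass().getName();
        }
        return "expected " + expected.getName()
                + " but got " + value.getClass().getName();
    }

    public static void main(String[] args) {
        Object value = new PackageB.IntPair();
        // The simple names match, but the full names show the mismatch.
        System.out.println(describe(value, PackageA.IntPair.class));
    }
}
```

The message of a ClassCastException already carries the same information: in the trace from the question, the object's runtime class is reported as WalkAnalyzer$IntPairArrayWritable and the cast target as org.apache.mahout.common.IntPairWritable.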