I am trying to use MultipleInputs from Hadoop. All of my mappers read their input through FixedLengthInputFormat.
MultipleInputs.addInputPath(job,
        new Path(rootDir),
        FixedLengthInputFormat.class,
        OneToManyMapper.class);
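The problem is that each mapper's input has a fixed record width, but the width differs from one input to another.
Is there any way to pass a different FIXED_RECORD_LENGTH to each mapper when using MultipleInputs?
Thanks!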
Best answer
The workaround is as follows:
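import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthRecordReader;

public class CustomFixedLengthInputFormat extends FixedLengthInputFormat {
    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException,
            InterruptedException {
        // Here I can control the record length per input.
        // The ?? is a placeholder: replace it with the record length for this
        // input instead of the job-wide value read by getRecordLength(...).
        int recordLength = ??; // getRecordLength(context.getConfiguration());
        if (recordLength <= 0) {
            throw new IOException(
                    "Fixed record length "
                    + recordLength
                    + " is invalid. It should be set to a value greater than zero");
        }
        System.out.println("Record Length: " + recordLength);
        return new FixedLengthRecordReader(recordLength);
    }
}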
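The answer leaves open how to choose the value behind the `??` placeholder. One possible approach, sketched below, is to keep a table of input directory to record length and look up the file behind the current split. `RecordLengthResolver` and the `dir=length,dir=length` spec format are hypothetical names invented for this illustration, not part of Hadoop; the class uses only the Java standard library so it can be tried in isolation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): maps an input file path to the
// fixed record length configured for the directory that contains it.
final class RecordLengthResolver {
    private final Map<String, Integer> lengthsByDir = new LinkedHashMap<>();

    // Parses a spec such as "/data/a=80,/data/b=120".
    // The spec format is an assumption made for this sketch.
    static RecordLengthResolver parse(String spec) {
        RecordLengthResolver resolver = new RecordLengthResolver();
        for (String entry : spec.split(",")) {
            int eq = entry.lastIndexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Bad entry: " + entry);
            }
            String dir = entry.substring(0, eq).trim();
            int length = Integer.parseInt(entry.substring(eq + 1).trim());
            if (length <= 0) {
                throw new IllegalArgumentException("Bad length in: " + entry);
            }
            resolver.lengthsByDir.put(dir, length);
        }
        return resolver;
    }

    // Returns the length for the longest matching directory, or -1 if no
    // configured directory contains the given file path.
    int resolve(String filePath) {
        int bestLength = -1;
        int bestDirSize = -1;
        for (Map.Entry<String, Integer> e : lengthsByDir.entrySet()) {
            String dir = e.getKey();
            String dirWithSlash = dir.endsWith("/") ? dir : dir + "/";
            boolean matches = filePath.equals(dir) || filePath.startsWith(dirWithSlash);
            if (matches && dir.length() > bestDirSize) {
                bestDirSize = dir.length();
                bestLength = e.getValue();
            }
        }
        return bestLength;
    }
}
```

Inside `createRecordReader`, the file path could presumably be obtained with `((FileSplit) split).getPath().toUri().getPath()`, and the spec string could be read from the job configuration under a key of your choosing. Because `resolve` returns -1 for an unknown path, the existing `recordLength <= 0` check in the answer would reject inputs that have no configured length.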
Regarding hadoop - Hadoop MultipleInputs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26341913/