I am trying to use Hadoop's MultipleInputs. All of my mappers use FixedLengthInputFormat.

MultipleInputs.addInputPath(job,
                    new Path(rootDir),
                    FixedLengthInputFormat.class,
                    OneToManyMapper.class);

The problem is that each mapper reads records of a fixed width, but the widths differ between the inputs.
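For context, FixedLengthInputFormat normally takes a single, job-wide record length via its FIXED_RECORD_LENGTH property, which is why it cannot vary per input path out of the box. A minimal sketch of the standard setup (the 128 is only an example value):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;

// The record length is stored once in the job configuration,
// so every input path would end up with the same width.
Job job = Job.getInstance(new Configuration());
FixedLengthInputFormat.setRecordLength(job.getConfiguration(), 128); // example value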



Is there any way to pass a different FIXED_RECORD_LENGTH to each mapper when using MultipleInputs?

Thanks!

Best Answer

The workaround is as follows:

import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthRecordReader;

public class CustomFixedLengthInputFormat extends FixedLengthInputFormat {

    // Record length for this input format. Hard-code it here (the 100 is
    // only an example value) or read it from your own configuration key,
    // instead of relying on the single job-wide FIXED_RECORD_LENGTH property.
    private static final int RECORD_LENGTH = 100;

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException,
            InterruptedException {
        // Here the record length can be controlled per input format,
        // instead of coming from getRecordLength(context.getConfiguration()).
        int recordLength = RECORD_LENGTH;
        if (recordLength <= 0) {
            throw new IOException(
                    "Fixed record length "
                            + recordLength
                            + " is invalid. It should be set to a value greater than zero");
        }

        System.out.println("Record Length: " + recordLength);

        return new FixedLengthRecordReader(recordLength);
    }
}
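A minimal usage sketch of the idea: create one subclass per record width and register each on its own input path with MultipleInputs. The paths and the second subclass name CustomFixedLengthInputFormatB are hypothetical placeholders:

// Each subclass carries its own record length, so the two paths
// can use different fixed widths within the same job.
MultipleInputs.addInputPath(job,
                new Path("/data/typeA"),
                CustomFixedLengthInputFormat.class,
                OneToManyMapper.class);

MultipleInputs.addInputPath(job,
                new Path("/data/typeB"),
                CustomFixedLengthInputFormatB.class,
                OneToManyMapper.class);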

Regarding hadoop - Hadoop-MultipleInputs, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26341913/
