Problem description
Is it possible to create a valid *mapreduce*.TaskAttemptID from a *mapred*.JobConf?
Background
I need to write a FileInputFormatAdapter for an ExistingFileInputFormat. The problem is that the adapter must be an old-API mapred.InputFormat, while the existing format extends the new-API mapreduce.InputFormat.
I need to build a mapreduce.TaskAttemptContextImpl so that I can instantiate the ExistingRecordReader. However, I can't create a valid TaskId; the taskId comes out as null.
So how can I get the taskId, jobId, etc. from a mapred.JobConf?
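A task attempt ID string already encodes the job and task IDs. The plain-Java sketch below (AttemptIdSketch is a hypothetical name, not a Hadoop class) splits an ID of the form attempt_<jtIdentifier>_<job>_<type>_<task>_<attempt> and rebuilds the job and task ID strings from it. In Hadoop itself, TaskAttemptID.forName(...) followed by getTaskID() and getJobID() does this work; like forName, the sketch returns null for a null input, which is the situation in the question.

```java
import java.util.Locale;

/**
 * Plain-Java illustration of the task attempt ID format. Hadoop's real
 * TaskAttemptID.forName(...), getTaskID() and getJobID() do the actual work.
 */
final class AttemptIdSketch {
    final String jtIdentifier; // cluster/job tracker identifier, e.g. "200707121733"
    final int jobNumber;
    final char taskType;       // 'm' for map, 'r' for reduce
    final int taskNumber;
    final int attemptNumber;

    private AttemptIdSketch(String jtIdentifier, int jobNumber, char taskType,
                            int taskNumber, int attemptNumber) {
        this.jtIdentifier = jtIdentifier;
        this.jobNumber = jobNumber;
        this.taskType = taskType;
        this.taskNumber = taskNumber;
        this.attemptNumber = attemptNumber;
    }

    /**
     * Parses "attempt_<jt>_<job>_<type>_<task>_<attempt>".
     * Returns null for null input, mirroring TaskAttemptID.forName(null).
     */
    static AttemptIdSketch parse(String s) {
        if (s == null) {
            return null;
        }
        String[] parts = s.split("_");
        if (parts.length != 6 || !"attempt".equals(parts[0]) || parts[3].length() != 1) {
            throw new IllegalArgumentException("Not a task attempt ID: " + s);
        }
        return new AttemptIdSketch(parts[1], Integer.parseInt(parts[2]), parts[3].charAt(0),
                Integer.parseInt(parts[4]), Integer.parseInt(parts[5]));
    }

    /** Job ID string, e.g. "job_200707121733_0003". */
    String jobId() {
        return String.format(Locale.ROOT, "job_%s_%04d", jtIdentifier, jobNumber);
    }

    /** Task ID string, e.g. "task_200707121733_0003_m_000005". */
    String taskId() {
        return String.format(Locale.ROOT, "task_%s_%04d_%c_%06d",
                jtIdentifier, jobNumber, taskType, taskNumber);
    }
}
```

For example, parsing "attempt_200707121733_0003_m_000005_0" yields the job ID "job_200707121733_0003" and the task ID "task_200707121733_0003_m_000005".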
In particular, in the adapter's getRecordReader I need to do something like:
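import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// SplitAdapter, RecordReaderAdapter, ExistingRecordReader and MyWritable are the asker's own classes.
public org.apache.hadoop.mapred.RecordReader<NullWritable, MyWritable> getRecordReader(
        org.apache.hadoop.mapred.InputSplit split, JobConf job, Reporter reporter) throws IOException {
    SplitAdapter splitAdapter = (SplitAdapter) split;
    final Configuration conf = job; // JobConf is a subclass of Configuration
    /*************************************************/
    // The problem is here: "mapred.task.id" is not in the conf,
    // so TaskAttemptID.forName(...) returns null
    /*************************************************/
    final TaskAttemptID taskId = TaskAttemptID.forName(conf.get("mapred.task.id"));
    final TaskAttemptContext context = new TaskAttemptContextImpl(conf, taskId);
    try {
        // Wrap the new-API reader so it can be returned through the old API
        return new RecordReaderAdapter(new ExistingRecordReader(
                splitAdapter.getMapRedeuceSplit(),
                context));
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        throw new RuntimeException("Failed to create record-reader.", e);
    }
}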
This code throws an exception:
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.<init>(TaskAttemptContextImpl.java:44)
at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.<init>(TaskAttemptContextImpl.java:39)
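The exception comes from 'super(conf, taskId.getJobID());' in the constructor, most likely because taskId is null: TaskAttemptID.forName(null) returns null, so calling getJobID() on it throws the NullPointerException.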
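One defensive option is to look the ID up under both its old name (mapred.task.id) and its Hadoop 2 name (mapreduce.task.attempt.id), and fall back to a placeholder instead of passing null into TaskAttemptContextImpl. The sketch below uses a Map in place of Configuration so it runs without Hadoop; a real Configuration already maps the deprecated key to the new one. The placeholder ID is a hypothetical value, and whether a made-up ID is acceptable depends on what ExistingRecordReader does with it.

```java
import java.util.Map;

/**
 * Sketch of a null-safe task attempt ID lookup. A Map stands in for Hadoop's
 * Configuration so the example runs without Hadoop on the classpath.
 */
final class TaskIdLookupSketch {
    /** Old (mapred) key used in the question. */
    static final String OLD_KEY = "mapred.task.id";
    /** Hadoop 2+ name for the same setting. */
    static final String NEW_KEY = "mapreduce.task.attempt.id";
    /** Hypothetical placeholder for when no real ID exists (e.g. client-side code). */
    static final String PLACEHOLDER = "attempt_local_0001_m_000000_0";

    private TaskIdLookupSketch() {
    }

    /** Returns the attempt ID stored under either key, or PLACEHOLDER; never null. */
    static String resolveAttemptId(Map<String, String> conf) {
        String id = conf.get(NEW_KEY);
        if (id == null) {
            id = conf.get(OLD_KEY);
        }
        return id != null ? id : PLACEHOLDER;
    }
}
```

In real code the resolved string would then be passed to TaskAttemptID.forName(...), which can no longer return null.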
Recommended answer
I found the answer by looking through HiveHBaseTableInputFormat. Since my solution targets Hive, this works perfectly:
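import org.apache.hadoop.hive.shims.ShimLoader;
// `job` must expose getConfiguration() (e.g. a mapreduce.Job built from the JobConf); JobConf itself does not.
TaskAttemptContext tac = ShimLoader.getHadoopShims().newTaskAttemptContext(
        job.getConfiguration(), reporter);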