I have a Hadoop program that runs successfully, and I need to extract the job ID from it. I am using the following code to do that:
Configuration conf = new Configuration();
conf.addResource(new Path("../conf/core-site.xml"));
conf.addResource(new Path("../conf/mapred-site.xml"));
conf.addResource(new Path("../conf/hadoop/hdfs-site.xml"));
Job job = new Job(conf,"CloudViTra2.0_Transcoder - Job1");
job.setJarByClass(VideoTranscoder.class);
job.setMapperClass(First_Mapper.class);
job.setReducerClass(First_Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path("../thesis_uploads/input/"+getFileName[0]+".txt"));
Path output = new Path("../thesis_uploads/output_"+fileName+"/");
FileOutputFormat.setOutputPath(job, output);
job.waitForCompletion(true);
currentJob = job.getJobID().toString();
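The problem is that waitForCompletion() blocks until the job has finished, so I only get the job ID afterwards. I need the job ID while the job is still executing. How can I do that?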
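Best answer
You may want to look at the JobClient API for this.
JobClient's getAllJobs() and jobsToComplete() methods each return a JobStatus[] array, from which you can read the job IDs of the jobs that are currently running; jobsToComplete() covers only jobs that have not yet completed or failed.
Here is some pseudocode: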
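import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;

// hadoopConfPath, jobTrackerHost and jobTrackerPort are placeholders you must supply.
Configuration conf = new Configuration();
conf.addResource(new Path(hadoopConfPath + "core-site.xml"));
conf.addResource(new Path(hadoopConfPath + "hdfs-site.xml"));
conf.addResource(new Path(hadoopConfPath + "mapred-site.xml"));
InetSocketAddress jobtracker = new InetSocketAddress(jobTrackerHost, jobTrackerPort);
JobClient jobClient = new JobClient(jobtracker, conf);
jobClient.setConf(conf);
// Ask the JobTracker for every job it knows about and read each job's ID.
JobStatus[] jobs = jobClient.getAllJobs();
for (JobStatus js : jobs) {
    JobID job1 = js.getJobID();
}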
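A different way to get the ID of your own job while it runs is to submit it without blocking, read the ID straight away, and then poll for completion yourself. The sketch below only illustrates that control flow so it can run without a cluster: SubmittedJob and FakeJob are hypothetical stand-ins invented for this example, not Hadoop classes, and the job ID string is made up.

```java
class SubmitThenPoll {

    /** Hypothetical stand-in that mirrors three method names of Hadoop's Job. */
    public interface SubmittedJob {
        void submit() throws Exception;        // starts the job and returns immediately
        String getJobID();                     // ID assigned at submission time
        boolean isComplete() throws Exception; // non-blocking status check
    }

    /** Fake job for demonstration: reports completion after a fixed number of checks. */
    public static final class FakeJob implements SubmittedJob {
        private final int checksUntilDone;
        private int checks = 0;
        private String id = null; // in this fake, null until submit() is called

        public FakeJob(int checksUntilDone) { this.checksUntilDone = checksUntilDone; }

        @Override public void submit() { id = "job_000000000000_0001"; } // made-up ID
        @Override public String getJobID() { return id; }
        @Override public boolean isComplete() { return ++checks >= checksUntilDone; }
    }

    /** Submits without waiting and returns the ID while the job is still running. */
    public static String submitAndGetId(SubmittedJob job) throws Exception {
        job.submit();
        return job.getJobID();
    }

    /** Polls until the job reports completion; returns how many checks were made. */
    public static int waitUntilComplete(SubmittedJob job, long pollMillis) throws Exception {
        int checks = 1;
        while (!job.isComplete()) {
            Thread.sleep(pollMillis);
            checks++;
        }
        return checks;
    }

    public static void main(String[] args) throws Exception {
        SubmittedJob job = new FakeJob(3);
        String currentJob = submitAndGetId(job);
        System.out.println("Job ID available while running: " + currentJob);
        int checks = waitUntilComplete(job, 10L);
        System.out.println("Finished after " + checks + " status checks");
    }
}
```

With a real org.apache.hadoop.mapreduce.Job, the same sequence would be job.submit(), then job.getJobID().toString(), then a loop on job.isComplete() in place of the single job.waitForCompletion(true) call used in the question.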
Hope this helps.
Regarding "java - When to call job.getJobID()?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26720603/