I have a Hadoop program that runs successfully. I need to extract the job ID from it. I am using the following code to do this:

        Configuration conf = new Configuration();

        conf.addResource(new Path("../conf/core-site.xml"));
        conf.addResource(new Path("../conf/mapred-site.xml"));

        conf.addResource(new Path("../conf/hadoop/hdfs-site.xml"));


        Job job = new Job(conf,"CloudViTra2.0_Transcoder - Job1");



        job.setJarByClass(VideoTranscoder.class);
        job.setMapperClass(First_Mapper.class);
        job.setReducerClass(First_Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);


        FileInputFormat.addInputPath(job, new Path("../thesis_uploads/input/"+getFileName[0]+".txt"));


        Path output = new Path("../thesis_uploads/output_"+fileName+"/");


        FileOutputFormat.setOutputPath(job, output);

        job.waitForCompletion(true);
        currentJob = job.getJobID().toString();

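The problem is that job.waitForCompletion(true) blocks until the job has finished, so the job ID is only read afterwards. I need the job ID while the job is still executing. How can I do this?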

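Best answer

You may need to use the JobClient API for this.

JobClient.getAllJobs() and JobClient.jobsToComplete() each return a JobStatus[] array, from which you can get the job ID of the currently running job.

Find a sketch of the code below: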

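    // JobClient, JobStatus and JobID come from org.apache.hadoop.mapred (old API).
    // hadoopConfPath, jobTrackerHost and jobTrackerPort are placeholders for your cluster.
    Configuration conf = new Configuration();
    conf.addResource(new Path(hadoopConfPath + "core-site.xml"));
    conf.addResource(new Path(hadoopConfPath + "hdfs-site.xml"));
    conf.addResource(new Path(hadoopConfPath + "mapred-site.xml"));

    InetSocketAddress jobtracker = new InetSocketAddress(jobTrackerHost, jobTrackerPort);
    JobClient jobClient = new JobClient(jobtracker, conf);
    jobClient.setConf(conf);

    // getAllJobs() lists every job the JobTracker knows about;
    // jobsToComplete() could be used instead to list only unfinished jobs.
    JobStatus[] jobs = jobClient.getAllJobs();

    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        // Keep only jobs that are running right now.
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID job1 = js.getJobID();
            System.out.println(job1);
        }
    }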
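
If the code that needs the ID is the same program that launches the job, a simpler route may be to submit the job without blocking and read the ID straight from the Job object. The following is a minimal sketch, not the answerer's method: it assumes the new-API org.apache.hadoop.mapreduce.Job used in the question, and the class name SubmitThenReadJobId, the helper submitAndGetId and the use of args[0]/args[1] as input and output paths are made up for illustration. The ID is assigned when the job is submitted, so it should be available as soon as submit() returns.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitThenReadJobId {

    // Hypothetical helper: submits the job without waiting for it to finish
    // and returns the ID that was assigned at submission time.
    static String submitAndGetId(Job job) throws Exception {
        job.submit();                      // returns once the job is submitted
        return job.getJobID().toString();  // ID is set by the submission
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Same constructor as in the question; newer Hadoop versions
        // prefer Job.getInstance(conf, name).
        Job job = new Job(conf, "example-job");
        job.setJarByClass(SubmitThenReadJobId.class);

        // No mapper/reducer set, so the identity defaults are used; with the
        // default text input format they pass through (LongWritable, Text).
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Assumption for this sketch: paths are passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        String currentJob = submitAndGetId(job);
        System.out.println("Submitted job " + currentJob);

        // The job is already submitted, so this only monitors it until it ends.
        boolean ok = job.waitForCompletion(true);
        System.exit(ok ? 0 : 1);
    }
}
```

In the question's code this would amount to calling job.submit() and reading job.getJobID() before job.waitForCompletion(true), instead of after it.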

Hope this helps.

Regarding "java - When to call job.getJobID()?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26720603/
