Is there a way to access the number of successful map tasks from a reduce task in an MR job?

Problem description

In my Hadoop reducers, I need to know how many successful map tasks were executed in the current job. I've come up with the following, which as far as I can tell does NOT work:
```java
Counter totalMapsCounter = context.getCounter(JobInProgress.Counter.TOTAL_LAUNCHED_MAPS);
Counter failedMapsCounter = context.getCounter(JobInProgress.Counter.NUM_FAILED_MAPS);
long nSuccessfulMaps = totalMapsCounter.getValue() - failedMapsCounter.getValue();
```

Alternatively, if there's a good way to retrieve (again, from within my reducers) the total number of input splits (not the number of files, and not the splits for one file, but the total splits for the job), that would probably also work. (Assuming my job completes normally, that should be the same number, right?)

Solution

Edit: It looks like it is not good practice to retrieve the counters in the map and reduce tasks using Job or JobConf. Here is an alternate approach: pass the summary details from the mapper to the reducer. This takes some effort to code, but is doable. It would have been nice if the feature were part of Hadoop instead of having to be hand-coded; I have requested that this feature be put into Hadoop and am waiting for a response.

JobCounter.TOTAL_LAUNCHED_MAPS can be retrieved in the Reducer class with the old MR API using the code below:

```java
private String jobID;
private long launchedMaps;

public void configure(JobConf jobConf) {
    try {
        jobID = jobConf.get("mapred.job.id");
        JobClient jobClient = new JobClient(jobConf);
        RunningJob job = jobClient.getJob(JobID.forName(jobID));
        if (job == null) {
            System.out.println("No job found with ID " + jobID);
        } else {
            Counters counters = job.getCounters();
            launchedMaps = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```

With the new API, Reducer implementations can access the job's Configuration via JobContext#getConfiguration().
The above code can be implemented in Reducer#setup(). Reducer#configure() in the old MR API and Reducer#setup() in the new MR API are each invoked once per reduce task, before Reducer.reduce() is called. By the way, the counters can also be fetched from a JVM other than the one that kicked off the job.

JobInProgress is annotated as below, so it should not be used; this API is private to a limited set of projects and the interface may change:

```java
@InterfaceAudience.LimitedPrivate({"MapReduce"})
@InterfaceStability.Unstable
```

Note that JobCounter.TOTAL_LAUNCHED_MAPS also includes map tasks launched due to speculative execution.
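Moving the lookup into Reducer#setup() under the new (org.apache.hadoop.mapreduce) API might look like the sketch below. It is untested and mirrors the old-API code above: it assumes the job ID is still available under the "mapred.job.id" property and that the old-API JobClient/RunningJob classes can be used for the lookup; the class name and key/value types are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: new-API reducer reading TOTAL_LAUNCHED_MAPS in setup().
// Assumes "mapred.job.id" is set in the task's configuration, as in the
// old-API example; not verified against a running cluster.
public class LaunchedMapsReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private long launchedMaps;

    @Override
    protected void setup(Context context) throws IOException {
        // Wrap the new-API Configuration so the old-API JobClient can use it.
        JobConf jobConf = new JobConf(context.getConfiguration());
        String jobID = jobConf.get("mapred.job.id");
        JobClient jobClient = new JobClient(jobConf);
        RunningJob job = jobClient.getJob(JobID.forName(jobID));
        if (job != null) {
            Counters counters = job.getCounters();
            launchedMaps = counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS);
        }
    }
}
```

As noted above, the value retrieved this way counts speculatively executed map attempts as well, so it can exceed the number of input splits.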
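The alternate approach mentioned in the edit above (passing summary details from the mapper to the reducer) can be sketched without a cluster. The idea: each map task emits one record under a reserved key during its cleanup step, and the reducer that receives that key sums the records to learn how many maps completed. Everything below (class name, SUMMARY_KEY, the in-memory "shuffle" map) is illustrative, not part of any Hadoop API; in a real job this would be Mapper#cleanup() emitting a key chosen to sort ahead of the data keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSummarySketch {
    // Reserved key carrying per-map summary records (illustrative name).
    static final String SUMMARY_KEY = "__MAP_SUMMARY__";

    // Simulate nMaps map tasks; each emits one (SUMMARY_KEY, 1) record
    // on completion, standing in for Mapper#cleanup().
    static Map<String, List<Integer>> runMaps(int nMaps) {
        Map<String, List<Integer>> shuffled = new HashMap<>();
        for (int task = 0; task < nMaps; task++) {
            // ... normal per-record map output would be emitted here ...
            shuffled.computeIfAbsent(SUMMARY_KEY, k -> new ArrayList<>()).add(1);
        }
        return shuffled;
    }

    // Reducer side: sum the summary records to recover the map count.
    static int countSuccessfulMaps(Map<String, List<Integer>> shuffled) {
        int n = 0;
        for (int one : shuffled.getOrDefault(SUMMARY_KEY, List.of())) {
            n += one;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countSuccessfulMaps(runMaps(5))); // prints 5
    }
}
```

Because only successfully completed map attempts get their output shuffled to the reducers, this count excludes failed and losing speculative attempts, unlike JobCounter.TOTAL_LAUNCHED_MAPS.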