In Hadoop MapReduce, no reducer starts before all mappers have finished. Can someone please explain to me in which part/class/line of code this logic is implemented? I am talking about Hadoop MapReduce version 1 (NOT YARN). I have searched the MapReduce framework, but there are so many classes, and I don't understand much of the method calls and their ordering.
In other words, I need (at first for test purposes) to let the reducers start reducing even if there are still mappers running. I know that this way I will get wrong results for the job, but for now this is the start of some work on changing parts of the framework. So where should I start to look and make changes?
This is done in the shuffle phase. For Hadoop 1.x, take a look at org.apache.hadoop.mapred.ReduceTask.ReduceCopier, which implements ShuffleConsumerPlugin. You may also want to read the "Breaking the MapReduce Stage Barrier" research paper by Verma et al.
EDIT:
After reading @chris-white's answer, I realized that my answer needed some extra explanation. In the MapReduce model, you need to wait for all mappers to finish, since the keys need to be grouped and sorted; in addition, you may have some speculative mappers running, and you do not yet know which of the duplicate mappers will finish first. However, as the "Breaking the MapReduce Stage Barrier" paper indicates, for some applications it may make sense not to wait for all of the mappers' output. If you want to implement this sort of behavior (most likely for research purposes), then you should take a look at the classes I mentioned above.
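To make the grouping argument concrete, here is a minimal, self-contained Java sketch. It is not Hadoop code and does not use any Hadoop API: the class BarrierSketch and its methods mapOutput and reduce are hypothetical names invented for this illustration. It only shows what a word-count style reducer would compute if it ran on the output of one mapper while a second mapper was still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BarrierSketch {
    // Hypothetical stand-in for one mapper's output: a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapOutput(String... words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : words) {
            out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Groups the fetched pairs by key (kept sorted by the TreeMap) and sums
    // the values per key, as a word-count reducer would.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> fetched) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : fetched) {
            for (Map.Entry<String, Integer> kv : oneMapper) {
                totals.merge(kv.getKey(), kv.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> m1 = mapOutput("a", "b", "a");
        List<Map.Entry<String, Integer>> m2 = mapOutput("a", "c");

        // With the barrier: both mappers have finished before reducing.
        System.out.println(reduce(List.of(m1, m2))); // prints {a=3, b=1, c=1}

        // Without the barrier: mapper 2 is still running, so its pairs are missing.
        System.out.println(reduce(List.of(m1)));     // prints {a=2, b=1}
    }
}
```

The second result is missing the key c entirely and undercounts a, which is the kind of wrong result the question anticipates when the barrier is removed.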