Problem description
I read through the definitive guide and some other links on the web, including the one here. My question is:

As per my understanding, shuffling and sorting both happen on the mappers and on the reducers. But some links say that shuffling happens on the mappers and sorting on the reducers. Can someone confirm whether my understanding is correct? If not, can you point me to additional documentation I can go through?

Shuffle: MapReduce guarantees that the input to every reducer is sorted by key. The process by which the system performs the sort and transfers the map outputs to the reducers as inputs is known as the shuffle.

Sort: Sorting happens at several stages of a MapReduce job, so it takes place in both the Map and the Reduce phases.

Please have a look at this diagram. Here is more detail on the Map and Reduce phases shown in it.

The Map Side: When the map function starts producing output, it is not simply written to disk. Before the map output is written to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key.

The Reduce Side: When all the map outputs have been copied, the reduce task moves into the sort phase (which should properly be called the merge phase, as the sorting was carried out on the map side), which merges the map outputs, maintaining their sort ordering. This is done in rounds.

Source: Hadoop: The Definitive Guide.
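To make the split of work concrete, here is a small single-process Python simulation. It is a sketch under stated assumptions, not Hadoop code. Each mapper partitions its output by key and sorts every partition. Reducer r then fetches partition r from every mapper and only merges those already-sorted runs. The function names (`java_string_hash`, `partition`, `map_side`, `reduce_side`, `run_job`) are hypothetical. The partition formula mirrors Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Java-String-style hash is only a stand-in for `hashCode()`, because Hadoop's `Text` keys hash their bytes differently. Buffer spills to disk, combiners, and the round-based merge are left out.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode() as a signed 32-bit int (stand-in for key.hashCode())."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    # Same formula as Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def map_side(records, num_reducers):
    """One mapper: split output into one partition per reducer, then sort each by key.
    (Buffer spills, combiners and spill-file merging are omitted.)"""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[partition(key, num_reducers)].append((key, value))
    for part in partitions:
        part.sort(key=itemgetter(0))  # in-memory sort by key within the partition
    return partitions


def reduce_side(sorted_runs):
    """One reducer: merge already-sorted runs (no re-sort), then group values by key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, [v for _, v in grp]) for key, grp in groupby(merged, key=itemgetter(0))]


def run_job(map_outputs, num_reducers):
    # Map side: every mapper partitions and sorts its own output.
    per_mapper = [map_side(records, num_reducers) for records in map_outputs]
    # Shuffle: reducer r fetches partition r from every mapper, then merges.
    return [reduce_side([parts[r] for parts in per_mapper]) for r in range(num_reducers)]


if __name__ == "__main__":
    outputs = [
        [("b", 1), ("a", 1), ("c", 1), ("a", 1)],  # mapper 0
        [("c", 1), ("a", 1), ("d", 1)],            # mapper 1
    ]
    for r, reducer_input in enumerate(run_job(outputs, num_reducers=2)):
        print(r, reducer_input)
```

With two reducers, reducer 0 receives `[('b', [1]), ('d', [1])]` and reducer 1 receives `[('a', [1, 1, 1]), ('c', [1, 1])]`. Each reducer's input comes out sorted by key even though `reduce_side` never sorts. It only merges runs that the mappers had already sorted, which is why the reduce-side "sort" phase is better described as a merge.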