Problem description

During the shuffle stage of a Hadoop job, the map output is transferred across the nodes of the cluster according to the partitions assigned to each reducer. What protocol does Hadoop use to move this data between nodes for the reduce stage?

I laughed the first time I saw this, but the whole shuffling and merging is driven by an HTTPServlet. You can see this in the MapOutputServlet inner class in the TaskTracker source code: it receives an HTTP request carrying the job and map task IDs, and then streams the corresponding map output from the local filesystem on disk into the HTTP response for the requesting reducer.
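To make the mechanism concrete: in Hadoop 1.x a reduce task pulls each map output with a plain HTTP GET against the embedded web server of the TaskTracker that holds it. The sketch below only builds such a request URL; the host name and task/job IDs are made-up placeholders, while the `/mapOutput` path, its `job`/`map`/`reduce` query parameters, and port 50060 (the default TaskTracker HTTP port) follow the Hadoop 1.x sources.

```java
// Illustrative sketch only: the URL a reduce task would GET during the
// shuffle to fetch one map task's output partition from the serving
// TaskTracker's embedded HTTP server (Hadoop 1.x style).
public class ShuffleFetchSketch {

    // Build the mapOutput URL for one (job, map task, reduce partition) triple.
    static String mapOutputUrl(String host, int port, String jobId,
                               String mapTaskId, int reducePartition) {
        return String.format("http://%s:%d/mapOutput?job=%s&map=%s&reduce=%d",
                host, port, jobId, mapTaskId, reducePartition);
    }

    public static void main(String[] args) {
        // Hypothetical host and IDs, for illustration only.
        String url = mapOutputUrl("tt-node-3", 50060,
                "job_201101011234_0001",
                "attempt_201101011234_0001_m_000002_0", 5);
        System.out.println(url);
        // The reducer would then open this URL (e.g. with HttpURLConnection),
        // read the response body, and merge the fetched segment on disk.
    }
}
```

On the server side, MapOutputServlet answers this request by locating the map output file for the given IDs on local disk and writing it into the servlet response stream.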