Problem description
What are the differences between Sort Comparator and Group Comparator in Hadoop?
To understand GroupComparator, see my answer to this question:
What is the use of grouping comparator in hadoop map reduce
SortComparator: used to define how map output keys are sorted.
Excerpt from the book Hadoop: The Definitive Guide:
Sort order for keys is found as follows:
- If the property mapred.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API the equivalent method is setOutputKeyComparatorClass() on JobConf.)
- Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
- If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
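The three-step lookup above can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Hadoop code: ComparatorResolution, register, resolve and REGISTRY are illustrative names, java.util.Comparator stands in for RawComparator, and String stands in for a WritableComparable key. In Hadoop itself, key classes register their optimized comparator with WritableComparator.define().

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how the map output key comparator is chosen.
// Not Hadoop code: Comparator stands in for RawComparator, String for a WritableComparable key.
public class ComparatorResolution {

    // Illustrative stand-in for the per-class registry that WritableComparator.define() populates.
    static final Map<Class<?>, Comparator<?>> REGISTRY = new HashMap<>();

    static <K> void register(Class<K> keyClass, Comparator<? super K> comparator) {
        REGISTRY.put(keyClass, comparator);
    }

    @SuppressWarnings("unchecked")
    static <K extends Comparable<? super K>> Comparator<K> resolve(Comparator<K> explicit, Class<K> keyClass) {
        // 1. An explicitly configured comparator (setSortComparatorClass) wins.
        if (explicit != null) {
            return explicit;
        }
        // 2. Otherwise use the comparator registered for the key class, if there is one.
        Comparator<?> registered = REGISTRY.get(keyClass);
        if (registered != null) {
            return (Comparator<K>) registered;
        }
        // 3. Fall back to the key's own compareTo() (natural ordering).
        return Comparator.naturalOrder();
    }

    public static void main(String[] args) {
        // Nothing explicit, nothing registered: natural order, where "B" sorts before "a".
        System.out.println(resolve(null, String.class).compare("B", "a") < 0);
        // An explicit comparator takes precedence over everything else.
        System.out.println(resolve(Comparator.<String>reverseOrder(), String.class).compare("B", "a") > 0);
        // With a registered comparator and no explicit one, the registered one is used.
        register(String.class, String.CASE_INSENSITIVE_ORDER);
        System.out.println(resolve(null, String.class).compare("B", "a") > 0);
    }
}
```

Each line printed by main should be true, one per rule in the list above.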
SortComparator vs. GroupComparator in one line: SortComparator decides how map output keys are sorted, while GroupComparator decides which map output keys within the Reducer go to the same reduce method call.
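To make the one-liner concrete, here is a plain-Java simulation of what happens to map output keys on the reduce side. It is an assumption-laden sketch, not the Hadoop API (and it needs Java 16+ for records). It uses a hypothetical composite key (year, temperature), the usual secondary-sort setup: the sort comparator orders by year and then by temperature descending, while the grouping comparator looks at the year only. In a real job these would be RawComparator implementations (typically WritableComparator subclasses) wired in with Job.setSortComparatorClass() and Job.setGroupingComparatorClass().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of the reduce-side shuffle (not Hadoop code):
// keys are sorted with one comparator, then consecutive keys are grouped with another.
public class SortVsGroup {

    // Hypothetical composite key: (year, temperature).
    record YearTemp(int year, int temp) {}

    // Sort comparator: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT =
            Comparator.comparingInt(YearTemp::year)
                    .thenComparing(Comparator.comparingInt(YearTemp::temp).reversed());

    // Grouping comparator: considers the year only.
    static final Comparator<YearTemp> GROUP = Comparator.comparingInt(YearTemp::year);

    // Returns one list per simulated reduce() call, in the order the calls would happen.
    static List<List<YearTemp>> shuffle(List<YearTemp> mapOutput,
                                        Comparator<YearTemp> sort,
                                        Comparator<YearTemp> group) {
        List<YearTemp> sorted = new ArrayList<>(mapOutput);
        sorted.sort(sort);
        List<List<YearTemp>> calls = new ArrayList<>();
        List<YearTemp> current = null;
        YearTemp prev = null;
        for (YearTemp key : sorted) {
            // Start a new reduce() call whenever the grouping comparator sees a different key.
            if (prev == null || group.compare(prev, key) != 0) {
                current = new ArrayList<>();
                calls.add(current);
            }
            current.add(key);
            prev = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<YearTemp> mapOutput = List.of(
                new YearTemp(1950, 0), new YearTemp(1949, 111),
                new YearTemp(1950, 22), new YearTemp(1949, 78), new YearTemp(1950, -11));
        // Grouping on year only: one reduce() call per year.
        for (List<YearTemp> call : shuffle(mapOutput, SORT, GROUP)) {
            System.out.println(call);
        }
        // Grouping with the full sort order: one reduce() call per distinct key.
        System.out.println(shuffle(mapOutput, SORT, SORT).size()); // prints 5
    }
}
```

With grouping on year alone, each simulated reduce() call receives every record for one year, and because of the sort order the first record is that year's maximum temperature. Passing the sort comparator as the grouping comparator instead yields one call per distinct key, which is the default behavior when no grouping comparator is set.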