本文介绍了hadoop地图减少二次排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何人都可以解释我如何在hadoop中进行二级排序吗?

为什么必须使用 GroupingComparator 以及hadoop如何工作?



我正在浏览下面的链接,并对groupcomapator的工作方式产生怀疑。

任何人都可以解释分组比较器是如何工作的吗?



解决方案

分组比较



数据到达简化器后,所有数据都按键组合。由于我们有一个组合键,我们需要确保记录完全按照自然键组合。这是通过编写自定义GroupPartitioner来完成的。我们有一个Comparator对象只考虑TemperaturePair类的yearMonth字段,用于将记录分组在一起。

  public class YearMonthGroupingComparator延伸WritableComparator {

public YearMonthGroupingComparator(){
super(TemperaturePair.class,true);

$ b @Override
public int compare(WritableComparable tp1,WritableComparable tp2){
TemperaturePair temperaturePair =(TemperaturePair)tp1;
TemperaturePair temperaturePair2 =(TemperaturePair)tp2;
return temperaturePair.getYearMonth()。compareTo(temperaturePair2.getYearMonth());




$ b $ p
$ b

以下是运行我们的二级分类作业的结果:

  new-host-2:sbin bbejeck $ hdfs dfs -cat secondary-sort / part-r-00000 



190101 -206



190102 -333



190103 -272

190104 -61

190105 -33

190106 44


190107 72

190108 44

190109 17

190110 -33

190111 -217



虽然按价值排序数据可能不是一个常见的需求,但它一个很好的工具,在你需要的时候在后面的口袋里。此外,我们可以通过使用自定义分区程序和组分区来深入了解Hadoop的内部工作。
请参阅此链接..

Can any one explain me how secondary sorting works in hadoop ?
Why must one use GroupingComparator and how does it work in hadoop ?

I was going through the link given below and got doubt on how groupcomapator works.
Can any one explain me how grouping comparator works?

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html

解决方案

Grouping Comparator

Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. This is accomplished by writing a custom GroupPartitioner. We have a Comparator object only considering the yearMonth field of the TemperaturePair class for the purposes of grouping the records together.

public class YearMonthGroupingComparator extends WritableComparator {

    public YearMonthGroupingComparator() {
        super(TemperaturePair.class, true);
    }

    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        TemperaturePair temperaturePair = (TemperaturePair) tp1;
        TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
        return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
    }
}

Here are the results of running our secondary sort job:

new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000

190101 -206

190102 -333

190103 -272

190104 -61

190105 -33

190106 44

190107 72

190108 44

190109 17

190110 -33

190111 -217

190112 -300

While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and group partitioners.Refer this link also..What is the use of grouping comparator in hadoop map reduce

这篇关于hadoop地图减少二次排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-04 04:55
查看更多