hadoop地图减少二次排序

本文介绍了hadoop地图减少二次排序的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

任何人都可以解释我如何在hadoop中进行二级排序吗？

为什么必须使用 GroupingComparator 以及hadoop如何工作？

我正在浏览下面的链接，并对groupcomapator的工作方式产生怀疑。

任何人都可以解释分组比较器是如何工作的吗？

解决方案

分组比较

数据到达简化器后，所有数据都按键组合。由于我们有一个组合键，我们需要确保记录完全按照自然键组合。这是通过编写自定义GroupPartitioner来完成的。我们有一个Comparator对象只考虑TemperaturePair类的yearMonth字段，用于将记录分组在一起。

  public class YearMonthGroupingComparator延伸WritableComparator {
 
 public YearMonthGroupingComparator（）{
 super（TemperaturePair.class，true）; 
 
 $ b @Override 
 public int compare（WritableComparable tp1，WritableComparable tp2）{
 TemperaturePair temperaturePair =（TemperaturePair）tp1; 
 TemperaturePair temperaturePair2 =（TemperaturePair）tp2; 
 return temperaturePair.getYearMonth（）。compareTo（temperaturePair2.getYearMonth（））; 
 
 
 
 
 $ b $ p 
 $ b 以下是运行我们的二级分类作业的结果： 
  new-host-2：sbin bbejeck $ hdfs dfs -cat secondary-sort / part-r-00000 
  
 
 
 190101 -206 
 
 
 190102 -333 
 
 
 190103 -272 
 
 190104 -61 
 
 190105 -33  
 
 190106 44  
 
 190107 72  
 
 190108 44 
 
 190109 17 
 
 190110 -33 
 
  190111 -217  
 
  
 
 
虽然按价值排序数据可能不是一个常见的需求，但它一个很好的工具，在你需要的时候在后面的口袋里。此外，我们可以通过使用自定义分区程序和组分区来深入了解Hadoop的内部工作。 
请参阅此链接..     
Can any one explain me how secondary sorting works in hadoop ?
Why must one use GroupingComparator and how does it work in hadoop ?
I was going through the link given below and got doubt on how groupcomapator works.
Can any one explain me how grouping comparator works?
http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html
 解决方案 
Grouping Comparator
Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. This is accomplished by writing a custom GroupPartitioner. We have a Comparator object only considering the yearMonth field of the TemperaturePair class for the purposes of grouping the records together.
public class YearMonthGroupingComparator extends WritableComparator {

    public YearMonthGroupingComparator() {
        super(TemperaturePair.class, true);
    }

    @Override
    public int compare(WritableComparable tp1, WritableComparable tp2) {
        TemperaturePair temperaturePair = (TemperaturePair) tp1;
        TemperaturePair temperaturePair2 = (TemperaturePair) tp2;
        return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());
    }
}
Here are the results of running our secondary sort job:
new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000
190101  -206
190102  -333
190103  -272
190104  -61
190105  -33
190106  44
190107  72
190108  44
190109  17
190110  -33
190111  -217
190112  -300
While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and group partitioners.Refer this link also..What is the use of grouping comparator in hadoop map reduce
                        这篇关于hadoop地图减少二次排序的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！