Secondary sort in Map-Reduce


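Question

I understand how to sort the values of a particular key before that key reaches the reducer. I learned that it can be done by writing three components: a key comparator, a partitioner, and a value-grouping comparator.

Now, when the value grouping runs, it basically groups all the values associated with the natural key, right? So when it groups all the values for the natural key, what is the actual key that is sent to the reducer along with the set of sorted values? The natural key will have been associated with more than one type of entity (the second part of the composite key). Which composite key is sent to the reducer?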

Answer

This may be surprising, but each iteration over the values Iterable actually updates the contents of the key object too:

protected void reduce(K key, Iterable<V> values, Context context) {
    for (V value : values) {
        // the contents of the key object are updated on each iteration of this loop
    }
}

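I know this works for the new mapreduce API; I haven't traced it for the old mapred API.

So in answer to your question, all the keys will be available. The key you see on entry to reduce is the first sorted key of the group.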
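
To make the key-update behaviour concrete, here is a small plain-Java simulation. It is not Hadoop code: CompositeKey, GroupValues and the sample data are hypothetical names invented for this sketch, and the only thing it tries to mimic is a values iterator that overwrites one shared key object on every call to next().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class KeyReuseDemo {
    /** Hypothetical composite key: natural part plus secondary (sort) part. */
    static final class CompositeKey {
        String natural;
        int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }

        /** Overwrites this object's fields, the way a reused key instance is refilled. */
        void set(CompositeKey other) {
            this.natural = other.natural;
            this.secondary = other.secondary;
        }

        @Override
        public String toString() {
            return natural + "#" + secondary;
        }
    }

    /** Values of one group; every next() also refreshes the shared key object. */
    static final class GroupValues implements Iterable<String> {
        private final List<CompositeKey> keys;
        private final List<String> values;
        private final CompositeKey sharedKey;

        GroupValues(List<CompositeKey> keys, List<String> values, CompositeKey sharedKey) {
            this.keys = keys;
            this.values = values;
            this.sharedKey = sharedKey;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private int i = 0;

                @Override
                public boolean hasNext() {
                    return i < keys.size();
                }

                @Override
                public String next() {
                    sharedKey.set(keys.get(i)); // same object, new contents
                    return values.get(i++);
                }
            };
        }
    }

    /** Stand-in for reduce(): records the key as it looks next to each value. */
    static List<String> reduce(CompositeKey key, Iterable<String> values) {
        List<String> seen = new ArrayList<>();
        for (String value : values) {
            seen.add(key + "=" + value);
        }
        return seen;
    }

    static List<String> run() {
        // One group, already sorted by the secondary part of the key.
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey("user1", 1),
                new CompositeKey("user1", 5),
                new CompositeKey("user1", 9));
        List<String> values = Arrays.asList("a", "b", "c");
        // On entry to reduce the key holds the first sorted key of the group.
        CompositeKey shared = new CompositeKey("user1", 1);
        return reduce(shared, new GroupValues(keys, values, shared));
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [user1#1=a, user1#5=b, user1#9=c]
    }
}
```

One practical consequence, if the real framework behaves like this sketch: anything you want to keep from the key across iterations should be copied out, because the object itself will be overwritten.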

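EDIT: Some additional information as to how and why this works:

There are two comparators that the reducer uses to process the key/value pairs output by the map stage:

the key ordering comparator - This comparator is applied first and orders all the KV pairs. Conceptually you are still dealing with the serialized bytes at this stage.

the key group comparator - This comparator is responsible for determining when the previous and current key 'differ', denoting the boundary between one group of KV pairs and another.

Under the hood, the references to the key and value never change. Each call to next() on the values Iterator advances the pointer in the underlying byte stream to the next KV pair. If the key group comparator determines that the current key bytes and the previous key bytes compare as the same key, then the hasNext() method of the values Iterator returns true, otherwise false. If true is returned, the bytes are deserialized into the Key and Value instances for consumption in your reduce method.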
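
The interplay of the two comparators can be sketched in plain Java as well. This is again only a simulation under assumptions: Pair, SORT, GROUP and groupSorted are hypothetical names, and the records are ordinary objects rather than serialized bytes. In a real job the two comparators would be registered through Job.setSortComparatorClass and Job.setGroupingComparatorClass, with the partitioner set through Job.setPartitionerClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TwoComparatorsDemo {
    /** Hypothetical map output record: composite key (natural, secondary) plus a value. */
    static final class Pair {
        final String natural;
        final int secondary;
        final String value;

        Pair(String natural, int secondary, String value) {
            this.natural = natural;
            this.secondary = secondary;
            this.value = value;
        }
    }

    // Key ordering comparator: natural key first, then the secondary part.
    static final Comparator<Pair> SORT =
            Comparator.comparing((Pair p) -> p.natural).thenComparingInt(p -> p.secondary);

    // Key group comparator: only the natural key decides group membership.
    static final Comparator<Pair> GROUP = Comparator.comparing((Pair p) -> p.natural);

    /** Sorts with SORT, then starts a new group whenever GROUP says the key changed. */
    static List<List<String>> groupSorted(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);

        List<List<String>> groups = new ArrayList<>();
        Pair previous = null;
        for (Pair current : sorted) {
            if (previous == null || GROUP.compare(previous, current) != 0) {
                // Group boundary: this is where a new reduce() call would begin.
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1)
                    .add(current.natural + "#" + current.secondary + "=" + current.value);
            previous = current;
        }
        return groups;
    }

    static List<List<String>> run() {
        return groupSorted(Arrays.asList(
                new Pair("b", 2, "v1"),
                new Pair("a", 3, "v2"),
                new Pair("a", 1, "v3"),
                new Pair("b", 1, "v4")));
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [[a#1=v3, a#3=v2], [b#1=v4, b#2=v1]]
    }
}
```

Each inner list corresponds to one reduce call. The values arrive in secondary-key order because the ordering comparator looked at the whole composite key, while the grouping comparator ignored everything except the natural key.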

