Problem description
I have a flatMap that returns the sequence Seq((20,6),(22,6),(23,6),(24,6),(20,1),(22,1)).
Now I need to use reduceByKey() on the sequence I get from the flatMap to find the minimum value for each key.
I tried using .reduceByKey(a,min(b))
and .reduceByKey((a, b) => if (a._1 < b._1) a else b)
but neither of them works.
Here is my code:
for(i<- 1 to 5){
var graph=graph.flatMap{ in => in match{ case (x, y, zs) => (x, y) :: zs.map(z => (z, y))}
.reduceByKey((a, b) => if (a._1 < b._1) a else b)
}
For each distinct key the flatMap generates, I need to get the minimum value for that key. E.g., if the flatMap generates Seq((20,6),(22,6),(23,6),(24,6),(20,1),(22,1)), then reduceByKey() should generate (20,1),(22,1),(23,6),(24,6).
Recommended answer
This is the signature of reduceByKey:
def reduceByKey(func: (V, V) ⇒ V): RDD[(K, V)]
Basically, given an RDD of key/value pairs, you need to provide a function that reduces two values (not the entire pairs) into one. The arguments a and b are therefore the values themselves, so with Int values as in your example, a._1 does not compile. You can use it as follows:
val rdd = sc.parallelize(Seq((20,6),(22,6),(23,6),(24,6),(20,1),(22,1)))
val result = rdd.reduceByKey((a, b) => if (a < b) a else b)
result.collect
// Array[(Int, Int)] = Array((24,6), (20,1), (22,1), (23,6))
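The function passed to reduceByKey can also be written with math.min, e.g. rdd.reduceByKey((a, b) => math.min(a, b)). To see the per-key semantics without a Spark cluster, here is a minimal sketch using plain Scala collections. The names MinPerKey and reduceByKeyLocal are hypothetical helpers made up for this illustration, not part of Spark, and the sketch only mimics the result of reduceByKey, not its distributed execution.

```scala
object MinPerKey {
  // Hypothetical local stand-in for RDD.reduceByKey (not a Spark API):
  // groups the pairs by key, then folds each group's values with `func`.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(func: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).reduce(func) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((20, 6), (22, 6), (23, 6), (24, 6), (20, 1), (22, 1))

    // `a` and `b` are two values for the same key, never whole (key, value) pairs.
    val result = reduceByKeyLocal(pairs)((a, b) => math.min(a, b))

    // Sort only to get a stable print order; a Map has no guaranteed order.
    println(result.toSeq.sorted.mkString(", "))
    // prints: (20,1), (22,1), (23,6), (24,6)
  }
}
```

As with the Spark result above, the order of the keys is not guaranteed, which is why the collected array in the answer is not sorted.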