When reduceByKey is called, it sums all values that share the same key. Is there a way to compute the average of the values for each key instead?

// I calculate the sum like this and don't know how to calculate the avg
reduceByKey((x,y)=>(x+y)).collect


Array(((Type1,1),4.0), ((Type1,1),9.2), ((Type1,2),8), ((Type1,2),4.5), ((Type1,3),3.5),
((Type1,3),5.0), ((Type2,1),4.6), ((Type2,1),4), ((Type2,1),10), ((Type2,1),4.3))

Best answer

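One approach is to use mapValues and reduceByKey, which is easier than aggregateByKey: pair each value with a count of 1, sum the values and the counts per key, then divide the sum by the count.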

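.mapValues(value => (value, 1)) // pair each value with a count of 1
.reduceByKey {
  // add the sums and the counts element-wise for each key
  case ((sumL, countL), (sumR, countR)) =>
    (sumL + sumR, countL + countR)
}
.mapValues {
  // average = sum / count; assumes Double values (Int values would give integer division)
  case (sum, count) => sum / count
}
.collect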
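
For comparison, the aggregateByKey variant that the answer calls more involved could look roughly like the sketch below. It is not part of the original answer: the object name `AverageByKey`, the helpers `addValue` and `merge`, the local master setting, and the inlined sample data (written with Double literals throughout) are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AverageByKey {
  // Accumulator per key: (running sum, running count). Hypothetical helper type.
  type Acc = (Double, Long)

  // Fold one value into the accumulator (runs within a partition).
  def addValue(acc: Acc, v: Double): Acc = (acc._1 + v, acc._2 + 1L)

  // Merge two accumulators (runs across partitions).
  def merge(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 + b._2)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("AverageByKey").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample data from the question, with all values written as Doubles.
    val rdd = sc.parallelize(Seq(
      (("Type1", 1), 4.0), (("Type1", 1), 9.2),
      (("Type1", 2), 8.0), (("Type1", 2), 4.5),
      (("Type1", 3), 3.5), (("Type1", 3), 5.0),
      (("Type2", 1), 4.6), (("Type2", 1), 4.0),
      (("Type2", 1), 10.0), (("Type2", 1), 4.3)
    ))

    val averages = rdd
      .aggregateByKey((0.0, 0L))(addValue, merge) // (sum, count) per key
      .mapValues { case (sum, count) => sum / count }

    averages.collect().foreach(println)
    sc.stop()
  }
}
```

Here the zero value `(0.0, 0L)` replaces the explicit `mapValues(value => (value, 1))` step, at the cost of having to supply two functions instead of one, which is presumably why the answer prefers mapValues with reduceByKey.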

https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html

Regarding scala - Spark : Average of values instead of sum in reduceByKey using Scala, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40087483/
