Hi everyone. There are plenty of implementations of the "find common friends" algorithm online in various languages; in my spare time I worked out how to write it in Scala on Spark.
The complete code is on GitHub: https://github.com/benben7466/SparkDemo/blob/master/spark-test/src/main/scala/testCommendFriend.scala
Input data (each line is a person, a colon, and that person's friend list, comma separated):
A:B,C,D,F,E,O B:A,C,E,K C:F,A,D,I D:A,E,F,L E:B,C,D,M,L F:A,B,C,D,E,O,M G:A,C,D,E,F H:A,C,D,E,O I:A,O J:B,O K:A,C,D L:D,E,F M:E,F,G O:A,H,I,J
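To make the format concrete, the first record A:B,C,D,F,E,O says that A's friend list is B, C, D, F, E, O. Here is a minimal stand-alone sketch (illustration only, not part of the program below) of how one such line is parsed and then inverted into (friend, person) pairs, which is exactly what the map stage of the core algorithm produces:

// Illustration only: how one input line becomes (friend, person) pairs.
object ParseLineDemo {
  def main(args: Array[String]): Unit = {
    val line = "A:B,C,D,F,E,O"
    val fields = line.split(":")
    val person  = fields(0)                    // "A"
    val friends = fields(1).split(",").toList  // List("B","C","D","F","E","O")
    // Invert: for each friend, record who listed them
    val inverted = friends.map(friend => (friend, person))
    println(inverted)  // List((B,A), (C,A), (D,A), (F,A), (E,A), (O,A))
  }
}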
核心算法:
package chunbo.recommend

import org.apache.spark.SparkContext

// Common-friends problem
// Reference: http://www.cnblogs.com/charlesblc/p/6126346.html
object testCommendFriend {

  def index(_spark_sc: SparkContext): Unit = {
    // Load the raw data (Config.HDFS_HOSH is a project-specific HDFS path prefix)
    val friendRDD = _spark_sc.textFile(Config.HDFS_HOSH + "test/common_friend")

    // map: parse each line into (person, friend list)
    val friendKV = friendRDD.map(x => {
      val fields = x.split(":")
      val person = fields(0)
      val friends = fields(1).split(",").toList
      (person, friends)
    })

    // Invert the relation: emit (friend, person) for every friend in the list
    val mapRDD = friendKV.flatMap(x => {
      for (i <- 0 until x._2.length) yield (x._2(i), x._1)
    })

    // reduce: for each friend, concatenate everyone who lists them
    val reduceRDD = mapRDD.reduceByKey(_ + "::" + _)

    // Print the result
    reduceRDD.foreach(println)
  }
}
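Note that the code above only completes the first stage: for each person, it collects everyone who has that person in their friend list. To get the actual common friends of every pair of people, a second pass is needed that turns each "friend -> people who list them" record into person pairs. Below is a minimal sketch of that second stage, written as a continuation after reduceByKey inside the same method; the pair-building logic is my own addition under that assumption, not taken from the linked repository.

// Sketch of a second stage (my own continuation, not from the original repo).
// reduceRDD holds records like ("A", "B::C::D::..."), meaning A appears in
// the friend lists of B, C, D, and so on.
val pairRDD = reduceRDD.flatMap { case (friend, people) =>
  val persons = people.split("::").sorted
  // Every unordered pair of these people has `friend` as a common friend
  for {
    i <- 0 until persons.length
    j <- i + 1 until persons.length
  } yield ((persons(i), persons(j)), friend)
}

// Group the shared friends by person pair, e.g. ((A,B), "C::E")
val commonFriendRDD = pairRDD.reduceByKey(_ + "::" + _)
commonFriendRDD.foreach(println)

Concatenating with "::" just mirrors the string-based style of the original code; for a larger job, collecting the friends into a Seq (or using a DataFrame aggregation) would be more idiomatic.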
Reference: http://www.cnblogs.com/charlesblc/p/6126346.html