Question
To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD?
I have an RDD of (String, Iterable[(String, Integer)]) and I want to convert it into an RDD of (String, RDD[(String, Integer)]), so that I can apply a reduceByKey function to the inner RDD.
For example, I have an RDD where the key is the 2-letter prefix of a person's name and the value is a list of pairs of a person's name and the hours they spent at an event.
My RDD is:
("To",List(("Tom",50),("Tod","30"),("Tom",70),("Tod","25"),("Tod",15))("Ja",List(("Jack",50),("James","30"),("Jane",70),("James","25"),("Jasper",15))
I need the List to be converted to an RDD so that I can accumulate each person's total hours. Applying reduceByKey should produce a result like:

("To", RDD(("Tom",120), ("Tod",70))),
("Ja", RDD(("Jack",50), ("James",55), ("Jane",70), ("Jasper",15)))
But I couldn't find any such transformation function. How can I do this?
Thanks.
Answer
You can achieve this with flatMap and reduceByKey. Something like this:
rdd.flatMap { case (key, list) => list.map(item => ((key, item._1), item._2)) } // flatten to ((prefix, name), hours)
  .reduceByKey(_ + _)                                                           // sum the hours per (prefix, name)
  .map { case ((key, name), hours) => (key, List((name, hours))) }              // wrap each pair back under its prefix
  .reduceByKey(_ ++ _)                                                          // concatenate the per-prefix lists
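Note that RDDs cannot be nested inside other RDDs, which is why the answer flattens the data instead of building an RDD per key. The flatten-sum-regroup logic itself can be checked without a Spark cluster using plain Scala collections; the sketch below is only an illustration of that logic (using groupBy as a stand-in for reduceByKey), not Spark code, and the object and method names are made up for the example.

```scala
// Illustrative sketch: the same flatten -> sum -> regroup logic on plain
// Scala collections, runnable without a Spark cluster.
object HoursByPrefix {
  def aggregate(data: Seq[(String, List[(String, Int)])]): Map[String, List[(String, Int)]] =
    data
      // flatten to ((prefix, name), hours), mirroring the flatMap step
      .flatMap { case (key, list) => list.map { case (name, hours) => ((key, name), hours) } }
      // groupBy + sum plays the role of reduceByKey(_ + _)
      .groupBy(_._1)
      .map { case ((key, name), grouped) => (key, name, grouped.map(_._2).sum) }
      // regroup by the 2-letter prefix, mirroring the final reduceByKey(_ ++ _)
      .groupBy(_._1)
      .map { case (key, rows) => key -> rows.map(r => (r._2, r._3)).toList }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      ("To", List(("Tom", 50), ("Tod", 30), ("Tom", 70), ("Tod", 25), ("Tod", 15))),
      ("Ja", List(("Jack", 50), ("James", 30), ("Jane", 70), ("James", 25), ("Jasper", 15)))
    )
    val result = aggregate(data)
    println(result("To").sortBy(_._1)) // List((Tod,70), (Tom,120))
    println(result("Ja").sortBy(_._1)) // List((Jack,50), (James,55), (Jane,70), (Jasper,15))
  }
}
```

In real Spark code the two groupBy/sum passes would simply be the two reduceByKey calls from the answer above, which avoid materializing full groups on a single node.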