Problem description
How can I cross combine (is that the correct way to describe it?) two RDDs?
Input:
rdd1 = [a, b]
rdd2 = [c, d]
Output:
rdd3 = [(a, c), (a, d), (b, c), (b, d)]
I tried rdd3 = rdd1.flatMap(lambda x: rdd2.map(lambda y: (x, y))), but it complains: "It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation." I guess that means you cannot nest actions the way you can in a list comprehension, and that one statement can only do one action.
Recommended answer
So, as you have noticed, you can't perform a transformation inside another transformation (note that flatMap and map are transformations rather than actions, since they return RDDs). Thankfully, what you're trying to accomplish is directly supported by another transformation in the Spark API, namely cartesian (see http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD ).
So you would want to do rdd1.cartesian(rdd2).
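As a rough sketch of how that might look end to end: the helper names below (cross_combine_local, cross_combine_spark) are made up for illustration, and sc is assumed to be a SparkContext you have already created. The pure-Python function is only there to show which pairs to expect; the Spark function uses parallelize, cartesian and collect from the RDD API.

```python
from itertools import product


def cross_combine_local(xs, ys):
    """Pure-Python reference: every (x, y) pair, x from xs and y from ys."""
    return list(product(xs, ys))


def cross_combine_spark(sc, xs, ys):
    """Same pairing on RDDs. `sc` is assumed to be an existing SparkContext."""
    rdd1 = sc.parallelize(xs)
    rdd2 = sc.parallelize(ys)
    # cartesian is a transformation on rdd1 that takes rdd2 as an argument,
    # so no RDD is referenced from inside a lambda running on the workers.
    rdd3 = rdd1.cartesian(rdd2)
    return rdd3.collect()


print(cross_combine_local(["a", "b"], ["c", "d"]))
```

With a live context, cross_combine_spark(sc, ["a", "b"], ["c", "d"]) should return the same four pairs, though the order of the collected list depends on how the RDDs are partitioned, so sort both sides before comparing. Keep in mind that the result has len(rdd1) * len(rdd2) elements, which grows quickly for large inputs.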