This article explains how to resolve the PySpark error "Cannot deserialize RDD with different number of items in pair", which may be a useful reference for anyone hitting the same problem.
Problem description
I have two RDDs which have key-value pairs. I want to join them by key (and, for each key, get the cartesian product of all values), which I assume can be done with the zip() function of pyspark. However, when I apply this,
from operator import add

elemPairs = elems1.zip(elems2).reduceByKey(add)
it gives me the error:
Cannot deserialize RDD with different number of items in pair: (40, 10)
And here are the 2 RDDs which I try to zip:
elems1 => [((0, 0), ('A', 0, 90)), ((0, 1), ('A', 0, 90)), ((0, 2), ('A', 0, 90)), ((0, 3), ('A', 0, 90)), ((0, 4), ('A', 0, 90)), ((0, 0), ('A', 1, 401)), ((0, 1), ('A', 1, 401)), ((0, 2), ('A', 1, 401)), ((0, 3), ('A', 1, 401)), ((0, 4), ('A', 1, 401)), ((1, 0), ('A', 0, 305)), ((1, 1), ('A', 0, 305)), ((1, 2), ('A', 0, 305)), ((1, 3), ('A', 0, 305)), ((1, 4), ('A', 0, 305)), ((1, 0), ('A', 1, 351)), ((1, 1), ('A', 1, 351)), ((1, 2), ('A', 1, 351)), ((1, 3), ('A', 1, 351)), ((1, 4), ('A', 1, 351)), ((2, 0), ('A', 0, 178)), ((2, 1), ('A', 0, 178)), ((2, 2), ('A', 0, 178)), ((2, 3), ('A', 0, 178)), ((2, 4), ('A', 0, 178)), ((2, 0), ('A', 1, 692)), ((2, 1), ('A', 1, 692)), ((2, 2), ('A', 1, 692)), ((2, 3), ('A', 1, 692)), ((2, 4), ('A', 1, 692)), ((3, 0), ('A', 0, 936)), ((3, 1), ('A', 0, 936)), ((3, 2), ('A', 0, 936)), ((3, 3), ('A', 0, 936)), ((3, 4), ('A', 0, 936)), ((3, 0), ('A', 1, 149)), ((3, 1), ('A', 1, 149)), ((3, 2), ('A', 1, 149)), ((3, 3), ('A', 1, 149)), ((3, 4), ('A', 1, 149))]
elems2 => [((0, 0), ('B', 0, 573)), ((1, 0), ('B', 0, 573)), ((2, 0), ('B', 0, 573)), ((3, 0), ('B', 0, 573)), ((4, 0), ('B', 0, 573)), ((0, 0), ('B', 1, 324)), ((1, 0), ('B', 1, 324)), ((2, 0), ('B', 1, 324)), ((3, 0), ('B', 1, 324)), ((4, 0), ('B', 1, 324))]
where, in ((0, 0), ('B', 0, 573)), (0, 0) is the key and ('B', 0, 573) is the value.
After a quick Google search, I found that this is a problem which only occurs in Spark 1.2; however, I am using Spark 1.5.
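The underlying cause is not version-specific: in PySpark, rdd1.zip(rdd2) pairs elements positionally and requires both RDDs to have the same number of partitions and the same number of elements in each partition. Here elems1 has 40 records while elems2 has 10, which matches the (40, 10) in the error message. Below is a minimal sketch reproducing the failure, assuming a hypothetical local SparkContext (the exact exception text varies slightly across Spark versions):

from pyspark import SparkContext

# Hypothetical local context, purely for illustration
sc = SparkContext("local", "zip-demo")

# 40 elements vs 10 elements, each RDD in a single partition
a = sc.parallelize(range(40), 1)
b = sc.parallelize(range(10), 1)

# zip() pairs elements by position, so it demands equal partition counts
# and equal element counts per partition; this size mismatch raises an
# error like "Cannot deserialize RDD with different number of items in
# pair: (40, 10)" when the result is materialized.
try:
    a.zip(b).collect()
except Exception as e:
    print(type(e).__name__)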
Recommended answer
Why not just use elems1.join(elems2)?
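join() matches records by key and, for each key, emits every combination of values from the two RDDs, which is exactly the per-key cartesian product the question asks for, and it places no requirement on partition sizes. A minimal sketch using a two-record slice of the data above (again assuming a hypothetical local SparkContext):

from pyspark import SparkContext

# Hypothetical local context, purely for illustration
sc = SparkContext("local", "join-demo")

elems1 = sc.parallelize([((0, 0), ('A', 0, 90)), ((0, 0), ('A', 1, 401))])
elems2 = sc.parallelize([((0, 0), ('B', 0, 573)), ((0, 0), ('B', 1, 324))])

# join() shuffles both RDDs by key, then pairs every value of a key in
# elems1 with every value of the same key in elems2 (a per-key cartesian
# product), regardless of how many records each side has.
joined = elems1.join(elems2)
print(sorted(joined.collect()))
# [((0, 0), (('A', 0, 90), ('B', 0, 573))),
#  ((0, 0), (('A', 0, 90), ('B', 1, 324))),
#  ((0, 0), (('A', 1, 401), ('B', 0, 573))),
#  ((0, 0), (('A', 1, 401), ('B', 1, 324)))]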
This concludes the article on the "Cannot deserialize RDD with different number of items in pair" error; hopefully the recommended answer is helpful.