This article explains how to handle PySpark's groupByKey returning pyspark.resultiterable.ResultIterable objects; it may serve as a useful reference for anyone hitting the same problem.
Problem Description
I am trying to figure out why my groupByKey is returning the following:
[(0, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a210>), (1, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a4d0>), (2, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a390>), (3, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a290>), (4, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a450>), (5, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a350>), (6, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a1d0>), (7, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a490>), (8, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a050>), (9, <pyspark.resultiterable.ResultIterable object at 0x7fc659e0a650>)]
I have flatMapped values that look like this:
[(0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D'), (0, u'D')]
I'm doing just a simple:
groupRDD = columnRDD.groupByKey()
Solution
What you're getting back is an object that lets you iterate over the results. You can turn the result of groupByKey into a list by calling list() on the values, for example:
example = sc.parallelize([(0, u'D'), (0, u'D'), (1, u'E'), (2, u'F')])
example.groupByKey().collect()
# Gives [(0, <pyspark.resultiterable.ResultIterable object ......]
example.groupByKey().map(lambda x : (x[0], list(x[1]))).collect()
# Gives [(0, [u'D', u'D']), (1, [u'E']), (2, [u'F'])]
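To illustrate why the explicit list() call is needed, here is a minimal plain-Python analogy (no Spark required, and not the PySpark API itself) using itertools.groupby: like ResultIterable, each group is a lazy iterable that must be materialized before it prints as values.

```python
from itertools import groupby
from operator import itemgetter

pairs = [(0, u'D'), (0, u'D'), (1, u'E'), (2, u'F')]

# groupby requires input sorted by key, roughly analogous to the
# shuffle that groupByKey performs to bring equal keys together.
pairs.sort(key=itemgetter(0))

# Each `group` is a lazy iterator, analogous to
# pyspark.resultiterable.ResultIterable; wrapping it in a list
# comprehension materializes the values, like list(x[1]) above.
grouped = [(key, [v for _, v in group])
           for key, group in groupby(pairs, key=itemgetter(0))]

print(grouped)
# [(0, [u'D', u'D']), (1, [u'E']), (2, [u'F'])]
```

In PySpark itself, `example.groupByKey().mapValues(list)` achieves the same result as the map with a lambda shown above and is slightly more idiomatic.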
This concludes the article on PySpark groupByKey returning pyspark.resultiterable.ResultIterable. We hope the answer above is helpful.