python - Spark reduce and map issue

I'm running a small experiment in Spark and have run into trouble.

wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]


# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
              .map(lambda x: (x,1))   # <==== maybe something wrong with this line
              .reduce(sum))            # <==== maybe something wrong with this line
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print(totalCount)
print(round(average, 2))

# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')

Best answer

I figured out the solution:

from operator import add
totalCount = (wordCounts
              .map(lambda x: x[1])
              .reduce(add))
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print(totalCount)
print(round(average, 2))
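
The answer does not say why the original attempt failed, so here is a minimal sketch of the likely cause. It assumes a plain Python list stands in for the wordCounts RDD and functools.reduce stands in for RDD.reduce; both substitutions and the snake_case names are mine, not part of the original post. Mapping to (x, 1) wraps each (word, count) pair in another tuple instead of extracting the count, and sum is not a two-argument adder: called as sum(a, b), it iterates over a with b as the start value, which raises a TypeError on this data.

```python
from functools import reduce
from operator import add

# Plain-Python stand-in for the RDD contents shown in the question (assumption).
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Original attempt: wrap each pair as (pair, 1), then reduce with sum.
# reduce calls sum(a, b), which treats b as a start value, not a second operand.
wrapped = [(pair, 1) for pair in word_counts]
try:
    reduce(sum, wrapped)
    original_failed = False
except TypeError:
    original_failed = True

# Corrected version: keep only the count, then add the counts pairwise.
total_count = reduce(add, [pair[1] for pair in word_counts])

# One entry per distinct word, so the list length is the number of unique words.
average = total_count / float(len(word_counts))

print(total_count)
print(round(average, 2))
```

If wordCounts was itself built from wordsRDD with reduceByKey, it already holds one entry per distinct word, so wordCounts.count() should give the same denominator as the longer wordsRDD expression.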

Regarding "python - Spark reduce and map issue", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30696968/