本文介绍了如何在foreachPartition中使用SQLContext和SparkContext的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想在foreachPartition
中使用SparkContext和SQLContext,但是由于序列化错误而无法执行.我知道这两个对象都不可序列化,但是我认为foreachPartition
是在可同时使用Spark Context和SQLContext的主服务器上执行的.
您可能想要等同于:
myDStream.foreachRDD(rdd => {
val foo = rdd.mapPartitions(iter => doSomethingWithRedisClient(iter))
val df = sqlContext.createDataFrame(foo, schema)
df.write.parquet("s3n://bucket/pathToSentMessages)
})
I want to use SparkContext and SQLContext inside foreachPartition
, but unable to do it due to serialization error. I know that both objects are not serializable, but I thought that foreachPartition
is executed on the master, where both Spark Context and SQLContext are available..
You probably want equivalent of:
myDStream.foreachRDD(rdd => {
val foo = rdd.mapPartitions(iter => doSomethingWithRedisClient(iter))
val df = sqlContext.createDataFrame(foo, schema)
df.write.parquet("s3n://bucket/pathToSentMessages)
})
这篇关于如何在foreachPartition中使用SQLContext和SparkContext的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!