如何在foreachPartition中使用SQLContext和SparkContext

本文介绍了如何在foreachPartition中使用SQLContext和SparkContext的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想在foreachPartition中使用SparkContext和SQLContext，但是由于序列化错误而无法执行.我知道这两个对象都不可序列化，但是我认为foreachPartition是在可同时使用Spark Context和SQLContext的主服务器上执行的.

您可能想要等同于:

myDStream.foreachRDD(rdd => {
   val foo = rdd.mapPartitions(iter => doSomethingWithRedisClient(iter))
   val df = sqlContext.createDataFrame(foo, schema)
   df.write.parquet("s3n://bucket/pathToSentMessages)
})

I want to use SparkContext and SQLContext inside foreachPartition, but unable to do it due to serialization error. I know that both objects are not serializable, but I thought that foreachPartition is executed on the master, where both Spark Context and SQLContext are available..

You probably want equivalent of:

myDStream.foreachRDD(rdd => {
   val foo = rdd.mapPartitions(iter => doSomethingWithRedisClient(iter))
   val df = sqlContext.createDataFrame(foo, schema)
   df.write.parquet("s3n://bucket/pathToSentMessages)
})

这篇关于如何在foreachPartition中使用SQLContext和SparkContext的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！