Problem description
I am facing the above exception when I try to apply a method (computeDwt) on an RDD[(Int, ArrayBuffer[(Int, Double)])] input. I am even using the extends Serialization option to serialize objects in Spark. Here is the code snippet.

input: series: RDD[(Int, ArrayBuffer[(Int, Double)])]
DWTsample extends Serialization is a class having a computeDwt function.
sc: SparkContext

val kk: RDD[(Int, List[Double])] = series.map(t => (t._1, new DWTsample().computeDwt(sc, t._2)))
Error:
org.apache.spark.SparkException: Job failed: java.io.NotSerializableException: org.apache.spark.SparkContext
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:556)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:503)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:361)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
Could anyone suggest what the problem might be and what should be done to overcome this issue?
The line
series.map(t=>(t._1,new DWTsample().computeDwt(sc,t._2)))
references the SparkContext (sc), but SparkContext isn't serializable. SparkContext is designed to expose operations that are run on the driver; it can't be referenced or used by code that runs on workers.
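To make that concrete, here is a minimal, self-contained sketch (hypothetical names, not the asker's code) of the same failure mode and the usual workaround: everything referenced inside an RDD closure is serialized and shipped to the executors, so driver-only values must be copied into a local variable before the closure uses them.

import org.apache.spark.{SparkConf, SparkContext}

object ClosureDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
    val nums = sc.parallelize(1 to 5)

    // Fails: the closure references `sc`, so Spark tries to serialize the
    // SparkContext when shipping the task -> java.io.NotSerializableException.
    // val bad = nums.map(x => x + sc.defaultParallelism)

    // Works: read the driver-only value into a local val first; the closure
    // then captures only a serializable Int.
    val parallelism = sc.defaultParallelism
    val good = nums.map(x => x + parallelism)
    println(good.collect().mkString(", "))

    sc.stop()
  }
}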
You'll have to restructure your code so that sc isn't referenced in your map function closure.
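Applied to the snippet above, one way to restructure (a minimal sketch, assuming the transform only needs the per-key data, so the SparkContext parameter can simply be dropped from computeDwt; the method body is a placeholder, not the real wavelet code):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

// Sketch: computeDwt no longer takes a SparkContext, so the map closure
// captures only serializable objects.
class DWTsample extends Serializable {
  def computeDwt(data: ArrayBuffer[(Int, Double)]): List[Double] =
    data.map(_._2).toList // placeholder for the actual transform
}

val kk: RDD[(Int, List[Double])] = series.map(t => (t._1, new DWTsample().computeDwt(t._2)))

If DWTsample genuinely needs driver-side state such as a SparkContext, keep it in a field marked @transient so it is excluded from serialization, and make sure the map closure never touches it.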