Null pointer exception when trying to use a persisted table in Spark Streaming


Problem description

I create "gpsLookUpTable" at the beginning and persist it so that I do not need to load it over and over again to do the mapping. However, when I try to access it inside foreach, I get a null pointer exception. Any help is appreciated.

Here is the code snippet:

def main(args: Array[String]): Unit = {

  val conf = new SparkConf() ...

  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(20))
  val sqc = new SQLContext(sc)

  // Trying to cache the table here to use it below
  val gpsLookUpTable = MapInput.cacheMappingTables(sc, sqc).persist(StorageLevel.MEMORY_AND_DISK_SER_2)
  //sc.broadcast(gpsLookUpTable)
  ssc.textFileStream("hdfs://localhost:9000/inputDirectory/")
    .foreachRDD { rdd =>
      if (!rdd.partitions.isEmpty) {

        val allRows = sc.textFile("hdfs://localhost:9000/supportFiles/GeoHashLookUpTable")
        sqc.read.json(allRows).registerTempTable("GeoHashLookUpTable")
        val header = rdd.first().split(",")
        val rowsWithoutHeader = Utils.dropHeader(rdd)

        rowsWithoutHeader.foreach { row =>

          val singleRowArray = row.split(",")
          singleRowArray.foreach(println)
          (header, singleRowArray).zipped
            .foreach { (x, y) =>
              // Trying to access the persisted table, but getting a null pointer exception
              val selectedRow = gpsLookUpTable
                .filter("geoCode LIKE '" + GeoHash.subString(lattitude, longitude) + "%'")
                .withColumn("Distance", calculateDistance(col("Lat"), col("Lon")))
                .orderBy("Distance")
                .select("TrackKM", "TrackName").take(1)
              if (selectedRow.length != 0) {
                // do something
              }
              else {
                // do something
              }
            }
        }
      }
    }
}

Recommended answer

I assume you are running on a cluster; the function you pass to foreach is shipped as a closure and runs on other nodes. The NullPointerException is raised because that closure runs on a node that does not have an initialized gpsLookUpTable. You evidently tried to broadcast gpsLookUpTable in this commented-out line:

//sc.broadcast(gpsLookUpTable)

However, the result of the broadcast needs to be bound to a variable, like this:

val tableBC = sc.broadcast(gpsLookUpTable)

Then, inside foreach, you would replace this:

foreach { (x, y) =>
///Trying to access persisted table but getting null pointer exception
val selectedRow = gpsLookUpTable

with this:

foreach { (x, y) =>
// Read the table through the broadcast variable instead of the captured reference
val selectedRow = tableBC.value

This gives the closure access to the broadcast value.
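One caveat worth checking: the filter and withColumn calls in the question suggest that gpsLookUpTable is a DataFrame. DataFrame and RDD operations can only be issued from the driver, so broadcasting the DataFrame handle itself may still fail inside the closure. A common workaround is to collect the lookup rows on the driver and broadcast a plain Scala collection. The following is only a minimal sketch under several assumptions: the TrackPoint case class, the GpsLookup helper, the squared-distance function (a stand-in for the question's calculateDistance), and the "lat,lon,geoCodePrefix" input layout are all hypothetical, the column names are taken from the question, and the lookup table is assumed to be small enough to fit in memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

// Hypothetical plain, serializable copy of one lookup row (column names from the question).
case class TrackPoint(geoCode: String, lat: Double, lon: Double, trackKM: String, trackName: String)

object GpsLookup {
  // Hypothetical stand-in for calculateDistance: squared Euclidean distance in degrees.
  def distance(p: TrackPoint, lat: Double, lon: Double): Double = {
    val dLat = p.lat - lat
    val dLon = p.lon - lon
    dLat * dLat + dLon * dLon
  }

  // Local equivalent of filter(geoCode LIKE 'prefix%') + orderBy(Distance) + take(1).
  def nearest(table: Seq[TrackPoint], prefix: String, lat: Double, lon: Double): Option[TrackPoint] = {
    val candidates = table.filter(_.geoCode.startsWith(prefix))
    if (candidates.isEmpty) None else Some(candidates.minBy(distance(_, lat, lon)))
  }
}

object BroadcastLookupSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BroadcastLookupSketch")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(20))
    val sqc = new SQLContext(sc)

    // Driver side: load the lookup table once and pull its rows into local memory.
    // Assumption: the table is the JSON file from the question and has no null values.
    val points: Vector[TrackPoint] = sqc.read
      .json("hdfs://localhost:9000/supportFiles/GeoHashLookUpTable")
      .select("geoCode", "Lat", "Lon", "TrackKM", "TrackName")
      .collect()
      .map(r => TrackPoint(r.get(0).toString, r.get(1).toString.toDouble,
        r.get(2).toString.toDouble, r.get(3).toString, r.get(4).toString))
      .toVector

    // Broadcast the plain collection, not the DataFrame.
    val tableBC: Broadcast[Vector[TrackPoint]] = sc.broadcast(points)

    ssc.textFileStream("hdfs://localhost:9000/inputDirectory/").foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.foreach { line =>
          // Assumption: each line is "lat,lon,geoCodePrefix"; header or malformed lines are skipped.
          val f = line.split(",")
          Try((f(0).toDouble, f(1).toDouble, f(2))).foreach { case (lat, lon, prefix) =>
            GpsLookup.nearest(tableBC.value, prefix, lat, lon) match {
              case Some(p) => println(s"${p.trackKM} ${p.trackName}")
              case None    => println(s"no match for $line")
            }
          }
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the lookup in a pure function such as GpsLookup.nearest means the executor-side code touches only the broadcast value and no driver-only objects. If the lookup table is too large to collect, a join between the streamed data and the table on the driver side would be the alternative to consider.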


08-20 13:15