Null pointer exception when accessing a broadcast DataFrame in a Spark UDF
Problem description
I am getting a null pointer exception when I broadcast a DataFrame and try to access it inside a Spark UDF.
UDF definition:
def test_udf(parm1: String, parm2: String, parm3: String) = {
  println("Inside UDF")
  // B is the broadcast variable created in the "Broadcast" step
  B.value.take(1).foreach { println }
  println("after print")
  ..........
}
sqlContext.udf.register("test_udf", test_udf _)
Broadcast:
val B = sc.broadcast(sqlContext.sql("""Select * FROM table_a where col1='10102'""")) // Returns almost 20 MB data
Calling the UDF:
val df = sqlContext.sql("SELECT test_udf(parm1,parm2,parm3) AS test FROM table_b").take(1)
After this line I get a null pointer exception inside the UDF, at the line B.value.take(1).foreach { println }
I suspect that the broadcast is not happening correctly. Is something wrong with this code? I am using Spark 1.6.1.
Recommended answer
You get an exception because this is not a valid Spark program:
- Broadcasting a DataFrame object is not a meaningful operation. This is why we have broadcast join hints.
- Spark doesn't support nested operations on distributed data structures. In other words, you cannot access a DataFrame inside a UDF.
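The two points above suggest two possible rewrites, sketched below for the Spark 1.6 API. This is only an illustration under assumptions: the question does not show the schema of table_a or table_b, so the columns lookup_key and lookup_value, the join condition, the fallback string, and the sample rows in main are all hypothetical. The first function uses the broadcast join hint (org.apache.spark.sql.functions.broadcast). The second collects the small result to the driver and broadcasts a plain local Map, which a UDF can read safely because it is no longer a distributed data structure.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.broadcast

object BroadcastAlternatives {

  // Option 1: keep both sides as DataFrames and hint that the small side
  // should be broadcast for the join. The join condition is hypothetical.
  def joinWithHint(sqlContext: SQLContext): DataFrame = {
    val small = sqlContext.sql("SELECT * FROM table_a WHERE col1 = '10102'")
    val big = sqlContext.table("table_b")
    big.join(broadcast(small), big("parm1") === small("lookup_key"))
  }

  // Option 2: collect the small result to the driver, broadcast the local
  // Map, and read that Map inside the UDF. Column names are hypothetical.
  def registerLookupUdf(sc: SparkContext, sqlContext: SQLContext): Unit = {
    val localMap: Map[String, String] = sqlContext
      .sql("SELECT lookup_key, lookup_value FROM table_a WHERE col1 = '10102'")
      .collect()
      .map(row => row.getString(0) -> row.getString(1))
      .toMap
    val lookup = sc.broadcast(localMap)
    sqlContext.udf.register("test_udf",
      (parm1: String, parm2: String, parm3: String) =>
        lookup.value.getOrElse(parm1, "not found"))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("broadcast-alternatives")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data standing in for the real tables.
    Seq(("10102", "a", "alpha"), ("10102", "b", "beta"), ("99999", "c", "gamma"))
      .toDF("col1", "lookup_key", "lookup_value")
      .registerTempTable("table_a")
    Seq(("a", "x", "y"), ("z", "x", "y"))
      .toDF("parm1", "parm2", "parm3")
      .registerTempTable("table_b")

    joinWithHint(sqlContext).show()

    registerLookupUdf(sc, sqlContext)
    sqlContext.sql("SELECT test_udf(parm1, parm2, parm3) AS test FROM table_b").show()

    sc.stop()
  }
}
```

Option 1 is usually the simpler choice when the UDF only looks up matching rows, since Spark plans the join itself. Option 2 fits when the lookup logic really has to live inside the UDF and the collected data is small enough to hold on the driver, as the roughly 20 MB mentioned in the question probably is.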