Spark DataFrame count and other functions throw IndexOutOfBoundsException

Problem description

1) First, I filter null values out of the initial RDD.

val rddWithOutNull2 = rddSlices.filter(x => x(0) != null)

2) Then I convert this RDD to an RDD of Row.

3) After converting the RDD to a DataFrame in Scala:

val df = spark.createDataFrame(rddRow,schema)
df.printSchema()

Output:

root
 |-- name: string (nullable = false)


println(df.count())

Output:

Error :
count : :
[Stage 11:==================================>                       (3 + 2) / 5][error] o.a.s.e.Executor - Exception in task 4.0 in stage 11.0 (TID 16)
java.lang.IndexOutOfBoundsException: 0

No other Spark SQL function works on this DataFrame either.

Recommended answer

I agree with the comments: the problem seems to be in x(0). If any row is empty, indexing its first element throws that exception. Because filter is a lazy transformation, the predicate only runs when an action such as count() is executed, which is why printSchema() succeeds and the error surfaces later. One solution (depending on the type of the variable x) is to access the first element through headOption:

val rddWithOutNull2 = rddSlices.filter(_.headOption.isDefined)
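The difference between the two predicates can be checked on plain Scala collections, without a Spark cluster. This is a minimal sketch, assuming each element of rddSlices is a Seq[String]; the sample data and the object name are hypothetical. It also shows one thing to watch for: headOption.isDefined keeps a row whose first element is null, so if the original null check still matters, headOption.exists(_ != null) may be closer to the intent.

```scala
import scala.util.Try

object HeadOptionFilterSketch {
  // Hypothetical stand-in for rddSlices; the element type Seq[String] is an assumption.
  val slices: Seq[Seq[String]] =
    Seq(Seq("alice"), Seq.empty, Seq(null), Seq("bob", "x"))

  // Predicate from the question: indexing the empty row throws IndexOutOfBoundsException.
  val unsafe: Try[Seq[Seq[String]]] = Try(slices.filter(x => x(0) != null))

  // Predicate from the answer: drops empty rows, but keeps Seq(null),
  // because Seq(null).headOption is Some(null).
  val nonEmpty: Seq[Seq[String]] = slices.filter(_.headOption.isDefined)

  // Variant that drops empty rows and rows whose first element is null.
  val nonEmptyNonNull: Seq[Seq[String]] =
    slices.filter(_.headOption.exists(_ != null))

  def main(args: Array[String]): Unit = {
    println(s"question's predicate failed: ${unsafe.isFailure}")
    println(s"rows kept by isDefined: ${nonEmpty.size}")
    println(s"rows kept by exists(_ != null): ${nonEmptyNonNull.size}")
  }
}
```

The same predicates can be passed to RDD.filter unchanged, since they only depend on the element type and not on Spark.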
