Why does reading a CSV file with empty values cause an IndexOutOfBoundsException?


Problem description

I have a CSV file with the following structure:

Name | Val1 | Val2 | Val3 | Val4 | Val5
John     1      2
Joe      1      2
David    1      2            10    11

I can load this into an RDD without problems. However, when I create a schema and then build a DataFrame from it, I get an IndexOutOfBounds error.

The code looks like this:

val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

When I perform an action on rowRDD, I get the error.
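The question does not show how fileRDD was built. Assuming each line was split with something like line.split(","), a likely cause is that Java's String.split (which Scala calls for a String argument) discards trailing empty strings. A line such as John,1,2,,, therefore yields only three elements, and p(3) is out of range. Separately, the header has six columns, so the valid indexes are 0 to 5 and p(6) is out of range for every row. The Spark-free sketch below shows only the split behavior; the sample line and the SplitDemo name are hypothetical.

```scala
// Hypothetical reproduction: assumes fileRDD was built with line.split(",")
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val line = "John,1,2,,,"             // 6 fields, the last 3 empty
    val dropped = line.split(",")        // default limit 0: trailing empty strings are removed
    val kept    = line.split(",", -1)    // negative limit: trailing empty strings are kept
    println(dropped.length)              // 3, so dropped(3) would throw ArrayIndexOutOfBoundsException
    println(kept.length)                 // 6
  }
}
```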

Any help is greatly appreciated.

Recommended answer

This is not a direct answer to your question, but it may help solve your problem.

From the question, I see that you are trying to create a DataFrame from a CSV file.

Use the spark-csv package

With spark-csv, the following Scala code can be used to read a CSV file:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)

For your sample data, I got the following result:

+-----+----+----+----+----+----+
| Name|Val1|Val2|Val3|Val4|Val5|
+-----+----+----+----+----+----+
| John|   1|   2|    |    |    |
|  Joe|   1|   2|    |    |    |
|David|   1|   2|    |  10|  11|
+-----+----+----+----+----+----+
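In that output every row has one value per column, with empty fields where the file has no value. If you would rather keep the RDD approach from the question, a minimal sketch of the same row shape in plain Scala is to split with a negative limit (so empty fields are kept) and then pad short rows up to the six header columns. It assumes comma-separated data lines such as David,1,2,,10,11; PadRows, NumCols and toFields are hypothetical names.

```scala
object PadRows {
  val NumCols = 6 // Name, Val1, Val2, Val3, Val4, Val5

  // Keep empty fields (limit -1), then pad rows that are still short with ""
  def toFields(line: String): Array[String] =
    line.split(",", -1).padTo(NumCols, "")

  def main(args: Array[String]): Unit = {
    val lines = Seq("John,1,2", "Joe,1,2", "David,1,2,,10,11")
    // Prints John|1|2|||, Joe|1|2||| and David|1|2||10|11
    lines.map(toFields).foreach(fields => println(fields.mkString("|")))
  }
}
```

In Spark, each padded array could then be turned into a row with Row.fromSeq, for example lines.map(line => Row.fromSeq(PadRows.toFields(line))), where lines is a hypothetical RDD[String] of the raw data lines with the header line already removed.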

With the latest version, you can also infer the schema using the inferSchema option. See this answer.
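As a rough illustration of what schema inference means for this data, the sketch below applies a toy rule: a column is treated as an integer column when it has at least one non-empty value and every non-empty value parses as an Int. This is not spark-csv's actual algorithm; InferDemo and inferType are hypothetical names, and the rows are the sample data from the question.

```scala
import scala.util.Try

object InferDemo {
  // Toy rule (not spark-csv's algorithm): "int" if the column has at least
  // one non-empty value and all non-empty values parse as Int, else "string"
  def inferType(values: Seq[String]): String = {
    val present = values.filter(_.nonEmpty)
    if (present.nonEmpty && present.forall(v => Try(v.toInt).isSuccess)) "int"
    else "string"
  }

  def main(args: Array[String]): Unit = {
    val header = Seq("Name", "Val1", "Val2", "Val3", "Val4", "Val5")
    val rows = Seq(
      Seq("John", "1", "2", "", "", ""),
      Seq("Joe", "1", "2", "", "", ""),
      Seq("David", "1", "2", "", "10", "11"))
    // Transpose rows into columns, then infer one type per column
    val types = rows.transpose.map(inferType)
    header.zip(types).foreach { case (name, t) => println(s"$name: $t") }
  }
}
```

With spark-csv itself, inference is switched on by adding .option("inferSchema", "true") to the reader.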
