Why does reading a CSV file with empty values lead to an IndexOutOfBoundException?

Problem Description
I have a CSV file with the following structure:
Name  | Val1 | Val2 | Val3 | Val4 | Val5
John  | 1    | 2    |      |      |
Joe   | 1    | 2    |      |      |
David | 1    | 2    |      | 10   | 11
I can load this into an RDD without problems. However, when I create a schema and then a DataFrame from it, I get an indexOutOfBound error.
The code looks like this:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
When I perform an action on rowRDD, I get the error.
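A likely cause, assuming `fileRDD` was built by splitting comma-delimited lines with something like `line.split(",")` (the question does not show this step): Java's `String.split` with the default limit discards trailing empty strings, so a line such as `John,1,2,,,` produces only three fields and `p(3)` throws. Also note that six columns have indices 0 to 5, so `p(6)` would fail on every row. The plain-Scala sketch below needs no Spark; the sample lines are hypothetical comma-delimited versions of the data above. It compares the default split with `split(",", -1)`, whose negative limit keeps trailing empty fields.

```scala
object SplitDemo {
  // Hypothetical comma-delimited versions of the sample rows
  val lines: Seq[String] = Seq("John,1,2,,,", "Joe,1,2,,,", "David,1,2,,10,11")

  // Default limit (0): trailing empty strings are removed from the result
  def defaultSplit(line: String): Array[String] = line.split(",")

  // Negative limit: trailing empty strings are kept
  def keepEmpty(line: String): Array[String] = line.split(",", -1)

  def main(args: Array[String]): Unit =
    lines.foreach { line =>
      println(s"$line -> default: ${defaultSplit(line).length}, limit -1: ${keepEmpty(line).length}")
    }
}
```

Under this assumption, `split(",", -1)` gives every sample row six fields, so `Row(p(0), ..., p(5))` would no longer index past the end of the array.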
Any help is greatly appreciated.
Recommended Answer
This is not a direct answer to your question, but it may help solve your problem.
From the question, I see that you are trying to create a DataFrame from a CSV file.
Use the spark-csv package
With spark-csv, the following Scala code reads a CSV file:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)
For your sample data, I got the following result:
+-----+----+----+----+----+----+
| Name|Val1|Val2|Val3|Val4|Val5|
+-----+----+----+----+----+----+
| John| 1| 2| | | |
| Joe| 1| 2| | | |
|David| 1| 2| | 10| 11|
+-----+----+----+----+----+----+
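If you would rather keep the RDD approach from the question, one hedged alternative is to pad every split line to the header width, which gives the same shape as the table above (empty strings for missing values). This sketch assumes comma-delimited input, and the helper name `toFields` is hypothetical. Padding also covers lines that have no trailing delimiters at all, such as `John,1,2`, which `split(",", -1)` alone would not fix.

```scala
object PadDemo {
  // Column names from the question's header
  val header: Array[String] = "Name,Val1,Val2,Val3,Val4,Val5".split(",")

  // Hypothetical helper: split keeping empty fields, then pad short lines with "" up to width
  def toFields(line: String, width: Int): Array[String] =
    line.split(",", -1).padTo(width, "")

  def main(args: Array[String]): Unit = {
    // Mix of a line without trailing commas and lines with them
    val lines = Seq("John,1,2", "Joe,1,2,,,", "David,1,2,,10,11")
    lines.foreach { line =>
      println(toFields(line, header.length).mkString("|"))
    }
  }
}
```

In Spark, each padded array could then be turned into a row with `Row.fromSeq(fields.toSeq)`, which always has as many values as a six-column schema expects. Note that `padTo` does not truncate lines with extra fields.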
With the latest version, you can also infer the schema (the inferSchema option). See this answer.