This article describes how to handle AnalysisException: u"cannot resolve 'name' given input columns: [list]" in sqlContext in Spark; hopefully it is useful to readers facing the same problem.

Problem description

I tried a simple example like:

data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/samples/population-vs-price/data_geo.csv")

data.cache() # Cache data for faster reuse
data = data.dropna() # drop rows with missing values
data = data.select("2014 Population estimate", "2015 median sales price").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()

It works well, but when I try something very similar:

data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load('/mnt/%s/OnlineNewsTrainingAndValidation.csv' % MOUNT_NAME)

data.cache() # Cache data for faster reuse
data = data.dropna() # drop rows with missing values
data = data.select("timedelta", "shares").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()
display(data)

it raises the error: AnalysisException: u"cannot resolve 'timedelta' given input columns: [data_channel_is_tech, ...

Of course, I imported LabeledPoint and LinearRegression.

What is going wrong?

An even simpler case

df_cleaned = df_cleaned.select("shares")

raises the same AnalysisException.

*Please note: df_cleaned.printSchema() works well.

Solution

I found the issue: some of the column names contain white space before the name itself. So

data = data.select(" timedelta", " shares").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()

works. I could have caught this earlier with:

assert " " not in ''.join(df.columns)

Now I am thinking of a way to remove the white space. Any idea is much appreciated!
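One way to remove the white space is to strip every column name and rebuild the DataFrame with the cleaned names. A minimal sketch (the column names below are hypothetical, standing in for names read from a CSV with spaces after the commas):

```python
# Hypothetical column names as loaded from a header row like "timedelta, shares,..."
columns = [" timedelta", " shares", "data_channel_is_tech"]

# Strip leading/trailing whitespace from every name.
cleaned = [c.strip() for c in columns]
print(cleaned)  # ['timedelta', 'shares', 'data_channel_is_tech']
```

In Spark you would then apply the cleaned names in one go, e.g. `data = data.toDF(*[c.strip() for c in data.columns])`, after which `data.select("timedelta", "shares")` resolves as expected.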

