This article explains how to make Apache Spark ignore dots in a query, which may be a useful reference if you run into the same problem.
Problem description
Given the following JSON file:
[{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}]
Why does this Java code fail:
DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog.woof").count().show();
but this one does not:
DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog*woof").count().show();
Here is a summary of the failure:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'dog.woof' given input columns: [dog*woof, dog.woof];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
...
Recommended answer
It fails because dots are used to access attributes of struct fields. You can escape column names using backticks:
val df = sqlContext.read.json(sc.parallelize(Seq(
"""{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}"""
)))
df.groupBy("`dog.woof`").count.show
// +----------+-----+
// | dog.woof|count|
// +----------+-----+
// |bad dog 32| 1|
// +----------+-----+
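The same escaping applies to the Java code from the question. A minimal sketch, assuming the same dogfile.json and the Spark 1.x SQLContext/DataFrame API used above:

DataFrame df = sqlContext.read().json("dogfile.json");
// Backticks make Spark treat the whole string as a single column name,
// so the dot is no longer interpreted as struct-field access.
df.groupBy("`dog.woof`").count().show();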
However, using special characters in column names is not good practice and makes them awkward to work with in general.
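If you can change the schema, one way to avoid the escaping altogether is to rename such columns once and query the new name from then on. A minimal sketch using withColumnRenamed; the target name dog_woof is just an illustrative choice:

DataFrame renamed = df.withColumnRenamed("dog.woof", "dog_woof");
// The renamed column contains no dot, so no backticks are needed.
renamed.groupBy("dog_woof").count().show();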
This concludes the article on how to make Apache Spark ignore dots in a query; hopefully the recommended answer above is helpful.