This article explains how to make Apache Spark ignore dots in a query, which should be a useful reference for anyone hitting the same problem.

Problem description

Given the following JSON file:

[{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}]

Why does this Java code fail:

DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog.woof").count().show();

while this one does not:

DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog*woof").count().show();

Here is a summary of the failure:

 Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'dog.woof' given input columns: [dog*woof, dog.woof];
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
...

Recommended answer

It fails because dots are used to access attributes of struct fields. You can escape such column names with backticks:

val df = sqlContext.read.json(sc.parallelize(Seq(
   """{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}"""
)))

df.groupBy("`dog.woof`").count.show
// +----------+-----+
// |  dog.woof|count|
// +----------+-----+
// |bad dog 32|    1|
// +----------+-----+
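Applied to the Java code from the question, the same backtick escaping should let the column resolve (a minimal sketch assuming the same dogfile.json and sqlContext as above):

DataFrame df = sqlContext.read().json("dogfile.json");
// Backticks tell the analyzer to treat "dog.woof" as one column name,
// not as field "woof" inside a struct column "dog"
df.groupBy("`dog.woof`").count().show();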

but using special characters in column names is not a good practice and is a source of problems in general.
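If the source data cannot be changed, one alternative not covered in the original answer (a hedged sketch) is to rename the offending column once with withColumnRenamed, so later expressions need no escaping:

DataFrame df = sqlContext.read().json("dogfile.json");
// Replace the dot so downstream expressions can reference the column directly
DataFrame renamed = df.withColumnRenamed("dog.woof", "dog_woof");
renamed.groupBy("dog_woof").count().show();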

This concludes the article on how to make Apache Spark ignore dots in a query. We hope the recommended answer above is helpful.
