Problem description
I am trying to use rowNumber in Spark data frames. My queries work as expected in the Spark shell, but when I write them out in Eclipse and compile a jar, I am facing an error:
16/03/23 05:52:43 ERROR ApplicationMaster: User class threw exception:org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'. Note that, using window functions currently requires a HiveContext;
org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'. Note that, using window functions currently requires a HiveContext;
My query
import org.apache.spark.sql.functions.{rowNumber, max, broadcast}
import org.apache.spark.sql.expressions.Window
val w = Window.partitionBy($"id").orderBy($"value".desc)
val dfTop = df.withColumn("rn", rowNumber.over(w)).where($"rn" <= 3).drop("rn")
I am not using a HiveContext while running the queries in the Spark shell, so I am not sure why it returns an error when I run the same code as a jar file. Also, I am running the scripts on Spark 1.6.0, if that helps. Did anyone face a similar issue?
Recommended answer
I have answered this before.
You can read further about the difference between SQLContext and HiveContext here.
SparkSQL has a SQLContext and a HiveContext. HiveContext is a superset of SQLContext, and the Spark community suggests using the HiveContext. When you run spark-shell, which is your interactive driver application, it automatically creates a SparkContext defined as sc and a HiveContext defined as sqlContext. The HiveContext allows you to execute SQL queries as well as Hive commands.
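The error therefore suggests the jar application built a plain SQLContext instead of a HiveContext. Below is a minimal sketch (Spark 1.6-era API; the object name, app name, and sample data are illustrative, not from the question) of a standalone app that creates a HiveContext explicitly so that window functions such as row_number can resolve:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.rowNumber
import org.apache.spark.sql.expressions.Window

// Hypothetical app object; mirrors the query from the question.
object TopNPerId {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TopNPerId")
    val sc = new SparkContext(conf)

    // Use a HiveContext, not a plain SQLContext, so that
    // window functions resolve when the app runs as a jar.
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Illustrative sample data with the same column names.
    val df = sc.parallelize(Seq(
      ("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 10), ("b", 20)
    )).toDF("id", "value")

    val w = Window.partitionBy($"id").orderBy($"value".desc)
    val dfTop = df.withColumn("rn", rowNumber.over(w))
                  .where($"rn" <= 3)
                  .drop("rn")
    dfTop.show()
  }
}
```

Note that this assumes the spark-hive artifact (e.g. spark-hive_2.10 for Spark 1.6) is on the compile and runtime classpath; the HiveContext class lives there, not in spark-sql.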
You can try checking this inside your spark-shell:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74)
scala> sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
res0: Boolean = true
scala> sqlContext.isInstanceOf[org.apache.spark.sql.SQLContext]
res1: Boolean = true
scala> sqlContext.getClass.getName
res2: String = org.apache.spark.sql.hive.HiveContext
By inheritance, HiveContext is actually an SQLContext, but it is not true the other way around. You can check the source code if you are more interested in knowing how HiveContext inherits from SQLContext.