Problem description
I am running this query in the Spark shell, but it gives me an error:
sqlContext.sql(
"select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)
Error:
select sal from samplecsv where sal < (select MAX(sal) from samplecsv)
^
at scala.sys.package$.error(package.scala:27)

Can anybody explain this to me? Thanks.
Recommended answer
Planned features:
- SPARK-23945 (Column.isin() should accept a single-column DataFrame as input).
- SPARK-18455 (General support for correlated subquery processing).
Spark 2.0+
Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:
select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)
select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
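In particular, the uncorrelated scalar subquery from the question should run unchanged on 2.0+. A minimal sketch, assuming spark is the SparkSession provided by spark-shell and samplecsv is registered as a table:

// On Spark 2.0+ the scalar subquery in the WHERE clause parses and runs as written.
spark.sql(
  "select sal from samplecsv where sal < (select max(sal) from samplecsv)"
).collect().foreach(println)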
Unfortunately, as for now (Spark 2.0), it is impossible to express the same logic using the DataFrame DSL.
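The usual DSL workaround is to rewrite IN / NOT IN as joins. A minimal sketch, with two toy DataFrames l and r shaped like the SQL examples above (the data itself is made up):

import spark.implicits._  // already in scope in spark-shell

val l = Seq((1, 2), (3, 4), (5, 6)).toDF("a", "b")
val r = Seq((3, 0), (5, 0)).toDF("c", "d")

// select * from l where l.a in (select c from r)
l.join(r, l("a") === r("c"), "leftsemi").show()

// select * from l where a not in (select c from r)
// (caveat: NOT IN treats NULL keys differently from an anti join)
l.join(r, l("a") === r("c"), "leftanti").show()

A left semi join returns only the columns of l, which matches the IN semantics; the left anti join is the usual substitute for NOT EXISTS, and for NOT IN on non-nullable keys.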
Spark < 2.0
Spark supports subqueries in the FROM clause (same as Hive <= 0.12):
SELECT col FROM (SELECT * FROM t1 WHERE bar) t2
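For example, on a 1.x spark-shell (a sketch; the table t1 and its col/bar columns are invented to match the query above):

// Register a tiny table and run the FROM-clause subquery (Spark 1.x API).
val t1 = sqlContext.createDataFrame(Seq((1, true), (2, false))).toDF("col", "bar")
t1.registerTempTable("t1")
sqlContext.sql("SELECT col FROM (SELECT * FROM t1 WHERE bar) t2").show()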
It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) cannot be expressed using Spark without promoting them to a Cartesian join.
Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using JOIN, there is no loss of function here.
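Applied to the query from the question, on Spark < 2.0 the scalar subquery can be evaluated separately and then inlined as a literal. A sketch, assuming samplecsv is already registered as a temporary table and sal is an integer column:

import org.apache.spark.sql.functions.max

val df = sqlContext.table("samplecsv")
// Evaluate what the scalar subquery would compute, on the driver...
val maxSal = df.agg(max("sal")).first().getInt(0)  // assumes sal is an Int column
// ...and substitute it into the outer query as a plain literal.
sqlContext.sql(s"select sal from samplecsv where sal < $maxSal")
  .collect().foreach(println)

Equivalently, the one-row aggregate could be joined back to samplecsv, which keeps the whole computation on the cluster instead of pulling the scalar to the driver.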