Why doesn't SparkSQL support subqueries?


Problem description

I'm running this query in the Spark shell, but it gives me an error:

sqlContext.sql(
  "select sal from samplecsv where sal < (select MAX(sal) from samplecsv)"
).collect().foreach(println)

Error:

Solution

Spark 2.0+

Spark SQL should support both correlated and uncorrelated subqueries. See SubquerySuite for details. Some examples include:

select * from l where exists (select * from r where l.a = r.c)
select * from l where not exists (select * from r where l.a = r.c)

select * from l where l.a in (select c from r)
select * from l where a not in (select c from r)
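As a sketch, assuming Spark 2.0+ and the `SparkSession` API, the query from the question should run unchanged. Only the table name `samplecsv` and the column `sal` come from the question. The sample rows, the object name `ScalarSubqueryDemo`, and its methods are hypothetical. In `spark-shell` a `spark` session already exists, so only the `spark.sql(...)` call is needed there.

```scala
import org.apache.spark.sql.SparkSession

object ScalarSubqueryDemo {
  // Hypothetical stand-in for the question's samplecsv table
  // (only the table name and the sal column come from the question).
  def registerSample(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(1000, 2500, 4000).toDF("sal").createOrReplaceTempView("samplecsv")
  }

  // The question's query: an uncorrelated scalar subquery in the WHERE
  // clause, which Spark 2.0+ accepts as written.
  def belowMax(spark: SparkSession): Set[Int] = {
    registerSample(spark)
    spark.sql("select sal from samplecsv where sal < (select MAX(sal) from samplecsv)")
      .collect()
      .map(_.getInt(0))
      .toSet
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("scalar-subquery-demo")
      .master("local[*]")
      .getOrCreate()
    try belowMax(spark).toSeq.sorted.foreach(println)
    finally spark.stop()
  }
}
```

With the sample data the maximum is 4000, so `belowMax` should return `Set(1000, 2500)`.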

Spark < 2.0

Spark supports subqueries in the FROM clause (same as Hive <= 0.12).

SELECT col FROM (SELECT * FROM t1 WHERE bar) t2

It simply doesn't support subqueries in the WHERE clause. Generally speaking, arbitrary subqueries (in particular correlated subqueries) couldn't be expressed in Spark without promoting them to a Cartesian join.

Since subquery performance is usually a significant issue in a typical relational system, and every subquery can be expressed using a JOIN, there is no loss of function here.
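Below is a minimal sketch of two such JOIN rewrites. It assumes hypothetical data: `samplecsv(sal)` mirrors the question, and `l(a)` / `r(c)` mirror the names in the EXISTS examples. The object and method names are also hypothetical.

- The scalar subquery from the question moves into a one-row derived table in FROM.
- A correlated EXISTS becomes a LEFT SEMI JOIN (HiveQL syntax).

Both SQL strings avoid subqueries in WHERE, so they should also be expressible on Spark < 2.0 through `sqlContext.sql`. That has not been verified against a specific 1.x release, and the `SparkSession` setup code shown here requires 2.0+.

```scala
import org.apache.spark.sql.SparkSession

object SubqueryAsJoinDemo {
  // Hypothetical data: samplecsv(sal) mirrors the question; l(a) and r(c)
  // mirror the table/column names used in the EXISTS examples.
  def registerTables(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(1000, 2500, 4000).toDF("sal").createOrReplaceTempView("samplecsv")
    Seq(1, 2, 3).toDF("a").createOrReplaceTempView("l")
    Seq(2, 3, 3, 5).toDF("c").createOrReplaceTempView("r")
  }

  // Scalar subquery moved into FROM: join every row against the one-row
  // derived table m and keep the rows below the maximum.
  def belowMaxViaJoin(spark: SparkSession): Set[Int] =
    spark.sql(
      """select s.sal
        |from samplecsv s
        |join (select MAX(sal) as max_sal from samplecsv) m
        |  on s.sal < m.max_sal""".stripMargin)
      .collect()
      .map(_.getInt(0))
      .toSet

  // Correlated EXISTS as a LEFT SEMI JOIN: each row of l is returned at most
  // once even though r contains c = 3 twice (an inner join would duplicate it).
  def existsViaSemiJoin(spark: SparkSession): List[Int] =
    spark.sql("select l.a from l left semi join r on l.a = r.c")
      .collect()
      .map(_.getInt(0))
      .sorted
      .toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("subquery-as-join-demo")
      .master("local[*]")
      .getOrCreate()
    try {
      registerTables(spark)
      println(belowMaxViaJoin(spark).toList.sorted) // List(1000, 2500)
      println(existsViaSemiJoin(spark))             // List(2, 3)
    } finally spark.stop()
  }
}
```

The semi join is used instead of a plain inner join because EXISTS only tests for a match and must not multiply rows of `l`. On Spark 2.0+, NOT EXISTS maps the same way to `LEFT ANTI JOIN`. NOT IN is not a plain anti join when the subquery can return NULLs.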


07-25 09:10