本文介绍了在Hive中使用where子句编写一个嵌套的select语句的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在Hive查询中的where子句中执行嵌套选择。示例代码片段如下所示;

从TableA中选择* b
$ b其中TA_timestamp>(从TableB中选择timestmp,其中id =hourDim)




$ b $这是可能的还是我在这里做错了什么,因为我在运行上面的脚本时遇到错误? b

为了进一步阐述我正在尝试做的事情,我有一个cassandra密钥空间,我使用时间戳发布统计信息。定期(例如每小时)使用配置单元对这些统计数据进行汇总,一旦总结出数据将与相应小时分开存储。因此,当查询第二次运行(以及连续运行)时,查询只应运行在新数据上(即 - timestamp> previous_execution_timestamp)。我试图通过将最近执行的时间戳存储在单独的配置表中,然后使用该值来过滤出原始统计信息。



这可以实现在 WHERE 子句中使用hive?!

解决方案子查询 strong> not supported in Hive:



然而,通常您可以使用 JOIN 同样的结果:



例如,这个查询:

  SELECT a.KEY,a.value 
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM B);

可以被重写为:

<$ p $ (a.KEY = b.KEY)
SELECT a.KEY,a.val
从左边的SEMI连接p $ p>

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows;

select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")

Is this possible or am I doing something wrong here, because I am getting an error while running the above script ?!

To further elaborate on what I am trying to do, there is a cassandra keyspace that I publish statistics with a timestamp. Periodically (hourly for example) this stats will be summarized using hive, once summarized that data will be stored separately with the corresponding hour. So when the query runs for the second time (and consecutive runs) the query should only run on the new data (i.e. - timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate hive table, and then use that value to filter out the raw stats.

Can this be achieved this using hive ?!

解决方案

Subqueries inside a WHERE clause are not supported in Hive:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

However, often you can use a JOIN statement instead to get to the same result:https://karmasphere.com/hive-queries-on-table-data#join_syntax

For example, this query:

   SELECT a.KEY, a.value
   FROM a
   WHERE a.KEY IN
   (SELECT b.KEY FROM B);

can be rewritten to:

   SELECT a.KEY, a.val
   FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)

这篇关于在Hive中使用where子句编写一个嵌套的select语句的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-31 08:28