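I'm very new to Impala / Hive queries, and I'm not sure how to write this one.
The goal of the query is to get the data from defined ranges (each range is delimited by two points that satisfy a condition).
To be clearer, we have a table with 3 columns: date, A, and B.
We sort the table by date, and we want all rows from every interval between two A = 1 rows, provided no row between those two A = 1 rows has B = 1. (So a range lies between each two consecutive A = 1 rows, and the condition is that it contains no B = 1.)
I drew the concept of what I want to make it clearer.
Link: https://drive.google.com/open?id=0B_zAJFzI2slWQnRwN2gwWk9NSG8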
Best answer
select dt,A,B
from   (select dt,A,B
              ,max (case when A=1 then dt end) over p as p_A1_dt  -- latest A=1 before this row
              ,max (case when B=1 then dt end) over p as p_B1_dt  -- latest B=1 before this row
              ,min (case when A=1 then dt end) over f as f_A1_dt  -- earliest A=1 after this row
              ,min (case when B=1 then dt end) over f as f_B1_dt  -- earliest B=1 after this row
        from   mytable
        window p as (order by dt rows between unbounded preceding and 1 preceding)
              ,f as (order by dt rows between 1 following and unbounded following)
       ) t
where  (   p_A1_dt >= p_B1_dt
        or (    p_A1_dt is not null
            and p_B1_dt is null
           )
       )
  and  (   f_A1_dt <= f_B1_dt
        or (    f_A1_dt is not null
            and f_B1_dt is null
           )
       )
  and  coalesce(A,-1) <> 1
  and  coalesce(B,-1) <> 1  -- both windows exclude the current row, so its own B is checked here
The same, but without the window clause
select dt,A,B
from   (select dt,A,B
              ,max (case when A=1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding) as p_A1_dt
              ,max (case when B=1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding) as p_B1_dt
              ,min (case when A=1 then dt end) over (order by dt rows between 1 following and unbounded following) as f_A1_dt
              ,min (case when B=1 then dt end) over (order by dt rows between 1 following and unbounded following) as f_B1_dt
        from   mytable
       ) t
where  (   p_A1_dt >= p_B1_dt
        or (    p_A1_dt is not null
            and p_B1_dt is null
           )
       )
  and  (   f_A1_dt <= f_B1_dt
        or (    f_A1_dt is not null
            and f_B1_dt is null
           )
       )
  and  coalesce(A,-1) <> 1
  and  coalesce(B,-1) <> 1  -- both windows exclude the current row, so its own B is checked here
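To cross-check a query's output on a small sample, the rule from the question can also be written as a plain loop. This is only a sketch under my own assumptions, not part of the original answer: the function name `rows_in_clean_intervals`, the `(dt, a, b)` tuple layout, and the sample rows are all hypothetical. It also assumes that a row with A = 1 always acts as a boundary (whatever its B is), that a B = 1 row disqualifies the interval it sits in, and that rows before the first or after the last A = 1 are never returned.

```python
from datetime import date


def rows_in_clean_intervals(rows):
    """Return rows lying strictly between two consecutive A = 1 rows
    when no row in between has B = 1.

    `rows` is an iterable of (dt, a, b) tuples (hypothetical layout).
    """
    ordered = sorted(rows, key=lambda r: r[0])  # order by dt
    result = []
    pending = None  # rows seen since the last A = 1 row; None before the first one
    clean = True    # False once a B = 1 row shows up in the current interval
    for row in ordered:
        _, a, b = row
        if a == 1:
            # Closing boundary: keep the interval only if it had no B = 1.
            if pending is not None and clean:
                result.extend(pending)
            pending = []
            clean = True
        elif pending is not None:
            pending.append(row)
            if b == 1:
                clean = False
    # Rows after the last A = 1 have no closing boundary and are dropped.
    return result


# Hypothetical sample data: (dt, A, B)
sample = [
    (date(2017, 1, 1), 0, 0),
    (date(2017, 1, 2), 1, 0),
    (date(2017, 1, 3), 0, 0),
    (date(2017, 1, 4), 0, 0),
    (date(2017, 1, 5), 1, 0),
    (date(2017, 1, 6), 0, 0),
    (date(2017, 1, 7), 0, 1),
    (date(2017, 1, 8), 1, 0),
    (date(2017, 1, 9), 0, 0),
]

for row in rows_in_clean_intervals(sample):
    print(row)
```

On this sample only the rows dated 2017-01-03 and 2017-01-04 come back: the interval between 2017-01-05 and 2017-01-08 contains a B = 1 row, and the first and last rows lie outside any closed interval. Loading the same rows into a test table and comparing the query result against this list is one way to check the SQL on edge cases.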