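I'm very new to Impala/Hive queries, and I'm not sure how to write this one.

The goal of the query is to get the data that falls in defined ranges (each range is bounded by two points that satisfy a condition).

To be clearer, we have a table with 3 columns: date, A and B.

We sort the table by date, and we want all rows from every interval between two consecutive A = 1 rows, provided there is no B = 1 anywhere between those two A = 1 rows. (So each range lies between two A = 1 rows, and the condition is that it contains no B = 1.)

I drew a sketch of what I want to make it clearer.

Link: https://drive.google.com/open?id=0B_zAJFzI2slWQnRwN2gwWk9NSG8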

Best answer

select  dt,A,B

from   (select  dt,A,B
               -- latest A=1 / B=1 date strictly before the current row
               ,max (case when A=1 then dt end) over p  as p_A1_dt
               ,max (case when B=1 then dt end) over p  as p_B1_dt
               -- earliest A=1 / B=1 date strictly after the current row
               ,min (case when A=1 then dt end) over f  as f_A1_dt
               ,min (case when B=1 then dt end) over f  as f_B1_dt

        from    mytable

        window  p as (order by dt rows between unbounded preceding and 1 preceding)
               ,f as (order by dt rows between 1 following and unbounded following)
        ) t

where   (   p_A1_dt >= p_B1_dt
        or  (   p_A1_dt is not null
            and p_B1_dt is null
            )
        )

    and (   f_A1_dt <= f_B1_dt
        or  (   f_A1_dt is not null
            and f_B1_dt is null
            )
        )

    and coalesce(A,-1) <> 1
    -- both windows skip the current row, so a B=1 row must be dropped explicitly
    and coalesce(B,-1) <> 1

The same query, without the window clause:

select  dt,A,B

from   (select  dt,A,B
               ,max (case when A=1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding)  as p_A1_dt
               ,max (case when B=1 then dt end) over (order by dt rows between unbounded preceding and 1 preceding)  as p_B1_dt
               ,min (case when A=1 then dt end) over (order by dt rows between 1 following and unbounded following)  as f_A1_dt
               ,min (case when B=1 then dt end) over (order by dt rows between 1 following and unbounded following)  as f_B1_dt

        from    mytable
        ) t

where   (   p_A1_dt >= p_B1_dt
        or  (   p_A1_dt is not null
            and p_B1_dt is null
            )
        )

    and (   f_A1_dt <= f_B1_dt
        or  (   f_A1_dt is not null
            and f_B1_dt is null
            )
        )

    and coalesce(A,-1) <> 1
    -- both windows skip the current row, so a B=1 row must be dropped explicitly
    and coalesce(B,-1) <> 1
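
To sanity-check the result on a small sample, here is a minimal Python sketch of the selection the question describes. It is only a reference model, not the Hive/Impala query itself. The function name `rows_in_clean_intervals` and the `sample` rows are made up for illustration, and it assumes the dt values are unique and sort correctly as strings.

```python
from typing import List, Optional, Tuple

# One table row: (dt, A, B). A and B may be NULL (None), as in the table.
Row = Tuple[str, Optional[int], Optional[int]]


def rows_in_clean_intervals(rows: List[Row]) -> List[Row]:
    """Return the rows strictly between two consecutive A=1 rows,
    keeping an interval only if none of its inner rows has B=1."""
    ordered = sorted(rows, key=lambda r: r[0])  # order by dt
    # Positions of the A=1 rows, which act as interval boundaries.
    bounds = [i for i, (_, a, _) in enumerate(ordered) if a == 1]
    result: List[Row] = []
    for start, end in zip(bounds, bounds[1:]):
        inside = ordered[start + 1:end]
        if all(b != 1 for _, _, b in inside):
            result.extend(inside)
    return result


# Hypothetical sample data, invented for this sketch.
sample: List[Row] = [
    ("2017-01-01", 1, 0),
    ("2017-01-02", 0, 0),
    ("2017-01-03", 0, 0),
    ("2017-01-04", 1, 0),
    ("2017-01-05", 0, 1),  # B=1 disqualifies the whole 01-04 .. 01-07 interval
    ("2017-01-06", 0, 0),
    ("2017-01-07", 1, 0),
    ("2017-01-08", 0, 0),  # no A=1 follows, so this row is in no interval
]

for row in rows_in_clean_intervals(sample):
    print(row)
```

On this sample only the 2017-01-02 and 2017-01-03 rows are printed. Loading the same rows into mytable and comparing the query output against this list is one way to check the edge cases (a B=1 inside an interval, rows before the first or after the last A=1).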

09-04 16:06