Merging consecutive duplicate records that include a time range

Problem description

I have a very similar problem to the question asked here: Merge duplicate temporal records in database

The difference here is that I need the end date to be an actual date instead of NULL.

Given the following data:

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2010/04/30   X         Y
1000         2010/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/04/30   X         X
1000         2014/05/01  2014/06/01   X         X

The desired result is:

EmployeeId   StartDate   EndDate     Column1   Column2
1000         2009/05/01  2011/04/30   X         Y
1000         2011/05/01  2012/04/30   X         X
1000         2012/05/01  2013/04/30   X         Y
1000         2013/05/01  2014/06/01   X         X

The proposed solution in the linked thread is this:

with  t1 as  --tag first row with 1 in a continuous time series
(
select t1.*, case when t1.column1=t2.column1 and t1.column2=t2.column2
                  then 0 else 1 end as tag
  from test_table t1
  left join test_table t2
    on t1.EmployeeId= t2.EmployeeId and dateadd(day,-1,t1.StartDate)= t2.EndDate
)
select t1.EmployeeId, t1.StartDate,
       case when min(T2.StartDate) is null then null
            else dateadd(day,-1,min(T2.StartDate)) end as EndDate,
       t1.Column1, t1.Column2
  from (select t1.* from t1 where tag=1 ) as t1  -- to get StartDate
  left join (select t1.* from t1 where tag=1 ) as t2  -- to get a new EndDate
    on t1.EmployeeId= t2.EmployeeId and t1.StartDate < t2.StartDate
 group by t1.EmployeeId, t1.StartDate, t1.Column1,   t1.Column2;

However, this does not seem to work when the last period needs its actual end date rather than NULL.

Could someone help me with this issue?

Recommended answer

How about this?

create table test_table (EmployeeId int, StartDate  date, EndDate  date,   Column1 char(1),  Column2 char(1))
;
insert into test_table values
 (1000    ,     '2009-05-01','2010-04-30','X','Y')
,(1000    ,     '2010-05-01','2011-04-30','X','Y')
,(1000    ,     '2011-05-01','2012-04-30','X','X')
,(1000    ,     '2012-05-01','2013-04-30','X','Y')
,(1000    ,     '2013-05-01','2014-04-30','X','X')
,(1000    ,     '2014-05-01','2014-06-01','X','X')
;
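-- Collapse each run of consecutive rows with identical Column1/Column2 into one row.
-- The derived tables are aliased (flagged, grouped, merged) because some engines,
-- SQL Server among them, reject an unaliased derived table.
SELECT EmployeeId, StartDate, EndDate, Column1, Column2 FROM
(
    SELECT EmployeeId, StartDate
    -- the latest EndDate within a run becomes the merged EndDate
    ,      MAX(EndDate) OVER(PARTITION BY EmployeeId, RN) AS EndDate
    ,      Column1
    ,      Column2
    ,      DIFF
    FROM
    (
        -- running total of DIFF: every row of the same run shares one RN
        SELECT flagged.*
        ,      SUM(DIFF) OVER(PARTITION BY EmployeeId ORDER BY StartDate ) AS RN
        FROM
        (
            -- DIFF = 1 on the first row of a run, 0 when both columns repeat the previous row
            SELECT t.*
            ,      CASE WHEN
                       Column1 = LAG(Column1,1) OVER(PARTITION BY EmployeeId ORDER BY StartDate)
                   AND Column2 = LAG(Column2,1) OVER(PARTITION BY EmployeeId ORDER BY StartDate)
                   THEN 0 ELSE 1 END AS DIFF
            FROM
                test_table t
        ) flagged
    ) grouped
) merged
WHERE DIFF = 1   -- keep only the first row of each run; it carries the run's StartDate
;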
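
To make the idea behind the query easier to check, here is a minimal sketch that runs the same flag-then-running-sum logic against the sample rows using Python's built-in sqlite3 module. SQLite is my own choice for a self-contained test and is not the engine used in the question; it needs SQLite 3.25 or later for window functions. The names ROWS, MERGE_SQL and merge_periods are hypothetical, and the final step uses GROUP BY with MIN/MAX instead of the MAX(...) OVER plus WHERE DIFF = 1 filter above, which should give the same rows for this data.

```python
import sqlite3

# Sample rows from the question; ISO date strings sort chronologically as text.
ROWS = [
    (1000, "2009-05-01", "2010-04-30", "X", "Y"),
    (1000, "2010-05-01", "2011-04-30", "X", "Y"),
    (1000, "2011-05-01", "2012-04-30", "X", "X"),
    (1000, "2012-05-01", "2013-04-30", "X", "Y"),
    (1000, "2013-05-01", "2014-04-30", "X", "X"),
    (1000, "2014-05-01", "2014-06-01", "X", "X"),
]

# Assumption: GROUP BY variant of the answer's query, written for SQLite.
MERGE_SQL = """
WITH flagged AS (
    -- DIFF = 1 when a row starts a new run, 0 when it repeats the previous row
    SELECT t.*,
           CASE WHEN Column1 = LAG(Column1) OVER (PARTITION BY EmployeeId ORDER BY StartDate)
                 AND Column2 = LAG(Column2) OVER (PARTITION BY EmployeeId ORDER BY StartDate)
                THEN 0 ELSE 1 END AS DIFF
    FROM test_table AS t
),
grouped AS (
    -- the running total of DIFF gives every row of a run the same RN
    SELECT f.*,
           SUM(DIFF) OVER (PARTITION BY EmployeeId ORDER BY StartDate) AS RN
    FROM flagged AS f
)
SELECT EmployeeId, MIN(StartDate), MAX(EndDate), Column1, Column2
FROM grouped
GROUP BY EmployeeId, RN, Column1, Column2
ORDER BY EmployeeId, MIN(StartDate)
"""


def merge_periods(rows):
    """Load rows into an in-memory table and return the merged periods."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test_table ("
            "EmployeeId INTEGER, StartDate TEXT, EndDate TEXT, "
            "Column1 TEXT, Column2 TEXT)"
        )
        conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(MERGE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in merge_periods(ROWS):
        print(row)
```

For the sample data the DIFF flags come out as 1, 0, 1, 1, 1, 0 and the running sums as 1, 1, 2, 3, 4, 4, so the first two and the last two rows collapse into one period each. Like the query above, this sketch compares only Column1 and Column2 with the previous row. It does not check that the previous EndDate is exactly one day before the StartDate, so rows separated by a gap in time would still be merged if their column values match.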
