我有一个大型数据集,针对这个问题,它包含3个字段:
组标识符
从日期
至今
在任何给定的行上,From Date
始终小于To Date
,但是在每个组中,日期对表示的时间段(无特定顺序)可以重叠,包含在另一个之中,甚至相同。
我要结束的是一个查询,该查询将每个组的结果浓缩为连续的时间段。例如,如下所示的组:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 12/31/2012 |
| A | 12/01/2013 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
| A | 01/01/2015 | 12/31/2015 |
| A | 02/01/2015 | 03/31/2015 |
| A | 01/01/2013 | 12/31/2013 |
会导致以下结果:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
我已经阅读了很多有关日期打包的文章,但是我不太清楚如何将其应用于数据集。
如何构造可以给我这些结果的查询?
最佳答案
本书“使用窗口函数的Microsoft®SQLServer®2012高性能T-SQL”中的解决方案
;with C1 as(
select GroupID, FromDate as ts, +1 as type, 1 as sub
from dbo.table_name
union all
select GroupID, dateadd(day, +1, ToDate) as ts, -1 as type, 0 as sub
from dbo.table_name),
C2 as(
select C1.*
, sum(type) over(partition by GroupID order by ts, type desc
rows between unbounded preceding and current row) - sub as cnt
from C1),
C3 as(
select GroupID, ts, floor((row_number() over(partition by GroupID order by ts) - 1) / 2 + 1) as grpnum
from C2
where cnt = 0)
select GroupID, min(ts) as FromDate, dateadd(day, -1, max(ts)) as ToDate
from C3
group by GroupID, grpnum;
创建表:
if object_id('table_name') is not null
drop table table_name
create table table_name(GroupID varchar(100), FromDate datetime,ToDate datetime)
insert into table_name
select 'A', '01/01/2012', '12/31/2012' union all
select 'A', '12/01/2013', '11/30/2014' union all
select 'A', '01/01/2015', '12/31/2015' union all
select 'A', '01/01/2015', '12/31/2015' union all
select 'A', '02/01/2015', '03/31/2015' union all
select 'A', '01/01/2013', '12/31/2013'