问题描述
下面是一个包含稀疏日期的 mysql 表.
Below is a mysql table which has sparse dates.
col dt_id value
A1 2018-05-28 30
A1 2018-05-30 20
A1 2018-05-31 50
A1 2018-06-01 50
A1 2018-06-04 80
A1 2018-06-05 50
输出应该类似于下面的内容,其中缺少的日期与最后一个值一起填充.
The output should be something like below where missing dates are populated along with the last value.
col dt_id value
A1 2018-05-28 30
A1 2018-05-29 30
A1 2018-05-30 20
A1 2018-05-31 50
A1 2018-06-01 50
A1 2018-06-02 50
A1 2018-06-03 50
A1 2018-06-04 80
A1 2018-06-05 50
这里生成了以下内容.
A1 2018-05-29 30
A1 2018-06-02 50
A1 2018-06-03 50
我知道使用 last_value() over (partition by..
) 的 oracle 解决方案,但由于这是 mysql,所以它有点棘手.
I know solutions with oracle using last_value() over (partition by..
, but since this is mysql, its a bit tricky.
这是我尝试过的:
创建时间表并填充数据:
create a time table and populate with data:
CREATE TABLE `time_table` (date_id date not null);
create table ints ( i tinyint ); insert into ints values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
insert into time_table (date_id) select date('2016-09-01')+ interval a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i day
from ints a
join ints b
join ints c
join ints d
join ints e
where (a.i*10000 + b.i*1000 + c.i*100 + d.i*10 + e.i) <= 11322 order by 1;
select * from time_table limit 10;
+------------+
| date_id |
+------------+
| 2018-09-22 |
| 2018-09-21 |
| 2018-09-20 |
| 2018-09-19 |
| 2018-09-18 |
| 2018-09-17 |
| 2018-09-16 |
| 2018-09-15 |
| 2018-09-14 |
| 2018-09-13 |
+------------+
Here is the data for the balance table:
Here is the data
+------+------------+-------+
| A1 | 2018-05-28 | 30 |
| A1 | 2018-05-30 | 20 |
| A1 | 2018-05-31 | 50 |
| A1 | 2018-06-01 | 50 |
| A1 | 2018-06-04 | 80 |
| A1 | 2018-06-05 | 50 |
| B1 | 2018-05-28 | 30 |
| B1 | 2018-05-30 | 20 |
| B1 | 2018-05-31 | 50 |
| B1 | 2018-06-01 | 50 |
| B1 | 2018-06-04 | 80 |
| B1 | 2018-06-05 | 50 |
| C1 | 2018-05-28 | 30 |
| C1 | 2018-05-30 | 20 |
| C1 | 2018-05-31 | 50 |
| C1 | 2018-06-01 | 50 |
| C1 | 2018-06-04 | 80 |
| C1 | 2018-06-05 | 50 |
| D1 | 2018-06-28 | 30 |
| D1 | 2018-07-02 | 20 |
| D1 | 2018-07-04 | 50 |
| D1 | 2018-07-08 | 80 |
| D1 | 2018-07-19 | 50 |
+------+------------+-------+
mysql> select b.id, ab.id, tt.`date_id` as cal_date, b.`mx` as ex_date, val
-> from time_table tt
-> inner join (select id, min(date_id) mi, max(date_id) mx from balance group by id) b
-> on tt.`date_id` >= b.`mi`
-> and tt.`date_id` <= b.mx
-> left join (select id, date_id, sum(value) val from balance group by id, date_id) ab
-> on ab.id = b.id and tt.`date_id` = ab.date_id
-> order by cal_date;
+------+------+------------+------------+------+
| id | id | cal_date | ex_date | val |
+------+------+------------+------------+------+
| A1 | A1 | 2018-05-28 | 2018-06-05 | 30 |
| A1 | NULL | 2018-05-29 | 2018-06-05 | NULL |
| A1 | A1 | 2018-05-30 | 2018-06-05 | 20 |
| A1 | A1 | 2018-05-31 | 2018-06-05 | 50 |
| A1 | A1 | 2018-06-01 | 2018-06-05 | 50 |
| A1 | NULL | 2018-06-02 | 2018-06-05 | NULL |
| A1 | NULL | 2018-06-03 | 2018-06-05 | NULL |
| A1 | A1 | 2018-06-04 | 2018-06-05 | 80 |
| A1 | A1 | 2018-06-05 | 2018-06-05 | 50 |
+------+------+------------+------------+------+
推荐答案
对于 MySQL 8:
For MySQL 8:
with recursive rcte(dt_id, col, value) as (
(
select dt_id, col, value
from mytable
order by dt_id
limit 1
)
union all
select r.dt_id + interval 1 day
, coalesce(t.col, r.col)
, coalesce(t.value, r.value)
from rcte r
left join mytable t on t.dt_id = r.dt_id + interval 1 day
where r.dt_id < (select max(dt_id) from mytable)
)
select r.col, r.dt_id, r.value
from rcte r
order by r.dt_id
递归查询将逐行构建日期,从第一个日期开始到最后一个日期.value
(和 col
)取自原始表,该表在日期上保持连接.如果原始表没有日期行,则取递归中最后一行的值.
The recursive query will build row by row incrementing the date starting from the first date until the last. The value
(and col
) is taken from the original table, which is left joined on date. If the original table doesn't have a row for a date, the value of the last row in the recursion is taken instead.
对于旧版本,您可以使用日历表和左连接 ON 子句中的子查询来获取最后存在的值:
For older versions you can use your calendar table and a subquery in the left joins ON clause to get last existing values:
select b.col, c.date_id, b.value
from time_table c
left join balance b on b.dt_id = (
select max(dt_id)
from balance b1
where b1.dt_id <= c.date_id
)
where c.date_id >= (select min(dt_id) from balance)
and c.date_id <= (select max(dt_id) from balance)
由于问题已更改:
select b.col, c.date_id, b.value
from (
select col, min(dt_id) as min_dt, max(dt_id) as max_dt
from balance
group by col
) i
join time_table c
on c.date_id >= i.min_dt
and c.date_id <= i.max_dt
left join balance b
on b.col = i.col
and b.dt_id = (
select max(dt_id)
from balance b1
where b1.dt_id <= c.date_id
and b1.col = i.col
)
order by b.col, c.date_id
确保您在 (col, dt_id)
上有一个索引.在最好的情况下,它将是主键.time_table
中的 date_id
也应该被索引或者是主键.
Make sure you have an index on (col, dt_id)
. In best case it would be the primary key. date_id
in the time_table
should also be indexed or the primary key.
这篇关于mysql用以前的值生成缺失的日期的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!