Problem description
I am finding myself in the position of having to formulate a (to me) rather complex SQL query and I can't seem to get my head around it.
I have a table called `orders` and a related table `order_state_history` that logs the state of those orders over time (see below).
I now need to generate a series of rows, one row per day, containing the number of orders that were in each state at the end of that day (see the expected result below). Also, I want to consider only orders with `orders.type = 1`.
The data resides in a PostgreSQL database. I already found out how to generate a time series using `GENERATE_SERIES(DATE '2001-01-01', CURRENT_DATE, '1 DAY'::INTERVAL) days`, which allows me to generate rows for days on which no state changes were recorded.
My current approach is to join `orders`, `order_state_history` and the generated series of `days` together, filter out all rows that have `DATE(order_state_history.timestamp) > DATE(days)`, and then somehow get the final state of each order on that day with `first_value(order_state_history.new_state) OVER (PARTITION BY orders.id ORDER BY order_state_history.timestamp DESC)`, but this is where my tiny bit of SQL experience abandons me.
I just can't wrap my head around the problem.
Can this even be solved in a single query, or would I be better advised to compute the data with some kind of intelligent script that performs one query per day? What would be a reasonable approach to the problem?
orders

```
id     type
10000  1
10001  1
10002  2
10003  2
10004  1
```
order_state_history

```
order_id  index  timestamp         new_state
10000     1      01.01.2001 12:00  NEW
10000     2      02.01.2001 13:00  ACTIVE
10000     3      03.01.2001 14:00  DONE
10001     1      02.01.2001 13:00  NEW
10002     1      03.01.2001 14:00  NEW
10002     2      05.01.2001 10:00  ACTIVE
10002     3      05.01.2001 14:00  DONE
10003     1      07.01.2001 04:00  NEW
10004     1      05.01.2001 14:00  NEW
10004     2      10.01.2001 17:30  DONE
```
Expected result

```
date        new_orders  active_orders  done_orders
01.01.2001  1           0              0
02.01.2001  1           1              0
03.01.2001  1           0              1
04.01.2001  1           0              1
05.01.2001  2           0              1
06.01.2001  2           0              1
07.01.2001  2           0              1
08.01.2001  2           0              1
09.01.2001  2           0              1
10.01.2001  1           0              2
```
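To pin down what the expected result means independently of SQL, here is a minimal brute-force sketch in Python (not PostgreSQL, and not part of the original question). It follows the "last state per order per day" idea: for each day it keeps, per order of type 1, the latest state change whose date is on or before that day, then counts states. `ORDERS`, `HISTORY` and `daily_report` are hypothetical names holding the sample data above.

```python
from datetime import date, datetime, timedelta

# Sample data from the question: orders maps id -> type.
ORDERS = {10000: 1, 10001: 1, 10002: 2, 10003: 2, 10004: 1}
# order_state_history rows as (order_id, timestamp, new_state).
HISTORY = [
    (10000, datetime(2001, 1, 1, 12, 0), "NEW"),
    (10000, datetime(2001, 1, 2, 13, 0), "ACTIVE"),
    (10000, datetime(2001, 1, 3, 14, 0), "DONE"),
    (10001, datetime(2001, 1, 2, 13, 0), "NEW"),
    (10002, datetime(2001, 1, 3, 14, 0), "NEW"),
    (10002, datetime(2001, 1, 5, 10, 0), "ACTIVE"),
    (10002, datetime(2001, 1, 5, 14, 0), "DONE"),
    (10003, datetime(2001, 1, 7, 4, 0), "NEW"),
    (10004, datetime(2001, 1, 5, 14, 0), "NEW"),
    (10004, datetime(2001, 1, 10, 17, 30), "DONE"),
]


def daily_report(start, end, order_type=1):
    """For each day, count orders of order_type by their last state on or before that day."""
    rows = []
    day = start
    while day <= end:
        last = {}  # order_id -> (timestamp, state) of the latest change up to `day`
        for order_id, ts, state in HISTORY:
            if ORDERS[order_id] != order_type or ts.date() > day:
                continue
            if order_id not in last or ts > last[order_id][0]:
                last[order_id] = (ts, state)
        states = [state for _, state in last.values()]
        rows.append((day, states.count("NEW"), states.count("ACTIVE"), states.count("DONE")))
        day += timedelta(days=1)
    return rows


for row in daily_report(date(2001, 1, 1), date(2001, 1, 10)):
    print(row)
```

This rescans the whole history once per day, which is exactly the "one query per day" cost the question hopes to avoid; it is only meant as a reference to check a set-based query against.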
Recommended answer
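Step 1. Calculate a cumulative sum of the state for each order, weighting NEW = 1, ACTIVE = 1 and DONE = 2. Assuming every order starts as NEW, optionally becomes ACTIVE and ends as DONE (as in the sample data), the running sum identifies the current state: 1 = NEW, 2 = ACTIVE, 3 = DONE reached directly from NEW, 4 = DONE reached via ACTIVE: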
```sql
select
    order_id, timestamp::date as day,
    sum(case new_state when 'DONE' then 2 else 1 end) over w as state
from order_state_history h
join orders o on o.id = h.order_id
where o.type = 1
window w as (partition by order_id order by timestamp)
```

```
 order_id |    day     | state
----------+------------+-------
    10000 | 2001-01-01 |     1
    10000 | 2001-01-02 |     2
    10000 | 2001-01-03 |     4
    10001 | 2001-01-02 |     1
    10004 | 2001-01-05 |     1
    10004 | 2001-01-10 |     3
(6 rows)
```
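The window in step 1 can be mimicked in plain Python to see how the weights produce the state codes. This is an illustrative sketch, not PostgreSQL: `ORDERS`, `HISTORY`, `WEIGHT` and `cumulative_states` are hypothetical names, and the data is the sample from the question. Sorting by `(order_id, timestamp)` and grouping by `order_id` plays the role of `partition by order_id order by timestamp`.

```python
from datetime import date, datetime
from itertools import groupby

# Sample data from the question: orders maps id -> type.
ORDERS = {10000: 1, 10001: 1, 10002: 2, 10003: 2, 10004: 1}
# order_state_history rows as (order_id, timestamp, new_state).
HISTORY = [
    (10000, datetime(2001, 1, 1, 12, 0), "NEW"),
    (10000, datetime(2001, 1, 2, 13, 0), "ACTIVE"),
    (10000, datetime(2001, 1, 3, 14, 0), "DONE"),
    (10001, datetime(2001, 1, 2, 13, 0), "NEW"),
    (10002, datetime(2001, 1, 3, 14, 0), "NEW"),
    (10002, datetime(2001, 1, 5, 10, 0), "ACTIVE"),
    (10002, datetime(2001, 1, 5, 14, 0), "DONE"),
    (10003, datetime(2001, 1, 7, 4, 0), "NEW"),
    (10004, datetime(2001, 1, 5, 14, 0), "NEW"),
    (10004, datetime(2001, 1, 10, 17, 30), "DONE"),
]

# Weights from step 1: NEW = 1, ACTIVE = 1, DONE = 2.
WEIGHT = {"NEW": 1, "ACTIVE": 1, "DONE": 2}


def cumulative_states(order_type=1):
    """Return (order_id, day, state) with a per-order running sum of state weights."""
    rows = sorted(
        (h for h in HISTORY if ORDERS[h[0]] == order_type),  # where o.type = order_type
        key=lambda h: (h[0], h[1]),  # partition by order_id, order by timestamp
    )
    result = []
    for order_id, group in groupby(rows, key=lambda h: h[0]):
        total = 0  # the running sum restarts for every order
        for _, ts, new_state in group:
            total += WEIGHT[new_state]
            result.append((order_id, ts.date(), total))
    return result


for row in cumulative_states():
    print(row)
```

One difference to keep in mind: with `order by timestamp` the SQL window uses the default `RANGE` frame, so two changes of the same order with an identical timestamp would be peers and share one sum, whereas this loop adds them one at a time. The sample data has no such ties.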
Step 2. Based on the states from step 1, calculate a transition matrix for each order (1 means the order was created as NEW, 2 means NEW->ACTIVE, 3 means NEW->DONE, 4 means ACTIVE->DONE). Each row adds 1 to the count of the state the order enters and subtracts 1 from the count of the state it leaves:
```sql
select
    order_id, day, state,
    case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
    case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
    case when state > 2 then 1 else 0 end as done
from (
    select
        order_id, timestamp::date as day,
        sum(case new_state when 'DONE' then 2 else 1 end) over w as state
    from order_state_history h
    join orders o on o.id = h.order_id
    where o.type = 1
    window w as (partition by order_id order by timestamp)
) s
```

```
 order_id |    day     | state | new | active | done
----------+------------+-------+-----+--------+------
    10000 | 2001-01-01 |     1 |   1 |      0 |    0
    10000 | 2001-01-02 |     2 |  -1 |      1 |    0
    10000 | 2001-01-03 |     4 |   0 |     -1 |    1
    10001 | 2001-01-02 |     1 |   1 |      0 |    0
    10004 | 2001-01-05 |     1 |   1 |      0 |    0
    10004 | 2001-01-10 |     3 |  -1 |      0 |    1
(6 rows)
```
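A quick way to convince yourself that the three `case` expressions in step 2 are consistent is to add up the deltas along each possible life cycle of an order. This Python sketch is an illustration only; `deltas` and `PATHS` are hypothetical names, and `PATHS` assumes the two life cycles that appear in the sample data (NEW -> ACTIVE -> DONE and NEW -> DONE). After every change, the running totals count the order in exactly one state.

```python
def deltas(state):
    """Map a step-1 state code to (new, active, done) changes, as the CASE expressions do."""
    new = 1 if state == 1 else -1 if state in (2, 3) else 0
    active = 1 if state == 2 else -1 if state == 4 else 0
    done = 1 if state > 2 else 0
    return new, active, done


# Assumed life cycles, written as the sequence of step-1 state codes.
PATHS = {
    "NEW -> ACTIVE -> DONE": [1, 2, 4],
    "NEW -> DONE": [1, 3],
}

for name, codes in PATHS.items():
    running = (0, 0, 0)
    for code in codes:
        # Add this change's deltas to the order's (new, active, done) totals.
        running = tuple(a + b for a, b in zip(running, deltas(code)))
        print(name, code, running)
```

Because each order contributes a net 1 to exactly one column at any point in time, summing these deltas over all orders up to a given day yields the per-state counts that step 3 computes.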
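Step 3. Calculate a running total of each state over a series of days. The left join keeps days without state changes (their delta columns are NULL, which `sum` ignores). Because the window's default frame includes all rows of the same day, rows sharing a day get identical totals, and `select distinct` collapses them into one row per day: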
```sql
select distinct
    day::date,
    sum(new) over w as new,
    sum(active) over w as active,
    sum(done) over w as done
from generate_series('2001-01-01'::date, '2001-01-10', '1d'::interval) day
left join (
    select
        order_id, day, state,
        case when state = 1 then 1 when state = 2 or state = 3 then -1 else 0 end as new,
        case when state = 2 then 1 when state = 4 then -1 else 0 end as active,
        case when state > 2 then 1 else 0 end as done
    from (
        select
            order_id, timestamp::date as day,
            sum(case new_state when 'DONE' then 2 else 1 end) over w as state
        from order_state_history h
        join orders o on o.id = h.order_id
        where o.type = 1
        window w as (partition by order_id order by timestamp)
    ) s
) s
using(day)
window w as (order by day)
order by 1
```

```
    day     | new | active | done
------------+-----+--------+------
 2001-01-01 |   1 |      0 |    0
 2001-01-02 |   1 |      1 |    0
 2001-01-03 |   1 |      0 |    1
 2001-01-04 |   1 |      0 |    1
 2001-01-05 |   2 |      0 |    1
 2001-01-06 |   2 |      0 |    1
 2001-01-07 |   2 |      0 |    1
 2001-01-08 |   2 |      0 |    1
 2001-01-09 |   2 |      0 |    1
 2001-01-10 |   1 |      0 |    2
(10 rows)
```
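The last step can also be sketched outside the database to show why a single pass over the day series is enough. This is a Python illustration, not PostgreSQL: `STEP2` is copied from the step 2 output with `order_id` and `state` dropped, and `running_totals` is a hypothetical helper. Changes are first collapsed per day (what the peer rows plus `select distinct` achieve in the query), and days without changes add nothing, like the NULL rows produced by the left join.

```python
from collections import defaultdict
from datetime import date, timedelta

# Step 2 output as (day, new, active, done) deltas, one row per state change.
STEP2 = [
    (date(2001, 1, 1), 1, 0, 0),
    (date(2001, 1, 2), -1, 1, 0),
    (date(2001, 1, 3), 0, -1, 1),
    (date(2001, 1, 2), 1, 0, 0),
    (date(2001, 1, 5), 1, 0, 0),
    (date(2001, 1, 10), -1, 0, 1),
]


def running_totals(start, end, changes):
    """Return (day, new, active, done) running totals for every day from start to end."""
    # Sum all changes that fall on the same day.
    per_day = defaultdict(lambda: [0, 0, 0])
    for day, *d in changes:
        for i, v in enumerate(d):
            per_day[day][i] += v
    totals = [0, 0, 0]
    report = []
    day = start
    while day <= end:
        # Days with no recorded change contribute zero (dict.get does not insert a default).
        for i, v in enumerate(per_day.get(day, (0, 0, 0))):
            totals[i] += v
        report.append((day, *totals))
        day += timedelta(days=1)
    return report


for row in running_totals(date(2001, 1, 1), date(2001, 1, 10), STEP2):
    print(row)
```

The resulting counts match the expected result from the question, one row per day including the days on which nothing changed.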