Problem description
I have a table with these values:
user_id ts val
uid1 19.05.2019 01:49:50 0
uid1 19.05.2019 01:50:15 0
uid1 19.05.2019 01:50:20 0
uid1 19.05.2019 01:59:50 1
uid1 19.05.2019 02:20:10 1
uid1 19.05.2019 02:20:15 0
uid1 19.05.2019 02:20:19 0
uid1 19.05.2019 02:30:53 1
uid1 19.05.2019 11:10:25 1
uid1 19.05.2019 11:13:40 0
uid1 19.05.2019 11:13:50 0
uid1 19.05.2019 11:20:19 1
uid2 19.05.2019 15:01:44 0
uid2 19.05.2019 15:05:55 0
uid2 19.05.2019 17:19:35 1
uid2 19.05.2019 17:20:01 0
uid2 19.05.2019 17:20:35 0
uid2 19.05.2019 19:15:50 1
When I query this table with only a partition by clause, the result looks like this:
Query: select *, sum(val) over (partition by user_id) as res from example_table;
user_id ts val res
uid1 19.05.2019 01:49:50 0 5
uid1 19.05.2019 01:50:15 0 5
uid1 19.05.2019 01:50:20 0 5
uid1 19.05.2019 01:59:50 1 5
uid1 19.05.2019 02:20:10 1 5
uid1 19.05.2019 02:20:15 0 5
uid1 19.05.2019 02:20:19 0 5
uid1 19.05.2019 02:30:53 1 5
uid1 19.05.2019 11:10:25 1 5
uid1 19.05.2019 11:13:40 0 5
uid1 19.05.2019 11:13:50 0 5
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 2
uid2 19.05.2019 15:05:55 0 2
uid2 19.05.2019 17:19:35 1 2
uid2 19.05.2019 17:20:01 0 2
uid2 19.05.2019 17:20:35 0 2
uid2 19.05.2019 19:15:50 1 2
In the results above, the res column holds the total of the val column for each partition. But if I query the table with both partition by and order by, I get these results:
Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;
user_id ts val res
uid1 19.05.2019 01:49:50 0 0
uid1 19.05.2019 01:50:15 0 0
uid1 19.05.2019 01:50:20 0 0
uid1 19.05.2019 01:59:50 1 1
uid1 19.05.2019 02:20:10 1 2
uid1 19.05.2019 02:20:15 0 2
uid1 19.05.2019 02:20:19 0 2
uid1 19.05.2019 02:30:53 1 3
uid1 19.05.2019 11:10:25 1 4
uid1 19.05.2019 11:13:40 0 4
uid1 19.05.2019 11:13:50 0 4
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 0
uid2 19.05.2019 15:05:55 0 0
uid2 19.05.2019 17:19:35 1 1
uid2 19.05.2019 17:20:01 0 1
uid2 19.05.2019 17:20:35 0 1
uid2 19.05.2019 19:15:50 1 2
But with the order by clause, the res column holds the cumulative sum of the val column for each row within each partition.
Why? I don't understand this.
Recommended answer
Update
This behavior is documented.
That means:
In the absence of a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. That includes:
- All rows "preceding" the current row according to the ORDER BY clause
- The current row
- All rows which have the same values in the ORDER BY columns as the current row
In the absence of an ORDER BY clause, ORDER BY NULL is assumed (though I'm guessing again). Thus the frame will include all rows from the partition, because the values in the ORDER BY column(s) are the same (always NULL) in every row.
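The third bullet (rows with the same ORDER BY values, i.e. peers) is where the default RANGE frame differs from a ROWS frame. A minimal sketch, assuming a small hypothetical data set in which two rows share the same ts (the table t and its values are made up for illustration and are not part of the question):

```sql
-- Hypothetical sample data: the two rows with ts = 2 are peers of each other.
with t(user_id, ts, val) as (
    values ('u1', 1, 1)
         , ('u1', 2, 1)
         , ('u1', 2, 1)
         , ('u1', 3, 1)
)
select ts
     , val
       -- no frame_clause: the default frame applies
     , sum(val) over (partition by user_id order by ts) as default_frame
       -- the default frame written out explicitly
     , sum(val) over (
           partition by user_id
           order by ts
           range between unbounded preceding and current row
       ) as range_frame
       -- a ROWS frame stops at the current row and ignores peers
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and current row
       ) as rows_frame
from t
order by ts;
```

In PostgreSQL, default_frame and range_frame should both come out as 1, 3, 3, 4, because each ts = 2 row also counts its peer. rows_frame should come out as 1, 2, 3, 4. In the question's data every ts is unique within a user_id, so there the two frame types give the same running total.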
Disclaimer: The following is more a guess than a qualified answer. I didn't find any documentation that confirms what I write. At the same time, I don't think the currently given answers correctly explain the behavior.
The reason for the difference in the results is not directly the ORDER BY clause, since a + b + c is the same as c + b + a. The reason is (and that is my guess) that the ORDER BY clause implicitly defines the frame_clause as
rows between unbounded preceding and current row
Try the following query:
select *
, sum(val) over (partition by user_id) as res
, sum(val) over (partition by user_id order by ts) as res_order_by
, sum(val) over (
partition by user_id
order by ts
rows between unbounded preceding and current row
) as res_order_by_unbounded_preceding
, sum(val) over (
partition by user_id
-- order by ts
rows between unbounded preceding and current row
) as res_preceding
, sum(val) over (
partition by user_id
-- order by ts
rows between current row and unbounded following
) as res_following
, sum(val) over (
partition by user_id
order by ts
rows between unbounded preceding and unbounded following
) as res_orderby_preceding_following
from example_table;
You will see that you can get a cumulative sum without an ORDER BY clause, as well as a "full" sum with an ORDER BY clause.