PARTITION BY with an ORDER BY clause in PostgreSQL

Question

I have a table with these values:

user_id ts                  val
uid1    19.05.2019 01:49:50  0
uid1    19.05.2019 01:50:15  0
uid1    19.05.2019 01:50:20  0
uid1    19.05.2019 01:59:50  1
uid1    19.05.2019 02:20:10  1
uid1    19.05.2019 02:20:15  0
uid1    19.05.2019 02:20:19  0
uid1    19.05.2019 02:30:53  1
uid1    19.05.2019 11:10:25  1
uid1    19.05.2019 11:13:40  0
uid1    19.05.2019 11:13:50  0
uid1    19.05.2019 11:20:19  1
uid2    19.05.2019 15:01:44  0
uid2    19.05.2019 15:05:55  0
uid2    19.05.2019 17:19:35  1
uid2    19.05.2019 17:20:01  0
uid2    19.05.2019 17:20:35  0
uid2    19.05.2019 19:15:50  1
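
To reproduce the queries in this question, the table could be created roughly as follows. This is only a sketch: the question does not show the table definition, so the column types (text, timestamp, integer) are assumptions, and the timestamps are written as ISO literals so that they parse independently of the DateStyle setting.

```sql
-- Assumed schema: the column types are not given in the question.
create table example_table (
    user_id text,
    ts      timestamp,
    val     integer
);

-- Same rows as the listing above, with timestamps in ISO format.
insert into example_table (user_id, ts, val) values
    ('uid1', '2019-05-19 01:49:50', 0),
    ('uid1', '2019-05-19 01:50:15', 0),
    ('uid1', '2019-05-19 01:50:20', 0),
    ('uid1', '2019-05-19 01:59:50', 1),
    ('uid1', '2019-05-19 02:20:10', 1),
    ('uid1', '2019-05-19 02:20:15', 0),
    ('uid1', '2019-05-19 02:20:19', 0),
    ('uid1', '2019-05-19 02:30:53', 1),
    ('uid1', '2019-05-19 11:10:25', 1),
    ('uid1', '2019-05-19 11:13:40', 0),
    ('uid1', '2019-05-19 11:13:50', 0),
    ('uid1', '2019-05-19 11:20:19', 1),
    ('uid2', '2019-05-19 15:01:44', 0),
    ('uid2', '2019-05-19 15:05:55', 0),
    ('uid2', '2019-05-19 17:19:35', 1),
    ('uid2', '2019-05-19 17:20:01', 0),
    ('uid2', '2019-05-19 17:20:35', 0),
    ('uid2', '2019-05-19 19:15:50', 1);
```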

When I query this table with only a PARTITION BY clause, the result looks like this:

Query: select *, sum(val) over (partition by user_id) as res from example_table;

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  5
uid1    19.05.2019 01:50:15  0  5
uid1    19.05.2019 01:50:20  0  5
uid1    19.05.2019 01:59:50  1  5
uid1    19.05.2019 02:20:10  1  5
uid1    19.05.2019 02:20:15  0  5
uid1    19.05.2019 02:20:19  0  5
uid1    19.05.2019 02:30:53  1  5
uid1    19.05.2019 11:10:25  1  5
uid1    19.05.2019 11:13:40  0  5
uid1    19.05.2019 11:13:50  0  5
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  2
uid2    19.05.2019 15:05:55  0  2
uid2    19.05.2019 17:19:35  1  2
uid2    19.05.2019 17:20:01  0  2
uid2    19.05.2019 17:20:35  0  2
uid2    19.05.2019 19:15:50  1  2

In the above results, the res column holds the total sum of the val column for each partition. But if I query the table with both PARTITION BY and ORDER BY, I get these results:

Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  0
uid1    19.05.2019 01:50:15  0  0
uid1    19.05.2019 01:50:20  0  0
uid1    19.05.2019 01:59:50  1  1
uid1    19.05.2019 02:20:10  1  2
uid1    19.05.2019 02:20:15  0  2
uid1    19.05.2019 02:20:19  0  2
uid1    19.05.2019 02:30:53  1  3
uid1    19.05.2019 11:10:25  1  4
uid1    19.05.2019 11:13:40  0  4
uid1    19.05.2019 11:13:50  0  4
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  0
uid2    19.05.2019 15:05:55  0  0
uid2    19.05.2019 17:19:35  1  1
uid2    19.05.2019 17:20:01  0  1
uid2    19.05.2019 17:20:35  0  1
uid2    19.05.2019 19:15:50  1  2

But with the ORDER BY clause, the res column holds the cumulative sum of the val column for each row within each partition.

Why? I don't understand.

Recommended answer

Update

This behavior is documented in the PostgreSQL manual, in the description of the window frame clause for window function calls.

This means:

In the absence of a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. That includes:


  • All rows "preceding" the current row according to the ORDER BY clause
  • The current row
  • All rows which have the same values in the ORDER BY columns as the current row
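
The third point (peer rows) is the one that separates the default RANGE frame from a ROWS frame. A minimal sketch, using a made-up inline data set with a tie in the ORDER BY column rather than the table from the question (the alias t and the integer ts values are illustrative only):

```sql
-- Two rows share ts = 2, so they are peers of each other under RANGE.
select ts, val
     , sum(val) over (order by ts) as default_frame
     , sum(val) over (order by ts
                      range between unbounded preceding and current row) as explicit_range
     , sum(val) over (order by ts
                      rows between unbounded preceding and current row) as explicit_rows
from (values (1, 1), (2, 1), (2, 1), (3, 1)) as t(ts, val)
order by ts;

-- default_frame and explicit_range agree on every row: 1, 3, 3, 4.
-- Both tied rows see the sum up to and including their peer.
-- explicit_rows counts row by row: the tied rows get 2 and 3
-- (which tied row gets which value is not determined).
```

In the table from the question the ts values are unique within each user_id, so RANGE and ROWS happen to give the same cumulative sum there.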

In the absence of an ORDER BY clause, ORDER BY NULL is assumed (though I'm guessing again). Thus the frame will include all rows from the partition, because the values in the ORDER BY column(s) are the same (always NULL) in every row.
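
This can be checked without relying on the ORDER BY NULL guess. The sketch below uses a small made-up inline data set (the alias t and its values are illustrative only) and compares the default frame with two explicitly written frames when there is no ORDER BY:

```sql
-- Without ORDER BY, every row in the partition is a peer of the current row,
-- so "up to the current row's last peer" already covers the whole partition.
select user_id, val
     , sum(val) over (partition by user_id) as default_no_order_by
     , sum(val) over (partition by user_id
                      range between unbounded preceding and current row) as explicit_default
     , sum(val) over (partition by user_id
                      range between unbounded preceding and unbounded following) as whole_partition
from (values ('uid1', 0), ('uid1', 1), ('uid1', 1),
             ('uid2', 1), ('uid2', 0)) as t(user_id, val);

-- All three columns agree: 2 for every uid1 row and 1 for every uid2 row.
```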

Disclaimer: The following is more a guess than a qualified answer. I didn't find any documentation that confirms what I write. At the same time, I don't think the currently given answers correctly explain the behavior.

The reason for the difference in the results is not directly the ORDER BY clause, since a + b + c is the same as c + b + a. The reason is (and that is my guess) that the ORDER BY clause implicitly defines the frame_clause as

rows between unbounded preceding and current row

Try the following query:

select *
, sum(val) over (partition by user_id) as res
, sum(val) over (partition by user_id order by ts) as res_order_by
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and current row
  ) as res_order_by_unbounded_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between unbounded preceding and current row
  ) as res_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between current row and unbounded following
  ) as res_following
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and unbounded following
  ) as res_orderby_preceding_following

from example_table;

You will see that you can get a cumulative sum without an ORDER BY clause, as well as a "full" sum with the ORDER BY clause.
