Question
I have data like this:
table1
_____________
id way time
1 1 00:01
2 1 00:02
3 2 00:03
4 2 00:04
5 2 00:05
6 3 00:06
7 3 00:07
8 1 00:08
9 1 00:09
I would like to know during which time interval I was on which way:
desired output
_________________
id way from to
1 1 00:01 00:02
3 2 00:03 00:05
6 3 00:06 00:07
8 1 00:08 00:09
I tried using window functions:
SELECT DISTINCT
       first_value(id)   OVER w AS id,
       first_value(way)  OVER w AS way,
       first_value(time) OVER w AS from,
       last_value(time)  OVER w AS to
FROM   table1
WINDOW w AS (
   PARTITION BY way ORDER BY id
   RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
What I get is:
ID way from to
1 1 00:01 00:09
3 2 00:03 00:05
6 3 00:06 00:07
This is not correct, because I was not on way 1 from 00:01 to 00:09. Is it possible to partition according to the order, that is, to group only consecutive rows that have the same attribute value?
Recommended answer
If your case is as simple as the example values suggest, @Giorgos' answer serves nicely.
However, that's typically not the case. If the id column is a serial, you cannot rely on the assumption that a row with an earlier time also has a smaller id. Also, time values (or timestamp values, which is what you probably have) can easily be duplicates, so you need to make the sort order unambiguous.
Assuming both can happen, and you want the id from the row with the earliest time per time slice (actually, the smallest id for the earliest time, since there could be ties), this query would deal with the situation properly:
SELECT *
FROM  (
   SELECT DISTINCT ON (way, grp)
          id, way, time AS time_from
        , max(time) OVER (PARTITION BY way, grp) AS time_to
   FROM  (
      SELECT *
           , row_number() OVER (ORDER BY time, id)  -- id as tie breaker
           - row_number() OVER (PARTITION BY way ORDER BY time, id) AS grp
      FROM   table1
      ) t
   ORDER  BY way, grp, time, id
   ) sub
ORDER  BY time_from, id;
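The grp expression is the part of the query that is easiest to misread, so here is a minimal sketch of the same idea outside the database. It is written in Python purely for illustration and is not part of the original answer; the function name time_slices and the tuple layout (id, way, time) are assumptions made for this example. It subtracts a per-way row number from an overall row number, which stays constant within one unbroken run of the same way, and then collapses each (way, grp) pair to its first row and its maximum time.

```python
# Sample rows from the question as (id, way, time) tuples.
rows = [
    (1, 1, "00:01"), (2, 1, "00:02"), (3, 2, "00:03"),
    (4, 2, "00:04"), (5, 2, "00:05"), (6, 3, "00:06"),
    (7, 3, "00:07"), (8, 1, "00:08"), (9, 1, "00:09"),
]


def time_slices(rows):
    """Collapse consecutive rows with the same way into (id, way, time_from, time_to)."""
    # Equivalent of ORDER BY time, id (id acts as tie breaker).
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))

    per_way_count = {}  # running row_number() per way
    slices = {}         # (way, grp) -> [id, way, time_from, time_to]
    for overall_rn, (id_, way, time) in enumerate(ordered, start=1):
        per_way_count[way] = per_way_count.get(way, 0) + 1
        # Overall row number minus per-way row number: constant within a run.
        grp = overall_rn - per_way_count[way]
        key = (way, grp)
        if key not in slices:
            # First row seen for this slice is the earliest (time, id).
            slices[key] = [id_, way, time, time]
        else:
            # Zero-padded HH:MM strings compare correctly as text.
            slices[key][3] = max(slices[key][3], time)

    # Equivalent of the outer ORDER BY time_from, id.
    return sorted((tuple(s) for s in slices.values()), key=lambda s: (s[2], s[0]))


for slice_ in time_slices(rows):
    print(slice_)
```

For the sample data, the two runs of way 1 get different grp values (0 and 5), which is why they are no longer merged into a single 00:01 to 00:09 interval.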
- ORDER BY time, id to be unambiguous. Assuming time is not unique, add the (assumed unique) id to avoid arbitrary results that could change between queries in sneaky ways.
- max(time) OVER (PARTITION BY way, grp): without ORDER BY, the window frame spans all rows of the PARTITION, so we get the absolute maximum per time slice.
- The outer query layer is only necessary to produce the desired sort order in the result, since we are bound to a different ORDER BY in the subquery sub by using DISTINCT ON. Details:
  - Select first row in each GROUP BY group?
  - Group by repeating attribute
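To illustrate the note about the tie breaker, the following sketch (again in Python, with hypothetical rows that are not from the question) sorts the same three rows from two different starting orders. Sorting by time alone leaves the order of the tied rows dependent on how they happened to arrive, which mirrors a database returning tied rows in arbitrary order; adding id to the sort key makes the result the same either way.

```python
# Hypothetical rows (id, way, time); ids 1 and 2 share the same time value.
arrival_a = [(1, 1, "00:01"), (2, 2, "00:01"), (3, 1, "00:02")]
arrival_b = [(2, 2, "00:01"), (1, 1, "00:01"), (3, 1, "00:02")]  # same rows, other order

# Sorting by time only: tied rows keep whatever order they arrived in.
by_time_a = sorted(arrival_a, key=lambda r: r[2])
by_time_b = sorted(arrival_b, key=lambda r: r[2])

# Sorting by (time, id): the unique id breaks the tie deterministically.
by_time_id_a = sorted(arrival_a, key=lambda r: (r[2], r[0]))
by_time_id_b = sorted(arrival_b, key=lambda r: (r[2], r[0]))

print(by_time_a == by_time_b)        # order depends on arrival order
print(by_time_id_a == by_time_id_b)  # order is the same for both inputs
```

Because the row numbers in the query are computed from this ordering, an ambiguous sort could assign rows with tied time values to different slices from one run to the next.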
Aside: don't use the basic type name time as identifier (also a reserved word in standard SQL).