问题描述
假设我有一个包含两列的表:start
和end
,都是整数,并且该表由第一列然后是第二列排序.每行代表一个间隔.
Let's say I have a table with two columns: start
and end
, both integers, and the table is ordered by the first, then second column. Each row represents an interval.
我需要的是合并的间隔表:所有重叠或相邻的间隔都变成一个.
What I need is the table of merged intervals: all overlapping or adjacent intervals gobbled up into one.
它可以用JOIN查询构造,但是行数是二次的,在我的情况下是400万行(我决定组成这个问题,因为查询仍在运行).
It can be constructed with a JOIN query, but that is quadratic in the number of rows, which is 4 million rows in my case (I decided to compose this question because the query is still running).
也可以通过遍历每一行并跟踪最大结束时间以单次的方式完成-但是如何在标准SQL中执行此操作或进行等效操作?在SQL中有任何 O(n)方法可以做到吗?我现在正在使用SQLite.特定于SQLite的解决方案这次也对我有帮助.
It can also be done in a single pass, by running through each row and keeping track of the maximum end time - but how to do that, or something equivalent, in standard SQL? Is there any O(n) way to do it in SQL? I'm using SQLite right now; a SQLite-specific solution would also help me out this time.
从相关问题的答案( 1 ,2 ,, 4 , 5 ,6 ,,, 9 )我不知道是否有可能.
From the answers to related questions (1, 2, 3, 4, 5, 6, 7, 8, 9) I can't tell whether it's possible.
可以吗?
推荐答案
好,这是在MySQL中有效的解决方案(我不知道它是否在SQlite中有效).我认为但不能证明那是O(n)(丢弃最初对事件表进行排序所花费的时间,即,是否已经按照我认为的问题进行了排序.)
Well, here is a solution that works in MySQL (I don't know if it will work in SQlite). I think, but cannot prove, that is O(n) (discarding the time it takes to sort the events table initially, i.e. if it is already sorted as I think the question states.)
> SELECT * from events;
+-------+-----+
| start | end |
+-------+-----+
| 1 | 9 |
| 5 | 8 |
| 8 | 11 |
| 11 | 13 |
| 17 | 25 |
| 18 | 26 |
| 33 | 42 |
| 59 | 81 |
| 61 | 87 |
| 97 | 132 |
| 105 | 191 |
| 107 | 240 |
| 198 | 213 |
| 202 | 215 |
+-------+-----+
14 rows in set (0.00 sec)
SET @interval_id = 0;
SET @interval_end = 0;
SELECT
MIN(start) AS start,
MAX(end) AS end
FROM
(SELECT
@interval_id := IF(start > @interval_end,
@interval_id + 1,
@interval_id) AS interval_id,
@interval_end := IF(start < @interval_end,
GREATEST(@interval_end, end),
end) AS interval_end,
events.*
FROM events
ORDER BY start,end) tmp
GROUP BY interval_id;
+-------+------+
| start | end |
+-------+------+
| 1 | 13 |
| 17 | 26 |
| 33 | 42 |
| 59 | 87 |
| 97 | 240 |
+-------+------+
5 rows in set (0.00 sec)
这篇关于在SQL中一遍合并间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!