Problem description
Edit 1 (clarification): Thank you for the answers so far! The response is gratifying.
I want to clarify the question a little because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault as I was having a difficult time defining it even for myself).
Here's the rub: The result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the one record where the tstamp IS NULL for each order_num in the first set (there will always be one with null tstamp for each order_num).
The answers given so far appear to include all records for a certain order_num if there are any with tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00 it should not be included in the result.
Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is NULL unless the order has been modified; in that case there is another record with the same order_num whose tstamp holds the time at which the change occurred.
Example...
id  order_num  tstamp               item_id
__  _________  ___________________  _______
0   1                               100
1   2                               101
2   2          2010-01-05 12:34:56  102
3   3                               113
4   4                               124
5   5                               135
6   5          2010-01-07 01:23:45  136
7   5          2010-01-07 02:46:00  137
8   6                               100
9   6          2010-01-13 08:33:55  105
What is the most efficient SQL statement to retrieve all of the orders (based on order_num) that have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with NULL tstamp), for every order_num where at least one record has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "where at least one record has tstamp NOT NULL" part that I'm having difficulty with.
The result set should look like this:
id  order_num  tstamp               item_id
__  _________  ___________________  _______
1   2                               101
2   2          2010-01-05 12:34:56  102
5   5                               135
6   5          2010-01-07 01:23:45  136
7   5          2010-01-07 02:46:00  137
The SQL that I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I hope there is a more efficient solution:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM (SELECT orders.order_id, orders.tstamp, orders.item_id
      FROM orders
      WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09') AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM (SELECT orders.order_id, orders.tstamp, orders.item_id
      FROM orders
      WHERE orders.tstamp IS NULL) AS current_orders
WHERE current_orders.order_id IN
      (SELECT orders.order_id
       FROM orders
       WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
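As a sanity check, the UNION query can be run against the sample rows. This is only a minimal sketch with several assumptions that are not from the post: it uses an in-memory SQLite database through Python's sqlite3 module (the post does not name its database), stores timestamps as text, names the grouping column order_num as in the sample table (the queries in the post call it order_id), and also selects id so the output can be compared with the expected result set.

```python
import sqlite3

# Sample rows from the post: (id, order_num, tstamp, item_id).
ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
]

# "A UNION (B in A)": in-range rows, plus NULL rows of orders that have in-range rows.
UNION_QUERY = """
SELECT h.id, h.order_num, h.tstamp, h.item_id
FROM (SELECT * FROM orders
      WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09') AS h
UNION
SELECT c.id, c.order_num, c.tstamp, c.item_id
FROM (SELECT * FROM orders WHERE tstamp IS NULL) AS c
WHERE c.order_num IN (SELECT order_num FROM orders
                      WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY 1
"""

# Assumption: SQLite stands in for the real database; tstamp is stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_num INTEGER, tstamp TEXT, item_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)

union_rows = conn.execute(UNION_QUERY).fetchall()
union_ids = [row[0] for row in union_rows]
for row in union_rows:
    print(row)
```

On this data the query returns the rows with id 1, 2, 5, 6 and 7, which is the expected result set shown earlier.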
Thank you again for all the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as great as I had hoped. If anyone can improve on this I would be thrilled!
1) The best solution found so far seems to be:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM (SELECT orders.order_id, orders.tstamp, orders.item_id
      FROM orders
      WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
         OR orders.tstamp IS NULL) AS history_orders
WHERE history_orders.order_id IN
      (SELECT orders.order_id
       FROM orders
       WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
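The difference described in Edit 1 can be illustrated on the sample data with one extra row. The sketch below is not the real setup: it assumes an in-memory SQLite database via Python's sqlite3 module, text timestamps, and the column name order_num from the sample table (the post's queries use order_id). The extra row for order_num = 2 at 2010-01-12 comes from the Edit 1 example, but its id (10) and item_id (199) are made up for illustration.

```python
import sqlite3

# Sample rows from the post: (id, order_num, tstamp, item_id).
rows = [
    (0, 1, None, 100), (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113), (4, 4, None, 124), (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
    # Hypothetical row for the Edit 1 example; id 10 and item_id 199 are invented.
    (10, 2, "2010-01-12 00:00:00", 199),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_num INTEGER, tstamp TEXT, item_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

IN_RANGE = "tstamp BETWEEN '2010-01-03' AND '2010-01-09'"

# Solution 1: keep only in-range or NULL rows, then restrict to modified orders.
solution_1 = f"""
SELECT h.id
FROM (SELECT * FROM orders WHERE {IN_RANGE} OR tstamp IS NULL) AS h
WHERE h.order_num IN (SELECT order_num FROM orders WHERE {IN_RANGE})
ORDER BY h.id
"""

# Looser query: every row of any order that was modified in the range.
all_rows_of_order = f"""
SELECT id
FROM orders
WHERE order_num IN (SELECT order_num FROM orders WHERE {IN_RANGE})
ORDER BY id
"""

strict_ids = [r[0] for r in conn.execute(solution_1)]
loose_ids = [r[0] for r in conn.execute(all_rows_of_order)]
print("solution 1:", strict_ids)
print("all rows of matching orders:", loose_ids)
```

Here the looser query also returns the out-of-range row with id 10, while solution 1 leaves it out, which is the behaviour Edit 1 asks for.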
2) I also tried using EXISTS in place of IN, which requires an additional WHERE clause in the last SELECT:
SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM (SELECT orders.order_id, orders.tstamp, orders.item_id
      FROM orders
      WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
         OR orders.tstamp IS NULL) AS history_orders
WHERE EXISTS
      (SELECT orders.order_id
       FROM orders
       WHERE history_orders.order_id = orders.order_id
         AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');
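The sketch below shows why the EXISTS form needs the extra correlating condition. It makes the same assumptions as before, none of which come from the post: an in-memory SQLite database through Python's sqlite3 module, text timestamps, and the column name order_num from the sample table in place of order_id. It compares the IN form, the correlated EXISTS form, and an EXISTS without the correlation.

```python
import sqlite3

# Sample rows from the post: (id, order_num, tstamp, item_id).
rows = [
    (0, 1, None, 100), (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113), (4, 4, None, 124), (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_num INTEGER, tstamp TEXT, item_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# Derived table shared by all three variants: in-range rows plus NULL rows.
BASE = """
SELECT h.id
FROM (SELECT * FROM orders
      WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09'
         OR tstamp IS NULL) AS h
"""

in_query = BASE + """
WHERE h.order_num IN (SELECT order_num FROM orders
                      WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY h.id
"""

# Correlated EXISTS: the subquery is tied to the outer row via order_num.
exists_query = BASE + """
WHERE EXISTS (SELECT 1 FROM orders AS o
              WHERE o.order_num = h.order_num
                AND o.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY h.id
"""

# Without the correlation the subquery is non-empty for every outer row,
# so EXISTS is always true and nothing is filtered out.
uncorrelated_query = BASE + """
WHERE EXISTS (SELECT 1 FROM orders AS o
              WHERE o.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY h.id
"""

in_ids = [r[0] for r in conn.execute(in_query)]
exists_ids = [r[0] for r in conn.execute(exists_query)]
uncorrelated_ids = [r[0] for r in conn.execute(uncorrelated_query)]
print(in_ids, exists_ids, uncorrelated_ids)
```

On the sample data the IN and correlated EXISTS forms select the same rows, while the uncorrelated EXISTS lets through every in-range or NULL row regardless of order.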
3) And finally there is my original solution, using UNION.
Comments:
To comment on the table size, my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, 785656 records respectively.
Performance - I ran each solution three times and here are my real world results:
1: 52, 51, 51 seconds
2: 54, 54, 53 seconds
3: 56, 56, 56 seconds