问题描述
想象一下以下表格:
创建表格框(id int,名称文本,...);
create table boxes( id int, name text, ...);
创建表Thingsinboxes(id int,box_id int,事物枚举('apple,'banana','orange');
create table thingsinboxes( id int, box_id int, thing enum('apple,'banana','orange');
表格如下:
Boxes:
id | name
1 | orangesOnly
2 | orangesOnly2
3 | orangesBananas
4 | misc
thingsinboxes:
id | box_id | thing
1 | 1 | orange
2 | 1 | orange
3 | 2 | orange
4 | 3 | orange
5 | 3 | banana
6 | 4 | orange
7 | 4 | apple
8 | 4 | banana
如何选择至少包含一个橙色并且没有一个不是橙色的框?
How do I select the boxes that contain at least one orange and nothing that isn't an orange?
假设我有几十万个盒子,可能还有一百万个盒子,那么这个比例如何变化?
How does this scale, assuming I have several hundred thousand boxes and possibly a million things in boxes?
如果可能的话,我希望将其全部保留在SQL中,而不是使用脚本对结果集进行后处理.
I'd like to keep this all in SQL if possible, rather than post-processing the result set with a script.
我同时使用了postgres和mysql,因此子查询可能很糟糕,因为mysql不会优化子查询(无论如何,都是6版之前的版本).
I'm using both postgres and mysql, so subqueries are probably bad, given that mysql doesn't optimize subqueries (pre version 6, anyway).
推荐答案
SELECT b.*
FROM boxes b JOIN thingsinboxes t ON (b.id = t.box_id)
GROUP BY b.id
HAVING COUNT(DISTINCT t.thing) = 1 AND SUM(t.thing = 'orange') > 0;
这是另一种不使用GROUP BY的解决方案:
Here's another solution that does not use GROUP BY:
SELECT DISTINCT b.*
FROM boxes b
JOIN thingsinboxes t1
ON (b.id = t1.box_id AND t1.thing = 'orange')
LEFT OUTER JOIN thingsinboxes t2
ON (b.id = t2.box_id AND t2.thing != 'orange')
WHERE t2.box_id IS NULL;
一如既往,在得出关于查询的可伸缩性或性能的结论之前,您必须使用现实的数据集进行尝试,并测量性能.
As always, before you make conclusions about the scalability or performance of a query, you have to try it with a realistic data set, and measure the performance.
这篇关于通过要求许多满足条件来过滤一对多查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!