问题描述
说我有一张表 order
as
id | clientid | type | amount | itemid | date
---|----------|------|--------|--------|-----------
23 | 258 | B | 150 | 14 | 2012-04-03
24 | 258 | S | 69 | 14 | 2012-04-03
25 | 301 | S | 10 | 20 | 2012-04-03
26 | 327 | B | 54 | 156 | 2012-04-04
-
clientid
是返回到client
表的外键 -
itemid
是返回项
表的外键 -
type
仅B
或S
-
amount
是整数 clientid
is a foreign-key back to theclient
tableitemid
is a foreign key back to anitem
tabletype
is onlyB
orS
amount
is an integer
和一个表处理
为
id | orderid | processed | date
---|---------|-----------|---------
41 | 23 | true | 2012-04-03
42 | 24 | true | 2012-04-03
43 | 25 | false | <NULL>
44 | 26 | true | 2012-04-05
我需要从 order 对于相同
日期的相同
具有相反的 clientid
类型
值。请记住,类型
只能具有两个值之一- B
或 S
。在上面的示例中,这将是行 23
和 24
。
I need to get all the rows from order
that for the same clientid
on the same date
have opposing type
values. Keep in mind type
can only have one of two values - B
or S
. In the example above this would be rows 23
and 24
.
另一个约束是,对于,
。已处理
中的对应行必须为 true
。 orderid
The other constraint is that the corresponding row in processed
must be true
for the orderid
.
到目前为止,我的查询
SELECT c1.clientid,
c1.date,
c1.type,
c1.itemid,
c1.amount,
c2.date,
c2.type,
c2.itemid,
c2.amount
FROM order c1
INNER JOIN order c2 ON c1.itemid = c2.itemid AND
c1.date = c2.date AND
c1.clientid = c2.clientid AND
c1.type <> c2.type AND
c1.id < c2.id
INNER JOIN processed p1 ON p1.orderid = c1.id AND
p1.processed = true
INNER JOIN processed p2 ON p2.orderid = c2.id AND
p2.processed = true
问题:保持 processed = true
作为join子句的一部分正在减慢查询速度。如果将其移至WHERE子句,则性能会更好。这引起了我的兴趣,并且 我想知道为什么 。
QUESTION: Keeping the processed = true
as part of the join clause is slowing the query down. If I move it to the WHERE clause then the performance is much better. This has piqued my interest and I'd like to know why.
主键和相应的外键列被索引,而值列(值
,已处理
等)没有索引。
The primary keys and respective foreign key columns are indexed while the value columns (value
, processed
etc) aren't.
免责声明:我继承了此数据库结构,性能差异大约为6秒。
推荐答案
之所以会看到差异,是因为计划者将执行计划放在一起,这显然取决于查询(可以说,应该将两个查询优化为相同,这可能是一个错误)。这意味着计划者认为必须以特定的方式工作才能获得每个语句中的结果。
The reason that you're seeing a difference is due to the execution plan that the planner is putting together, this is obviously different depending on the query (arguably, it should be optimising the 2 queries to be the same and this may be a bug). This means that the planner thinks it has to work in a particular way to get to the result in each statement.
当您在JOIN中进行操作时,计划者可能会必须从表中选择,按 True部分过滤,然后加入结果集。我会想象这是一个大表,因此要浏览大量数据,并且不能有效地使用索引。
When you do it within the JOIN, the planner will probably have to select from the table, filter by the "True" part, then join the result sets. I would imagine this is a large table, and therefore a lot of data to look through, and it can't use the indexes as efficiently.
我怀疑如果您在WHERE子句中执行此操作,计划者将选择一条效率更高的路由(即基于索引的数据集或预先过滤的数据集)。
I suspect that if you do it in a WHERE clause, the planner is choosing a route that is more efficient (ie. either index based, or pre filtered dataset).
通过在两列上添加索引(如果不确定Postgres尚不支持包含的列和多列索引)来以最快的速度(如果不是更快的话)进行连接工作。
You could probably make the join work as fast (if not faster) by adding an index on the two columns (not sure if included columns and multiple column indexes are supported on Postgres yet).
In简而言之,计划者面临的问题是正在选择2条不同的路线来获得结果集,其中一条路线的效率不如另一条路线。没有完整的表格信息和EXPLAIN ANALYZE信息,我们不可能知道原因是什么。
In short, the planner is the problem it is choosing 2 different routes to get to the result sets, and one of those is not as efficient as the other. It's impossible for us to know what the reasons are without the full table information and the EXPLAIN ANALYZE information.
如果您想知道为什么特定查询会这样做,请提供更多信息。但是,原因是计划者选择了不同的路线。
If you want specifics on why your specific query is doing this, you'll need to provide more information. However the reason is the planner choosing different routes.
其他阅读材料:
只是略读了一下,似乎postgres规划器并没有对连接进行重新排序以对其进行优化。尝试更改语句中的联接顺序,以查看是否达到相同的性能……只是一个想法。
Just skimmed, seems that the postgres planner doesn't re-order joins to optimise it. try changing the order of the joins in your statement to see if you then get the same performance... just a thought.
这篇关于性能差异:条件置于INNER JOIN与WHERE子句之间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!