如果派生表外部过滤条件可以下推到派生表内部,可以有效减少派生表内部扫描数据量和派生表使用内存甚至避免使用派生表。
如对于下面查询:
SELECT * FROM (
SELECT
cluster_id,
COUNT(1) AS instance_count
FROM `assets_instance`
GROUP BY `cluster_id`
HAVING COUNT(1)>1
) AS T1
WHERE T1.cluster_id=1487
对应查询在执行计划为:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
partitions: NULL
type: ref
possible_keys: <auto_key0>
key: <auto_key0>
key_len: 8
ref: const
rows: 10
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: assets_instance
partitions: NULL
type: index
possible_keys: ForeignKey_cluster_id
key: ForeignKey_cluster_id
key_len: 8
ref: NULL
rows: 1727
filtered: 100.00
Extra: Using index
而将派生表外部的查询条件下pushdown到派生表内部,将SQL改写为:
SELECT
cluster_id,
COUNT(1) AS instance_count
FROM `assets_instance`
WHERE cluster_id=1487
GROUP BY `cluster_id`
HAVING COUNT(1)>1
执行计划为:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: assets_instance
partitions: NULL
type: ref
possible_keys: ForeignKey_cluster_id
key: ForeignKey_cluster_id
key_len: 8
ref: const
rows: 2
filtered: 100.00
Extra: Using index
优化前需要扫描instance_count
表上ForeignKey_cluster_id
索引全部数据然后进行分组计算,再按照HAVING
条件进行过滤,得到派生表数据,再根据派生表外部条件cluster_id=1487
进行过滤得到最终结果。
优化后仅需要对instance_count
表上ForeignKey_cluster_id
按照cluster_id=1487
条件进行范围查找,然后进行GROUP BY
+HAVING
计算。
如果表中存在10000个cluster_id
,那么优化后仅需要访问1/10000的数据,性能提升10000倍。
PS1: 在MariaDB中有类似优化。