复现MySQL的索引选择失误以及通过OPTIMIZER_TRACE分析过程
验证环境:MySQL 5.7.39 windows-pc
一、构造数据(生成150万数据)
构建一张账户表,带有一级部门id和二级部门id,并且建立有索引。比较典型的业务场景,根据部门id进行各类查询。
CREATE TABLE `TM_ACCOUNT` (
`account_id` bigint(20) NOT null ,
`name` varchar(32) DEFAULT '',
`address` varchar(32) DEFAULT '',
`org_first_id` int(10) DEFAULT 0,
`org_second_id` int(10) DEFAULT 0,
`biz_date` date DEFAULT null,
`last_modify_dt` datetime DEFAULT null,
PRIMARY KEY (`account_id`),
KEY IDX_org_id_combine(org_first_id,org_second_id),
KEY IDX_last_modify_dt_org_first_id_name(last_modify_dt,org_first_id,org_second_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
1. 构造数据
此处直接通过jdbc批量插入数据。
数据分布,保证数据无倾斜,索引数据均匀:
- org_first_id和org_second_id字段都是在1-100间随机分布
- last_modify_dt在25天间随机分布
代码可以直接使用,详情见附件3
二、通过explain验证语句的索引使用
查看表的基本情况
show index from TM_ACCOUNT ; -- 看索引
执行结果,可以看到org_first_id/org_second_id的区分度,都很不错。
show table status like '%TM_ACCOUNT%'; -- 看表状态,有数据大小、索引大小、大概行数
可看到使用了InnoDB引擎,大概行数是1408599,实际行数是1500000整。
常规的查询
explain SELECT * from TM_ACCOUNT where ACCOUNT_ID = '10'; -- 典型的主键字段查询,非常快,type=const
explain SELECT * from TM_ACCOUNT where ACCOUNT_ID = '10'; -- 典型的主键字段查询,非常快,type=const
explain SELECT * from TM_ACCOUNT where NAME = 'name-11'; -- 典型的非索引字段查询,全表扫描
explain SELECT * from TM_ACCOUNT where ADDRESS = 'QR3xHEOpaLAVNFCtAKXY'; -- 典型的非索引字段查询,全表扫描
explain SELECT * from TM_ACCOUNT where LAST_MODIFY_DT = '2100-09-13 00:00:00' and ACCOUNT_ID > 100 LIMIT 2; -- 典型的范围查询,扫描索引。单速度也很快
通过改变查询条件,引导MySQL优化器,选择错误的索引、规则
下面通过3个SQL查询的结果对比,来复现MySQL优化器如何选错优化场景。(这里不讨论为何不换种写法,直接规避劣化SQL。往往出现这类SQL时,一是业务场景复杂,二是开发时数据量少并未发现,在生产环境才能出现)
-- SQL-1
explain
SELECT * from TM_ACCOUNT where org_first_id >= 99 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02')and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 5000;
查询结果:可见使用了IDX_org_id_combine索引,并用到索引范围扫描、回表查询、临时文件排序。不算是一个很好的查询语句,但实际业务中的查询条件,只会更复杂。直接查询耗时140ms。
-- SQL-2 坏案例-全表扫描;
explain
SELECT * from TM_ACCOUNT where org_first_id >= 90 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 5000;
查询结果:改变org_first_id条件,扩大查询范围,结果变成了主键索引的大范围扫描,预估扫描行数70万行,几乎是表总数的一半。直接查询耗时3900ms。
-- SQL-3 与SQL-1基本相同,但limit数量减少。
explain
SELECT * from TM_ACCOUNT where org_first_id >= 99 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 500;
查询结果:与SQL-1基本相同,但limit数量减少,即查询条件范围缩小,劣化成主键大范围扫描。 直接查询耗时1210ms。
三、复现索引选择劣化、并尝试分析OPTIMIZER_TRACE
执行相关命令,获取OPTIMIZER_TRACE过程。
/* 打开optimizer_trace,只对本线程有效 */
SET optimizer_trace='enabled=on';
#你的sql
-- select ......;
SELECT * from TM_ACCOUNT where org_first_id >= 90 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 5000;
#查看优化器追踪链
select * from INFORMATION_SCHEMA.OPTIMIZER_TRACE;
#关闭优化器追踪
SET optimizer_trace='enabled=off';
关键过程:通过对潜在查询方式的预估,分别对PRIMARY/IDX_org_id_combine的开销进行评估,这里开销并不仅看扫描行数,还会看排序等情况。可以看到虽然走主键索引的行数更多,但总开销更小。由此可知在【预估】过程,误导了整个优化器。
共有2个潜在选项,分别标出了rowid是否排序、行数rows、预估开销cost
- PRIMARY,范围是"120306 < account_id"
- IDX_org_id_combine,范围是"90 <= org_first_id"
截取部分OPTIMIZER_TRACE结果,完整json参考附录1
// 分析可供选择的范围条件
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"120306 < account_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 704299,
"cost": 141880,
"chosen": true
},
{
"index": "IDX_org_id_combine",
"ranges": [
"90 <= org_first_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 295138,
"cost": 354167,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
// 最终选择的路径
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 704299,
"ranges": [
"120306 < account_id"
]
},
"rows_for_plan": 704299,
"cost_for_plan": 141880,
"chosen": true
}
这里怀疑是order by ACCOUNT_ID
影响了优化器选择,但通测试发现,即使移除了'order by ACCOUNT_ID desc LIMIT 5000',explain结果仍然是走PRIMARY索引。由此可见,还有些隐藏的信息,OPTIMIZER_TRACE没有展示全。这里暂不深入讨论。
explain
SELECT * from TM_ACCOUNT where org_first_id >= 90 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' ;
结果:实际查询耗时912ms。在【## 附录2 OPTIMIZER_TRACE原始信息2】中也能看到选择实际索引,仍然是PRIMARY,与explain结果一致。
四、如何优化?
改写SQL:
- 通过配置、distinct org_first_id等方式,将org_first_id的范围固定下来,并缓存
- 改写SQL,将
org_first_id >= 90
改写为org_first_id IN (xxxxx)
下面来看效果
explain
SELECT * from TM_ACCOUNT where org_first_id in ('90','91','92','93','94','95','96','97','98','99') and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 5000;
结果:实际查询耗时59ms。explain结果可看到虽然也用了IDX_org_id_combine索引,但仍然是range查询、回表、filesort,好在扫描行数较少,最终耗时很小。
思考,改写SQL是最佳解决方案吗?
随着数据量的增大,无论多么简单的SQL,最终仍然会变慢。
其他方式:
- 数据归档。 建立历史表、大数据抽数归档冷数据。
- 引入专门的OLAP系统,不在OLTP系统做复杂的业务查询。引入ES、hive、HBASE等组件,专业的事交给专业的人去做。
其他
- 打开optimizer_trace,只对本线程有效。建议使用命令行窗口,直连db。通过Navicat等客户端,可能会记录失败。
- 一般optimizer_trace只在root用户下才能使用
- mariadb直到10.4版本才有Optimizer Trace, 之前的版本执行'SET optimizer_trace='enabled=on'; '会返回错误 。官网链接https://mariadb.com/resources/blog/optimizer-trace-in-mariadb-server-10-4/
附录1 OPTIMIZER_TRACE原始信息1
以下语句的执行优化过程SELECT * from TM_ACCOUNT where org_first_id >= 90 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' order by ACCOUNT_ID desc LIMIT 5000;
{
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"IN_uses_bisection": true
},
{
"IN_uses_bisection": true
},
{
"expanded_query": "/* select#1 */ select `tm_account`.`account_id` AS `account_id`,`tm_account`.`name` AS `name`,`tm_account`.`address` AS `address`,`tm_account`.`org_first_id` AS `org_first_id`,`tm_account`.`org_second_id` AS `org_second_id`,`tm_account`.`biz_date` AS `biz_date`,`tm_account`.`last_modify_dt` AS `last_modify_dt` from `tm_account` where ((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306)) order by `tm_account`.`account_id` desc limit 5000"
}
]
}
},
{
"join_optimization": {
"select#": 1,
"steps": [
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
},
{
"transformation": "constant_propagation",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
},
{
"transformation": "trivial_condition_removal",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
}
]
}
},
{
"substitute_generated_columns": {}
},
{
"table_dependencies": [
{
"table": "`tm_account`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": []
}
]
},
{
"ref_optimizer_key_uses": []
},
{
// 行数预估
"rows_estimation": [
{
"table": "`tm_account`",
"range_analysis": {
"table_scan": {
"rows": 1408599,
"cost": 288932
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": true,
"key_parts": [
"account_id"
]
},
{
"index": "IDX_org_id_combine",
"usable": true,
"key_parts": [
"org_first_id",
"org_second_id",
"account_id"
]
},
{
"index": "IDX_last_modify_dt_org_first_id_name",
"usable": false,
"cause": "not_applicable" // 直接标明不适用
}
],
"setup_range_conditions": [],
"group_index_range": {
"chosen": false,
"cause": "not_group_by_or_distinct"
},
// 分析可供选择的范围条件
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"120306 < account_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 704299,
"cost": 141880,
"chosen": true
},
{
"index": "IDX_org_id_combine",
"ranges": [
"90 <= org_first_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 295138,
"cost": 354167,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 704299,
"ranges": [
"120306 < account_id"
]
},
"rows_for_plan": 704299,
"cost_for_plan": 141880,
"chosen": true
}
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": [],
"table": "`tm_account`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 704299,
"access_type": "range",
"range_details": {
"used_index": "PRIMARY"
},
"resulting_rows": 11806,
"cost": 282740,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 11806,
"cost_for_plan": 282740,
"chosen": true
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))",
"attached_conditions_computation": [
{
"table": "`tm_account`",
"rechecking_index_usage": {
"recheck_reason": "low_limit",
"limit": 5000,
"row_estimate": 11806,
"range_analysis": {
"table_scan": {
"rows": 1408599,
"cost": 1690000
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": true,
"key_parts": [
"account_id"
]
},
{
"index": "IDX_org_id_combine",
"usable": false,
"cause": "not_applicable"
},
{
"index": "IDX_last_modify_dt_org_first_id_name",
"usable": false,
"cause": "not_applicable"
}
],
"setup_range_conditions": [],
"group_index_range": {
"chosen": false,
"cause": "cannot_do_reverse_ordering"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"120306 < account_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 704299,
"cost": 141880,
"chosen": true
}
]
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 704299,
"ranges": [
"120306 < account_id"
]
},
"rows_for_plan": 704299,
"cost_for_plan": 141880,
"chosen": true
}
}
}
}
],
"attached_conditions_summary": [
{
"table": "`tm_account`",
"attached": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
}
]
}
},
{
"clause_processing": {
"clause": "ORDER BY",
"original_clause": "`tm_account`.`account_id` desc",
"items": [
{
"item": "`tm_account`.`account_id`"
}
],
"resulting_clause_is_simple": true,
"resulting_clause": "`tm_account`.`account_id` desc"
}
},
{
"reconsidering_access_paths_for_index_ordering": {
"clause": "ORDER BY",
"steps": [],
"index_order_summary": {
"table": "`tm_account`",
"index_provides_order": true,
"order_direction": "desc",
"index": "PRIMARY",
"plan_changed": false
}
}
},
{
"refine_plan": [
{
"table": "`tm_account`"
}
]
}
]
}
},
{
"join_execution": {
"select#": 1,
"steps": []
}
}
]
}
附录2 OPTIMIZER_TRACE原始信息2
SELECT * from TM_ACCOUNT where org_first_id >= 90 and org_second_id in (1,2,3,60) and BIZ_DATE in ('2100-09-01','2100-09-02') and ACCOUNT_ID > '120306' ;
{
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"IN_uses_bisection": true
},
{
"IN_uses_bisection": true
},
{
"expanded_query": "/* select#1 */ select `tm_account`.`account_id` AS `account_id`,`tm_account`.`name` AS `name`,`tm_account`.`address` AS `address`,`tm_account`.`org_first_id` AS `org_first_id`,`tm_account`.`org_second_id` AS `org_second_id`,`tm_account`.`biz_date` AS `biz_date`,`tm_account`.`last_modify_dt` AS `last_modify_dt` from `tm_account` where ((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
}
]
}
},
{
"join_optimization": {
"select#": 1,
"steps": [
{
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))",
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
},
{
"transformation": "constant_propagation",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
},
{
"transformation": "trivial_condition_removal",
"resulting_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
}
]
}
},
{
"substitute_generated_columns": {}
},
{
"table_dependencies": [
{
"table": "`tm_account`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": []
}
]
},
{
"ref_optimizer_key_uses": []
},
{
"rows_estimation": [
{
"table": "`tm_account`",
"range_analysis": {
"table_scan": {
"rows": 1408599,
"cost": 288932
},
"potential_range_indexes": [
{
"index": "PRIMARY",
"usable": true,
"key_parts": [
"account_id"
]
},
{
"index": "IDX_org_id_combine",
"usable": true,
"key_parts": [
"org_first_id",
"org_second_id",
"account_id"
]
},
{
"index": "IDX_last_modify_dt_org_first_id_name",
"usable": false,
"cause": "not_applicable"
}
],
"setup_range_conditions": [],
"group_index_range": {
"chosen": false,
"cause": "not_group_by_or_distinct"
},
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "PRIMARY",
"ranges": [
"120306 < account_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": true,
"using_mrr": false,
"index_only": false,
"rows": 704299,
"cost": 141880,
"chosen": true
},
{
"index": "IDX_org_id_combine",
"ranges": [
"90 <= org_first_id"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 295138,
"cost": 354167,
"chosen": false,
"cause": "cost"
}
],
"analyzing_roworder_intersect": {
"usable": false,
"cause": "too_few_roworder_scans"
}
},
"chosen_range_access_summary": {
"range_access_plan": {
"type": "range_scan",
"index": "PRIMARY",
"rows": 704299,
"ranges": [
"120306 < account_id"
]
},
"rows_for_plan": 704299,
"cost_for_plan": 141880,
"chosen": true
}
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": [],
"table": "`tm_account`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 704299,
"access_type": "range",
"range_details": {
"used_index": "PRIMARY"
},
"resulting_rows": 704299,
"cost": 282740,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 704299,
"cost_for_plan": 282740,
"chosen": true
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))",
"attached_conditions_computation": [],
"attached_conditions_summary": [
{
"table": "`tm_account`",
"attached": "((`tm_account`.`org_first_id` >= 90) and (`tm_account`.`org_second_id` in (1,2,3,60)) and (`tm_account`.`biz_date` in ('2100-09-01','2100-09-02')) and (`tm_account`.`account_id` > 120306))"
}
]
}
},
{
"refine_plan": [
{
"table": "`tm_account`"
}
]
}
]
}
},
{
"join_execution": {
"select#": 1,
"steps": []
}
}
]
}
附录3 java构造数据
public final class JdbcUtils {
private static String url = "jdbc:mysql://localhost:3306/xxxx?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&serverTimezone=GMT%2B8";
private static String user = "root";
private static String password = "123";
private JdbcUtils() {
}
static {
try {
Class.forName("com.mysql.jdbc.Driver");
} catch (ClassNotFoundException e) {
throw new ExceptionInInitializerError(e);
}
}
public static void main(String args[]) {
insertBatch();
}
public static void insertBatch() {
Connection conn = null;
PreparedStatement pst = null;
ResultSet rs = null;
try {
String sql = "INSERT into TM_ACCOUNT values(?,?,?,?,?,?,?);";
// 1. 获取链接,预处理语句
conn = getConnection();
conn.setAutoCommit(false);
pst = conn.prepareStatement(sql);
// 2. 开始插入,总插入150万
Random random = new Random();
int a_id_start = 1;
for (int i = 0; i < 5 * 150; i++) {
// 每2000条执行一次批量插入
for (int loop = 0; loop < 2000; loop++) {
a_id_start++;
pst.setInt(1, a_id_start);
pst.setString(2, "name-" + a_id_start);
pst.setString(3, RandomString.make(20));
pst.setInt(4, random.nextInt(100));
pst.setInt(5, random.nextInt(100));
pst.setDate(6, new Date(200, 8, random.nextInt(25) + 1));
pst.setDate(7, new Date(200, 8, random.nextInt(25) + 1));
pst.addBatch();
}
pst.executeBatch();
conn.commit();
System.out.println(" done !!!!!!" + i);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
free(rs, pst, conn);
}
}
public static Connection getConnection() throws SQLException {
return DriverManager.getConnection(url, user, password);
}
public static void free(ResultSet rs, Statement st, Connection conn) {
try {
if (rs != null)
rs.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
if (st != null)
st.close();
} catch (Exception e2) {
e2.printStackTrace();
} finally {
try {
conn.close();
} catch (Exception e3) {
e3.printStackTrace();
}
}
}
}
}