水稻:最近有个朋友生产环境出现MySQL死锁问题,一听是死锁,那必须去看看啊,于是饶(si)有(qu)兴(huo)致(lai)的研究了好几天
菜瓜:MySQL死锁,赶紧分享一下
水稻:能否先让我装完X,我从朋友那里拿到数据结构,复现,分析,查资料,总。。。
菜瓜:今天的菜真香
水稻:。。。好吧,进入正题(数据已加工处理)
- 一开始朋友拿出了死锁日志中记录的两条SQL语句(暂且把第一条SQL叫SQL1,第二条SQL2)
-- 两句SQL结构一致,只是参数不一样。
explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' ; explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' ;
- 然后是表结构和数据 (这里只录入模拟数据,数据量并不是这次分析的关键原因)
CREATE TABLE `deadlock` ( `id` int(11) NOT NULL AUTO_INCREMENT, `from` varchar(64) CHARACTER SET utf8mb4 DEFAULT '', `to` varchar(64) CHARACTER SET utf8mb4 DEFAULT '', `status` int(11) DEFAULT NULL, `expired` datetime DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_from` (`from`), KEY `idx_to` (`to`) ) ENGINE=InnoDB AUTO_INCREMENT=35611184 DEFAULT CHARSET=utf8; INSERT INTO `deadlock` (`id`, `from`, `to`, `status`, `expired`) VALUES (5681309, '8', 'b', NULL, NULL), (5681310, 'b', '4', NULL, NULL), (5681311, '8', 'b', NULL, NULL), (5681312, '4', 'f', NULL, NULL), (5681313, '4', 'f', NULL, NULL), (5681314, '4', 'f', NULL, NULL), (5681315, 'f', '4', NULL, NULL), (35610742, '1', 'c', 1, '2020-07-07 02:03:58'), (35610744, '1', 'c', 1, '2020-07-07 02:03:58'), (35610745, '1', 'c', 1, '2020-07-07 02:03:58'), (35611180, '1', '2', 1, '2020-07-07 02:03:58'), (35611182, '1', '2', 1, '2020-07-07 02:03:58'), (35611183, '1', '2', 1, '2020-07-07 02:03:58');
两条SQL命中的记录各三条。一看是死锁,第一反应是发生记录资源互斥等待。猜想会不会是这6行记录在执行update的时候SQL1和SQL2修改的记录发生了互斥
菜瓜:所以你最开始想的是更新时两条SQL获取记录的顺序反了,譬如说SQL1先拿35610742,再拿35610744前,SQL2先把35610744拿到了且它要拿35610742在阻塞
水稻:是的,但是很快发现这个思路很离谱,我们说锁的时候应该是和索引联系起来,分析记录就有点偏了。而且按记录来看,两个SQL命中的记录都不一样。于是回到索引上,就能说通他们都命中了from='1'这个条件,在idx_from上发生了互斥。
菜瓜:soga,真实情况呢
水稻:事情如果这么简单我就不会研究它好几天了。分析完了之后开始准备复现死锁。思路很简单,因为SQL只有一句,要模拟出来就用多线程并发执行
- Java Code
@Resource private DeadlockMapper deadlockMapper; @Resource ThreadPoolTaskExecutor threadPoolTaskExecutor; @Test public void deadlock() { for (; ; ) { threadPoolTaskExecutor.execute(() -> { try { Long id = 35611183L; String from = "1"; String to = "2"; deadlockMapper.updateDeadlock(id, from, to); } catch (Exception e) { System.out.println("transaction tututu"); e.printStackTrace(); System.exit(0); } }); threadPoolTaskExecutor.execute(() -> { try { Long id = 35610745L; String from = "1"; String to = "c"; deadlockMapper.updateDeadlock(id, from, to); } catch (Exception e) { System.out.println("transaction kakaka"); e.printStackTrace(); System.exit(0); } }); System.out.println("once loop finish ..."); } }
- MyBatis Code
public interface DeadlockMapper extends Mapper<Deadlock> { int updateDeadlock(@Param("id") Long id, @Param("from") String from, @Param("to") String to); } <update id="updateDeadlock"> update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <![CDATA[<=]]> #{id} and `from` = #{from} and `to` = #{to} </update>
- 执行结果
once loop finish ... once loop finish ... transaction tututu transaction tututu org.springframework.dao.DeadlockLoserDataAccessException: ### Error updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ### The error may involve defaultParameterMap ### The error occurred while setting parameters ### SQL: update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= ? and `from` = ? and `to` = ? ### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction 。。。
- 死锁日志 - show engine innodb status;
LATEST DETECTED DEADLOCK ------------------------ 2020-07-07 08:00:54 0x7f2c38b6a700 *** (1) TRANSACTION: TRANSACTION 61382, ACTIVE 0 sec starting index read mysql tables in use 3, locked 3 LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s) MySQL thread id 147, OS thread handle 139827907593984, query id 22005 101.68.68.234 root updating update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 66 page no 4 n bits 80 index idx_from of table `qc`.`deadlock` trx id 61382 lock_mode X waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6076; asc `v;; *** (2) TRANSACTION: TRANSACTION 61381, ACTIVE 0 sec fetching rows mysql tables in use 3, locked 3 5 lock struct(s), heap size 1136, 11 row lock(s) MySQL thread id 150, OS thread handle 139827906782976, query id 22004 101.68.68.234 root updating update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 66 page no 4 n bits 80 index idx_from of table `qc`.`deadlock` trx id 61381 lock_mode X Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6076; asc `v;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6078; asc `x;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f6079; asc `y;; Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 1; hex 31; asc 1;; 1: len 4; hex 821f622c; asc b,;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 66 page no 3 n bits 80 index PRIMARY of table `qc`.`deadlock` trx id 61381 lock_mode X locks rec but not gap waiting Record lock, heap no 12 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 821f622c; asc b,;; 1: len 6; hex 00000000e962; asc b;; 2: len 7; hex 27000001d901fd; asc ' ;; 3: len 1; hex 31; asc 1;; 4: len 1; hex 32; asc 2;; 5: len 4; hex 80000001; asc ;; 6: len 5; hex 99a6ce7d03; asc } ;; *** WE ROLL BACK TRANSACTION (1)
菜瓜:这不是很顺利吗
水稻:我会告诉你在我本地环境中一直没有复现出来吗?我会告诉你我找了一天才找到原因是因为MySQL配置问题(需要开启bin log,开启成功后执行 select version() 会看到 5.7.30-log)?我会告诉你这过程中我放弃了好几次又决定再看看吗?当然不会
菜瓜:好厉害哦!那你很棒棒哦!
水稻:。。。来,进入分析环节
- 两个SQL语句的explain执行计划
explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35610745 and `from` = '1' and `to` = 'c' ; -- Using intersect(idx_from,idx_to); Using where explain update `deadlock` set `status` = 1,`expired` = CURRENT_TIMESTAMP where `id` <= 35611183 and `from` = '1' and `to` = '2' ; -- Using intersect(idx_to,idx_from); Using where
- 可以看到执行这两个SQL的时候命中的索引竟然是NB的Merge Index(鬼知道这是啥),而且它两顺序还不一致(有兴趣的朋友可以上搜索一下这个玩意)
- Merge Index 索引合并技术是MySQL的一个优化,会将两个独立的索引扫描的结果进行取交集或者并集来加快数据的检索。
- 这里的关键点是它会导致MySQL在拿锁的顺序上不一致
菜瓜:你这个铺垫太多了,我已经记不清楚前面那么多东西了,请直接上结论吧
水稻:好的,前面的code是方便您自己下去复现的,对于动手党有用。您可以直接忽略
- 这里原因的解析只需要注意死锁日志和最后的explain执行计划结果
- 事务Transaction1
事务Transaction2
结论就是:id为35610742记录上的索引顺序不一致出现了死锁
SQL1(事务2)持有idx_from,在等待主键primary key。执行计划中它命中索引 intersect(idx_from,idx_to) ,非聚簇索引除了拿到自己的索引之外还要获取主键索引,所以它拿锁的顺序是先获取idx_from,再获取primary key,最后获取idx_to,再primary key
- SQL2(事务1)持有primary key,在等待idx_from。同理它命中索引intersect(idx_to,idx_from),它拿锁的顺序是先获取idx_to,再获取primary key,最后获取idx_from,再primary key
(日志里面看不出来SQL2持有主键索引,这里是根据出现死锁的互斥条件得出它持有主键索引)
菜瓜:这是何等的wocao!!!
水稻:解决这个问题只要把两个单个索引改成联合索引即可
总结:
- 单行update语句引发的死锁问题
- 单行记录拿锁顺序不一致出现死锁
- 索引合并Merge Index 导致获取锁顺序不一致
- 场景复现需要把log_bin日志打开,注意查看select version()
- 多线程模拟并发场景
- 死锁日志分析