V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
MySQL 5.5 Community Server
MySQL 5.6 Community Server
Percona Configuration Wizard
XtraBackup 搭建主从复制
Great Sites on MySQL
Percona
MySQL Performance Blog
Severalnines
推荐管理工具
Sequel Pro
phpMyAdmin
推荐书目
MySQL Cookbook
MySQL 相关项目
MariaDB
Drizzle
参考文档
http://mysql-python.sourceforge.net/MySQLdb.html
alexfarm
V2EX  ›  MySQL

ON DUPLICATE KEY UPDATE 引起的死锁问题,求助一下

  •  1
     
  •   alexfarm · 308 天前 · 1168 次点击
    这是一个创建于 308 天前的主题,其中的信息可能已经有所发展或是发生改变。

    最近生产上遇到了一个死锁问题,看日志应该是和使用了 ON DUPLICATE KEY UPDATE 语法有关系。查阅了一些资料,这个语法的确在并发高时容易引起死锁问题,但主要都是 gap 锁和插入意图锁引起的。但我实际遇到的是和唯一索引、主键索引有关,在测试环境也无法复现,有没有大佬帮助一下。 以下是相关环境信息和日志信息, tb 是主要更新的表,tb1 和 tb2 是用于关联查询出插入内容

    Mysql 5.7 RC 事务隔离级别

    CREATE TABLE `tb` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
      `bid` varchar(38) NOT NULL,
      `sid` varchar(20) NOT NULL,
      `oid` varchar(20) NOT NULL,
      `emark` char(2) NOT NULL DEFAULT '00',
      `amark` char(2) NOT NULL DEFAULT '00',
      `bmark` char(2) NOT NULL DEFAULT '00',
      `cmark` char(2) NOT NULL DEFAULT '00',
      `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
      `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
      PRIMARY KEY (`id`),
      UNIQUE KEY `idx_bid_sid_oid` (`bid`,`sid`,`oid`),
      KEY `idx_oid` (`oid`) USING BTREE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    
    CREATE TABLE `tb1` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `sid` varchar(20) NOT NULL,
      `bid` varchar(38) NOT NULL,
      `oid` varchar(20) DEFAULT NULL,
      `status` char(2) NOT NULL DEFAULT '1',
      `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '更新时间',
      PRIMARY KEY (`id`),
      UNIQUE KEY `idx_bid_sid` (`bid`,`sid`),
      KEY `idx_sid` (`sid`),
      KEY `idx_oid` (`oid`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    
    CREATE TABLE `tb2` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `bid` varchar(38) NOT NULL,
      `expire_time` datetime DEFAULT NULL,
      `status` char(2) NOT NULL,
      `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      UNIQUE KEY `idx_name_list_batch_id` (`batch_id`),
      KEY `idx_name_list_expire_time` (`expire_time`),
      KEY `index_name_list_market_no` (`market_case_no`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 
    

    InnoDB: *** (1) TRANSACTION:

    TRANSACTION 7367657071, ACTIVE 1 sec inserting
    mysql tables in use 1, locked 1
    LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
    MySQL thread id 3180164, OS thread handle 140056880891648, query id 2058483539
    tb update

    insert into tb
        ( 
        bid, sid, oid, emark, amark, bmark, cmark, update_time
       )
        values ('444444', '555555', '666666', '00', '00', '00', '00', now())
        ON DUPLICATE KEY UPDATE
        amark = values(amark),
        update_time = now()
    

    2023-12-28T19:00:05.105996+08:00 3180165 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

    RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367657071 lock_mode X waiting Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
    0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
    1: len 10; hex 30313939373333303331; asc 555555;;
    2: len 6; hex 363733353839; asc 666666;;
    3: len 8; hex 00000000000910e7; asc ;;

    2023-12-28T19:00:05.106368+08:00 3180165 [Note]
    InnoDB: *** (2) TRANSACTION:

    TRANSACTION 7367656999, ACTIVE 2 sec inserting, thread declared inside InnoDB 236 mysql tables in use 3, locked 1 465 lock struct(s), heap size 57552, 38900 row lock(s), undo log entries 19415
    MySQL thread id 3180165, OS thread handle 140056037168896, query id 2058483278
    user Sending data

    insert into tb
          ( 
        bid, sid, oid, emark, amark, bmark, update_time
       )
          select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now()
          from tb1 left join tb2 
          on tb1.bid = tb2.bid
          where tb2.expire_time < '2023-12-31 19:00:03.963' and tb2.expire_time > now()
            and tb1.status IN
             (  
              '2'
             , 
              '3'
             ) 
           
          ON DUPLICATE KEY UPDATE
          emark = values(emark),
          update_time = now()
    

    2023-12-28T19:00:05.106420+08:00 3180165 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

    RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367656999 lock_mode X
    Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;
    Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
    0: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
    1: len 10; hex 30313939373333303331; asc 555555;;
    2: len 6; hex 363733353839; asc 666666;;
    3: len 8; hex 00000000000910e7; asc ;;

    2023-12-28T19:00:05.139592+08:00 3180165 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

    RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db.tb trx id 7367656999 lock_mode X locks rec but not gap waiting
    Record lock, heap no 14 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
    0: len 8; hex 00000000000aaf6c; asc l;;
    1: len 6; hex 0001b725866f; asc % o;;
    2: len 7; hex e20000118b0110; asc ;;
    3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
    4: len 10; hex 30313939373333303331; asc 555555;;
    5: len 6; hex 363733353839; asc 666666;;
    6: len 2; hex 3030; asc 00;;
    7: len 2; hex 3030; asc 00;;
    8: len 2; hex 3030; asc 00;;
    9: len 2; hex 3030; asc 00;;
    10: len 5; hex 99b1f93004; asc 0 ;;
    11: len 5; hex 99b1f93004; asc 0 ;;

    2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

    10 条回复    2024-01-22 22:33:46 +08:00
    alexfarm
        1
    alexfarm  
    OP
       304 天前
    求大神给点排查思路啊~
    alexfarm
        2
    alexfarm  
    OP
       302 天前
    h0099
        3
    h0099  
       302 天前
    https://www.v2ex.com/t/908047#r_14219204
    https://www.v2ex.com/t/1005536#r_14219181

    我的评价是:建议先大脑升级`最新最热最凶最恶`mysql8.2
    alexfarm
        4
    alexfarm  
    OP
       302 天前
    @h0099 这个升级决定不了呀
    h0099
        5
    h0099  
       302 天前
    事务 2`INSERT INTO tb SELECT ...`的 subquery
    ```sql
    select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now()
    from tb1 left join tb2
    on tb1.bid = tb2.bid
    where tb2.expire_time < '2023-12-31 19:00:03.963'
    and tb2.expire_time > now()
    and tb1.status IN ('2', '3')
    ```
    返回中有没有`(bid, sid, oid)`=`('444444', '555555', '666666')`的行?
    h0099
        6
    h0099  
       302 天前
    h0099
        7
    h0099  
       302 天前
    首先您 tb2 删漏了
    ```diff
    CREATE TABLE `tb2` (
    `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    `bid` varchar(38) NOT NULL,
    `expire_time` datetime DEFAULT NULL,
    `status` char(2) NOT NULL,
    `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `update_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    - UNIQUE KEY `idx_name_list_batch_id` (`batch_id`),
    + UNIQUE KEY `idx_name_bid` (`bid`),
    KEY `idx_name_list_expire_time` (`expire_time`),
    - KEY `index_name_list_market_no` (`market_case_no`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    ```
    在 mysql8.0.35 下创建表后
    ```sql
    INSERT INTO tb1(sid, bid, oid, status) VALUES(0, 1, 2, 3);
    INSERT INTO tb2(bid, expire_time, status) VALUES(1, ADDDATE(NOW(), 2), 3);
    ```
    默认 RR 事务隔离级别下
    ```sql
    -- session1
    START TRANSACTION;
    insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now();
    -- 重复执行
    ROLLBACK;
    ```
    ```sql
    -- session2 您的`'2023-12-31 19:00:03.963'`我改成了`ADDDATE(NOW(), 3)`以便 reprod
    START TRANSACTION;
    insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now();
    -- 重复执行
    ROLLBACK;
    ```
    不论两个谁先执行执行几次它们都不会等待 acquire 另一事务占有的锁
    而在 RC 下
    |session1|session2|
    |-|-|
    |`SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;`|Ibid.|
    |`START TRANSACTION;`|Ibid.|
    |`SHOW SESSION VARIABLES LIKE '%iso%'\G`|Ibid.|
    |`insert into tb(bid, sid, oid, emark, amark, bmark, cmark, update_time) values ('444444', '555555', '666666', '00', '00', '00', '00', now()) ON DUPLICATE KEY UPDATE amark = values(amark), update_time = now();`||
    ||`insert into tb(bid, sid, oid, emark, amark, bmark, update_time) select tb1.bid, tb1.sid, tb1.oid, '10', '00', '00', now() from tb1 left join tb2 on tb1.bid = tb2.bid where tb2.expire_time < ADDDATE(NOW(), 3) and tb2.expire_time > now() and tb1.status IN ('2', '3') ON DUPLICATE KEY UPDATE emark = values(emark), update_time = now();`|
    ||等待 session1 释放某个锁|
    |`ROLLBACK;`||
    ||继续执行并返回|

    而如果 session2 先执行那后续它们以任意顺序执行都不会有等待
    这可能是因为`tb`此时作为空表(`CREATE`后没有`INSERT`过) session1`INSERT`后需要表级锁以保证其他事务能够 RC (看到其刚`INSERT`的这行,但由于不是`READ UNCOMMITTED`RU 级别所以需要等待 session1`ROLLBACK`/`COMMIT`)
    https://stackoverflow.com/questions/56434878/mysql-deadlock-using-an-index-with-a-new-value/56435470#56435470
    > In an empty table, the place that the 1 would have been is "anywhere in the table" (or anywhere from the start to the "supremum" mentioned in the deadlock) - which is consequently gaplocked by the delete. The same is true for the 2. And those locks do not conflict each other by definition.

    而在 session2 先`INSERT`后`tb`不是空表所以 session1 无需表级锁而可以在索引`idx_bid_sid`上锁`('444444', '555555')`,而 session2`INSERT SELECT`subquery 所返回行`(bid, sid)=(1, 0)`中也没有跟`('444444', '555555')`冲突因而无需`ON DUPLICATE KEY UPDATE`从而避免等待 session1 释放对那行的锁
    所以您的`tb`表在 session1 执行`INSERT`时是空的吗?
    h0099
        8
    h0099  
       302 天前
    如果您能够修改逻辑避免在 sql 中用到`ON DUPLICATE KEY UPDATE`aka 您不是只能加索引调参优化的 dba 而是透过 orm 层层抽象隔离写 java 业务的 dev 那您也可以像您 at 我过来的那帖中那样换成`RC 事务隔离级别`+`SELECT ... WHERE unique 约束或 PK FOR UPDATE`产生`表级 IX 锁` https://github.com/n0099/open-tbm/issues/32#issuecomment-1401223094 对`SELECT`出既有的 record 来`UPDATE ... SET a|emark = ... WHERE unique 约束或 PK`再对不存在的 record 进行`INSERT`此时可以保证`INSERT`不会返回`ERROR 1062 (23000): Duplicate entry`因为此前的`SELECT ... FOR UPDATE`已经上了`表级 IX 锁` https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-intention-locks 而此时其他`INSERT`需要`行级 X 锁`就会阻塞等待`表级 IX 锁`释放
    > Before a transaction can acquire an exclusive lock on a row in a table, it must first acquire an IX lock on the table.
    h0099
        9
    h0099  
       302 天前
    如果单独只看您发的那坨`SHOW ENGINE INNODB STATUS`deadlock detection log 的话那当时场景就是
    session2 先执行因而持有着`(bid, sid, oid)`
    ```
    RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367656999 lock_mode X
    Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;;
    Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
    ```
    上的末端`(某值, +∞)`[`nextkey`]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-next-key-locks)X 锁因为有[`hex 73757072656d756d; asc supremum`]( https://en.wikipedia.org/wiki/Infimum_and_supremum) https://dev.mysql.com/blog-archive/innodb-data-locking-part-2-5-locks-deeper-dive/

    而后执行的 session1 也在尝试 acquire`(bid, sid, oid)=('444444', '555555', '666666')`上的 recordlockX 锁但由于其位于 session2 的`nextkey`X 锁范围中而等待
    ```
    RECORD LOCKS space id 270 page no 6338 n bits 352 index idx_bid_sid_oid of table db.tb trx id 7367657071 lock_mode X waiting Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
    ```

    当 session1 在等待时 session2 也开始等待 PK (但下面的约束不是在`PRIMARY KEY (id)`上而是`INSERT`中的其他字段可能因为其是`AUTO_INCREMENT`AI 而您也没显式指定其值所以无法`WHERE`)`(bid, sid, oid, emark, amark, bmark, cmark)=('444444', '555555', '666666', '00', '00', '00', '00')`上的[recordlock(有`but not gap`)]( https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-record-locks)X
    ```
    RECORD LOCKS space id 270 page no 7273 n bits 88 index PRIMARY of table db.tb trx id 7367656999 lock_mode X locks rec but not gap waiting
    [...]
    3: len 25; hex 4f4243574830315a3032363631323032333132323241303536; asc 444444;;
    4: len 10; hex 30313939373333303331; asc 555555;;
    5: len 6; hex 363733353839; asc 666666;;
    [...]
    ```
    所以是互相等待对方而被 deadlock detection 并决定 rollback 后来的 session1 让 session2 先过
    ```
    2023-12-28T19:00:05.140247+08:00 3180165 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)
    ```
    alexfarm
        10
    alexfarm  
    OP
       297 天前
    @h0099 感谢大佬,前几天没登录,我先慢慢看你写的,谢谢!
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   实用小工具   ·   5629 人在线   最高记录 6679   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 36ms · UTC 06:05 · PVG 14:05 · LAX 22:05 · JFK 01:05
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.