根据事后在虚拟机中复现客户现场发生的情况,做一次记录(简化部分过程,原理不变)

客户端1执行update语句

SQL> select * from test;

    ID NAME
---------- --------------------------------
     2 b
    10 c
     1 b

SQL> update test set id = 3 where name = 'c';

1 row updated.

客户端2执行另外一条update语句

SQL> update test set id = 4 where name = 'c';

这个时候第二条update卡住了,证明发生了hanganalyze,查询当前的等待事件

SQL>  select vw.sid,v.serial#, vw.event, vw.p1raw, vw.p2, v.username, sql_id  from v$session_wait vw, v$session v where vw.sid = v.sid   and vw.event not like '%message%' order by 3 desc;

       SID    SERIAL# EVENT                                              P1RAW                    P2 USERNAME        SQL_ID
---------- ---------- -------------------------------------------------- ---------------- ---------- --------------- -------------
        13          1 smon timer                                         000000000000012C          0
         2          1 pmon timer                                         000000000000012C          0
        40          7 i/o slave wait                                     0000000091605340          0
        42         23 i/o slave wait                                     0000000091605340          0
        36         23 enq: TX - row lock contention                      0000000054580006     196608 ELAN            53msja8m8hbcw
         4          1 VKTM Logical Idle Wait                             00                        0
        31          3 Streams AQ: waiting for time management or cleanup 00                        0
                       tasks

        18         15 Streams AQ: qmn slave idle wait                    0000000000000001          0
        25          9 Streams AQ: qmn coordinator idle wait              00                        0
        33         43 Space Manager: slave idle wait                     00                        0
         8          1 DIAG idle wait                                     0000000000000005          1
         6          1 DIAG idle wait                                     0000000000000005          1

12 rows selected.

可以看到等待事件中有tx锁

进而查找彼此发生的依赖关系,是谁堵塞了谁

SELECT DECODE(request,0,'Holder: ','Waiter: ')||sid sess, id1, id2, lmode, request, type   FROM V$LOCK   WHERE (id1, id2, type) IN(SELECT id1, id2, type FROM V$LOCK         WHERE request>0)  ORDER BY id1, request;

SESS                                                    ID1        ID2      LMODE    REQUEST TY
------------------------------------------------ ---------- ---------- ---------- ---------- --
Holder: 46                                           196608        786          6          0 TX
Waiter: 36                                           196608        786          0          6 TX

发现,持有者是46号sid,等待着是36号sid,并且lmode模式是6

在oracle官方的支持文档中可以看到对于该模式的描述   exclusive 独占(X):独立访问使用,exclusive,通常发生在Alter table, Drop table, Drop Index, Truncate table, Lock Exclusive

进而可以根据sid查找是那条语句发生了阻塞

 SELECT   /*+ ORDERED */         a.SQL_TEXT    FROM v$sqltext a   WHERE (a.hash_value, a.address) IN (            SELECT DECODE (sql_hash_value,        0, prev_hash_value,                            sql_hash_value                          ),                   DECODE (sql_hash_value, 0, prev_sql_addr, sql_address)            FROM v$session b              where b.SID = 46)              order by a.PIECE

SQL_TEXT
----------------------------------------------------------------
update test set id = 3 where name = 'c'

进而可以根据sid和serial反查是那个机器发出了这条指令

select sid,serial#,machine,CLIENT_INFO from v$session where sid =
  2  '46';

       SID    SERIAL# MACHINE                                                          CLIENT_INFO
---------- ---------- ---------------------------------------------------------------- ----------------------------------------------------------------
        46         67 elan

基本就能够排查所出现情况,进而进行沟通解决,是不是语句未提交导致,或者是其他原因

等待持有锁的会话commit或者rollback。 通常为会话1在某行上执行 update/delete 未提交,会话2对同一行数据进行 update/delete,或其它原因(例如SQL性能差)造成的锁释放速度缓慢或网络问题,都会造成后续的会话进入队列等待

不着急查杀,这个还是要等待确认后,在进行下一步操作

接下里我们使用hanganalyze来分析上述问题

执行hanganalyze

SQL> ORADEBUG setmypid
Statement processed.
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /u01/app/oracle/diag/rdbms/elan/elan/trace/elan_ora_6322.trc
SQL> 

HANG ANALYSIS基本信息,摘录部分信息如下

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by: --阻塞会话信息
    {
                instance: 1 (elan.elan)
                   os id: 6337
              process id: 27, oracle@elan (TNS V1-V3)
              session id: 36
        session serial #: 23
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x30000
                      p3: 'sequence'=0x312
            time in wait: 19 min 57 sec
           timeout after: never
                 wait id: 24
                blocking: 0 sessions  --此处显示当前会话没有阻塞的session,说明是被阻塞会话
             current sql: update test set id = 4 where name = 'c'
            wait history:
              * time between current wait and wait #1: 0.000966 sec
              1.       event: 'db file sequential read'
                 time waited: 0.000178 sec
                     wait id: 23              p1: 'file#'=0x5
                                              p2: 'block#'=0x8a
                 time waited: 0.000040 sec
                     wait id: 22              p1: 'FileOperation'=0x2
                                              p2: 'fileno'=0x5
                                              p3: 'filetype'=0x2
              * time between wait #2 and #3: 0.001004 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 17.589536 sec
                     wait id: 21              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (elan.elan)
                   os id: 6336
              process id: 25, oracle@elan (TNS V1-V3)
              session id: 46    --会话session
        session serial #: 67
    }
    which is waiting for 'SQL*Net message from client' with wait info:
    {
                      p1: 'driver id'=0x62657100
                      p2: '#bytes'=0x1
            time in wait: 20 min 21 sec
           timeout after: never
                 wait id: 25
                blocking: 1 session  --此处显示被阻塞的会话数是1,也证明是锁会话持有者
             current sql: <none>
             short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+495<-opikndf2()+978<-opitsk()+831<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
            wait history:
              * time between current wait and wait #1: 0.000003 sec
              1.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 24              p1: 'driver id'=0x62657100
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000069 sec
              2.       event: 'Disk file operations I/O'
                 time waited: 0.000040 sec
                     wait id: 23              p1: 'FileOperation'=0x2
                                              p2: 'fileno'=0x3
                                              p3: 'filetype'=0x2
              * time between wait #2 and #3: 0.000287 sec

  

通过上述分析:大概可以得出sid=36因为请求enq: TX - row lock contention(TX mode=6)被sid=46阻塞

分析过程基本如上

12-29 13:05