11g之后,通过v$wait_chains视图诊断数据库hang和Contention
11g之前,通常我们数据库hang住了之后,我们会对数据库做hang analyze来进行分析,在11g之后。我们能够通过一个新的视图v$wait_chains来诊断数据库hang和contention。在11gR1这个版本号里面,Oracle通过diag进程实现了一个功能。每隔3秒做一次本地的hang analyze。每隔10秒做一次global的hang analyze。而这些信息会存放在内存里面,Oracle把这一块内存称作”hang analysis cache”。而这一部分内存信息,对我们数据库诊断hang和contention起着很关键的数据。而数据库另一些特性及工具也须要使用这块内存区域。比方Hang
Management, Resource Manager Idle Blocker Kill, SQL Tune Hang Avoidance和pmon清除,另一些外部工具如Procwatcher。
我们看一下v$wait_chains视图的定义。以11gR2为例。
SQL> desc v$wait_chains
Name Null? Type
----------------------------------------- -------- ----------------------------
CHAIN_ID NUMBER
CHAIN_IS_CYCLE VARCHAR2(5)
CHAIN_SIGNATURE VARCHAR2(801)
CHAIN_SIGNATURE_HASH NUMBER
INSTANCE NUMBER
OSID VARCHAR2(25)
PID NUMBER
SID NUMBER
SESS_SERIAL# NUMBER
BLOCKER_IS_VALID VARCHAR2(5)
BLOCKER_INSTANCE NUMBER
BLOCKER_OSID VARCHAR2(25)
BLOCKER_PID NUMBER
BLOCKER_SID NUMBER
BLOCKER_SESS_SERIAL# NUMBER
BLOCKER_CHAIN_ID NUMBER
IN_WAIT VARCHAR2(5)
TIME_SINCE_LAST_WAIT_SECS NUMBER
WAIT_ID NUMBER
WAIT_EVENT NUMBER
WAIT_EVENT_TEXT VARCHAR2(64)
P1 NUMBER
P1_TEXT VARCHAR2(64)
P2 NUMBER
P2_TEXT VARCHAR2(64)
P3 NUMBER
P3_TEXT VARCHAR2(64)
IN_WAIT_SECS NUMBER
TIME_REMAINING_SECS NUMBER
NUM_WAITERS NUMBER
ROW_WAIT_OBJ# NUMBER
ROW_WAIT_FILE# NUMBER
ROW_WAIT_BLOCK# NUMBER
ROW_WAIT_ROW# NUMBER
继续查询该视图的定义。
能够发现该数据来自于基表x$ksdhng_chains。由于前面介绍过进程会10秒做一次global的hang,所以这个视图是包括了全局的信息的。
尽管它是v$开头的。
SQL> select * from V$FIXED_VIEW_DEFINITION where view_name like '%WAIT_CHAINS%';
VIEW_NAME VIEW_DEFINITION
-------------------- ----------------------------------------------------------------------------------------------------------------------------------
V$WAIT_CHAINS select s.chain_id, decode(s.chain_is_cycle, 0,'FALSE','TRUE'), s.chain_signature, s.chain_signature_hash, s.instance, s.osid,
s.pid, s.sid, s.sess_serial#, decode(s.blocker_is_valid, 0,'FALSE','TRUE'), decode(s.blocker_is_valid, 0, to_number(null), s.
blocker_instance), s.blocker_osid, decode(s.blocker_is_valid, 0, to_number(null), s.blocker_pid), decode(s.blocker_is_valid, 0,
to_number(null), s.blocker_sid), decode(s.blocker_is_valid, 0, to_number(null), s.blocker_sess_serial#), decode(s.blocker_chain
_id, 0, to_number(null), s.blocker_chain_id), decode(s.in_wait, 0,'FALSE','TRUE'), decode(s.in_wait, 0, s.time_since_last_wait
_secs, to_number(null)), decode(s.in_wait, 0, to_number(null), s.wait_id), decode(s.in_wait, 0, to_number(null), s.wait_event),
s.wait_event_text, decode(s.in_wait, 0, to_number(null), s.p1), s.p1_text, decode(s.in_wait, 0, to_number(null), s.p2), s.p2
_text, decode(s.in_wait, 0, to_number(null), s.p3), s.p3_text, decode(s.in_wait, 0, to_number(null), s.in_wait_secs), decode(
s.in_wait, 0, to_number(null), s.time_remaining_secs), s.num_waiters, decode(s.in_wait, 0, to_number(null), s.row_wait_obj#),
decode(s.in_wait, 0, to_number(null), s.row_wait_file#), decode(s.in_wait, 0, to_number(null), s.row_wait_block#), decode(s.in_w
ait, 0, to_number(null), s.row_wait_row#) from X$KSDHNG_CHAINS s
Oracle在mos上提供了一些脚本来做一些信息诊断。一种是普通版本号的都可以使用的,另一种是11gR2专用的。由于在11gR2的v$session视图中有一个字段叫final_blocking_session,这个字段可以去查看最上层的堵塞者。
终于的blocker一般都处于wait_chain的顶端。这种session才会引起问题。我们先来看看普通的查询.首先随便制造两个session共同更新一行的情况。
SQL> SELECT chain_id, num_waiters, in_wait_secs, osid, blocker_osid, substr(wait_event_text,1,30)
2 FROM v$wait_chains;
CHAIN_ID NUM_WAITERS IN_WAIT_SECS OSID BLOCKER_OSID SUBSTR(WAIT_EVENT_TEXT,1,30)
---------- ----------- ------------ ------------------------- ------------------------- ------------------------------------------------------------
1 0 8 31377 31447 enq: TX - row lock contention
1 1 15 31447 SQL*Net message from client
通用的查询:
接下来在运行下一个基础的格式化后的脚本.
set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT *
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
' from Instance '||blocker_instance BLOCKER_PROC,'Number of waiters: '||num_waiters waiters,
'Wait Event: ' ||wait_event_text wait_event, 'P1: '||p1 p1, 'P2: '||p2 p2, 'P3: '||p3 p3,
'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
'',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
v$instance i
WHERE wc.instance = i.instance_number (+)
AND ( num_waiters > 0
OR ( blocker_osid IS NOT NULL
AND in_wait_secs > 10 ) )
ORDER BY chain_id,
num_waiters DESC)
WHERE ROWNUM < 101;
终于结果例如以下图所看到的,我们可以清楚的看到进程12476堵塞了进程13018。
进程13018在等待enq: TX – row lock contention。
Current Process: 12476 SID orcl INST #: 1
Blocking Process: from Instance
Number of waiters: 2
Final Blocking Process: from Instance
Program:
Wait Event: SQL*Net message from client P1: 1650815232 P2: 1 P3: 0
Seconds in Wait: 2503
Seconds Since Last Wait:
Wait Chain: 1: 'SQL*Net message from client'<='enq: TX - row lock contention'
Blocking Wait Chain:
Current Process: 13018 SID orcl INST #: 1
Blocking Process: 12476 from Instance 1
Number of waiters: 0
Final Blocking Process: 12476 from Instance 1
Program: oracle@rhel5 (TNS V1-V3)
Wait Event: enq: TX - row lock contention P1: 1415053318 P2: 458753 P3: 2465
Seconds in Wait: 441
Seconds Since Last Wait:
Wait Chain: 1: 'SQL*Net message from client'<='enq: TX - row lock contention'
Blocking Wait Chain:
使用final_blocking_session字段,能查到最上端的堵塞进程。
set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column fblocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra
SELECT *
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE,
'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)||
' from Instance '||blocker_instance BLOCKER_PROC,
'Number of waiters: '||num_waiters waiters,
'Final Blocking Process: '||decode(p.spid,null,'',
p.spid)||' from Instance '||s.final_blocking_instance FBLOCKER_PROC,
'Program: '||p.program image,
'Wait Event: ' ||wait_event_text wait_event, 'P1: '||wc.p1 p1, 'P2: '||wc.p2 p2, 'P3: '||wc.p3 p3,
'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
'',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
gv$session s,
gv$session bs,
gv$instance i,
gv$process p
WHERE wc.instance = i.instance_number (+)
AND (wc.instance = s.inst_id (+) and wc.sid = s.sid (+)
and wc.sess_serial# = s.serial# (+))
AND (s.final_blocking_instance = bs.inst_id (+) and s.final_blocking_session = bs.sid (+))
AND (bs.inst_id = p.inst_id (+) and bs.paddr = p.addr (+))
AND ( num_waiters > 0
OR ( blocker_osid IS NOT NULL
AND in_wait_secs > 10 ) )
ORDER BY chain_id,
num_waiters DESC)
WHERE ROWNUM < 101;
Current Process: 12028 SID orcl INST #: 1
Blocking Process: from Instance Number of waiters: 2
Wait Event: SQL*Net message from client P1: 1650815232 P2: 1 P3: 0
Seconds in Wait: 1155 Seconds Since Last Wait:
Wait Chain: 1: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock c
ontention'
Blocking Wait Chain:
Current Process: 12164 SID orcl INST #: 1
Blocking Process: 12028 from Instance 1 Number of waiters: 1
Wait Event: enq: TX - row lock contention P1: 1415053318 P2: 589825 P3: 2599
Seconds in Wait: 964 Seconds Since Last Wait:
Wait Chain: 1: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock c
ontention'
Blocking Wait Chain:
Current Process: 12342 SID orcl INST #: 1
Blocking Process: 12164 from Instance 1 Number of waiters: 0
Wait Event: enq: TX - row lock contention P1: 1415053318 P2: 327708 P3: 2417
Seconds in Wait: 954 Seconds Since Last Wait:
Wait Chain: 1: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock c
ontention'
Blocking Wait Chain:
Current Process: 12476 SID orcl INST #: 1
Blocking Process: from Instance Number of waiters: 1
Wait Event: SQL*Net message from client P1: 1650815232 P2: 1 P3: 0
Seconds in Wait: 578 Seconds Since Last Wait:
Wait Chain: 2: 'SQL*Net message from client'<='enq: TX - row lock contention'
Blocking Wait Chain:
Current Process: 12527 SID orcl INST #: 1
Blocking Process: 12476 from Instance 1 Number of waiters: 0
Wait Event: enq: TX - row lock contention P1: 1415053318 P2: 458753 P3: 2465
Seconds in Wait: 567 Seconds Since Last Wait:
Wait Chain: 2: 'SQL*Net message from client'<='enq: TX - row lock contention'
Blocking Wait Chain:
这里可以看到当前会话是2395在等待enq: TM – contention,而它的顶级堵塞者是2309。通过这些脚本我们可以方便的进行查询。可以方便的找到谁是堵塞者。甚至是最上层的堵塞者。当然在我们的diaghang.sql的脚本里面。我们看到了下列内容。这里Oracle通过我们内存直接訪问,从x$ksdhng_chains里面把chain的信息所有获取出来,用于终于的hang分析的诊断。
-- dump hang analysis chains
oradebug direct_access enable trace
oradebug direct_access disable reply
oradebug direct_access set content_type = 'text/plain'
oradebug direct_access select * from x$ksdhng_chains
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (orcl.orcl)
os id: 13018
process id: 39, oracle@rhel5 (TNS V1-V3)
session id: 1
session serial #: 516
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x70001
p3: 'sequence'=0x9a1
time in wait: 27.987600 sec
timeout after: never
wait id: 52
blocking: 0 sessions
current sql: update waitchain set name='liuyang2' where id=3
short stack: ksedsts()+379<-ksdxfstk()+19<-ksdxcb()+1585<-sspuser()+100<-semtimedop()+36<-sskgpwwait()+202<-skgpwwait()+125<-ksliwat()+1564<-kslwaitctx()+138<-ksqcmi()+5784<-ksqgtlctx()+3334<-ksqgelctx()+450<-ktcwit1()+260<-kdddgb()+7099<-kdusru()+412<-kauupd()+278<-updrow()+1596<-qerupFetch()+631<-updaul()+1079<-updThreePhaseExe()+262<-updexe()+571<-opiexe()+13536<-kpoal8()+1888<-opiodr()+958<-ttcpip()+1109<-opitsk()+1325<-opiino()+827<-opiodr()+958<-opidrv()+479<-sou2o()+80<-opimai_real()+109<-ssthrdmain()+175<-main()+147<-_
wait history:
* time between current wait and wait #1: 0.000071 sec
1. event: 'db file sequential read'
time waited: 0.000008 sec
wait id: 51 p1: 'file#'=0x3
p2: 'block#'=0x26d1a
p3: 'blocks'=0x1
* time between wait #1 and #2: 0.000043 sec
2. event: 'db file sequential read'
time waited: 0.000008 sec
wait id: 50 p1: 'file#'=0x3
p2: 'block#'=0xc5
p3: 'blocks'=0x1
* time between wait #2 and #3: 0.000047 sec
3. event: 'db file sequential read'
time waited: 0.000016 sec
wait id: 49 p1: 'file#'=0x3
p2: 'block#'=0x1bd
p3: 'blocks'=0x1
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (orcl.orcl)
os id: 12476
process id: 37, oracle@rhel5 (TNS V1-V3)
session id: 52
session serial #: 155
}
which is waiting for 'SQL*Net message from client' with wait info:
{
p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
time in wait: 34 min 50 sec
timeout after: never
wait id: 49
blocking: 2 sessions
current sql: <none>
short stack: ksedsts()+379<-ksdxfstk()+19<-ksdxcb()+1585<-sspuser()+100<-read()+35<-ntpfprd()+95<-nsbasic_brc()+327<-nsbrecv()+72<-nioqrc()+449<-__PGOSF25_opikndf2()+954<-opitsk()+611<-opiino()+827<-opiodr()+958<-opidrv()+479<-sou2o()+80<-opimai_real()+109<-ssthrdmain()+175<-main()+147<-__libc_start_main()+220<-_start()+33
wait history:
* time between current wait and wait #1: 0.000020 sec
1. event: 'SQL*Net message to client'
time waited: 0.000000 sec
wait id: 48 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000039 sec
2. event: 'SQL*Net message from client'
time waited: 0.000260 sec
wait id: 47 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000462 sec
3. event: 'asynch descriptor resize'
time waited: 0.000004 sec
wait id: 46 p1: 'outstanding #aio'=0x0
p2: 'current aio limit'=0x80
p3: 'new aio limit'=0x81
}
Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------