1. Description:

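Running crs_stat -t to check the RAC services fails immediately with the error CRS-0184: Cannot communicate with the CRS daemon.

Oddly, the database itself is fine: sqlplus / as sysdba still logs in and the instance works normally.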

2. Error analysis:

First, check the Clusterware alert log. The errors start on 2016-07-13:

/grid/11.2.0/log/phars1/alertphars1.log

2016-07-13 16:04:49.616:
[crsd(21419)]CRS-2765:Resource 'ora.VOTDG.dg' has failed on server 'phars1'.
2016-07-13 16:04:49.702:
[crsd(21419)]CRS-2878:Failed to restart resource 'ora.VOTDG.dg'
2016-07-13 16:04:49.703:
[crsd(21419)]CRS-2769:Unable to failover resource 'ora.VOTDG.dg'.
2016-07-13 19:39:38.436:
[crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:38.437:
[crsd(21419)]CRS-1006:The OCR location +VOTDG is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:53.742:
[/grid/11.2.0/bin/oraagent.bin(30612)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:11:9490} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2016-07-13 19:39:53.742:
[/grid/11.2.0/bin/orarootagent.bin(21814)]CRS-5822:Agent '/grid/11.2.0/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:36} in /grid/11.2.0/log/phars1/agent/crsd/orarootagent_root/orarootagent_root.log.
2016-07-13 19:39:53.743:
[/grid/11.2.0/bin/oraagent.bin(21774)]CRS-5822:Agent '/grid/11.2.0/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:5:10} in /grid/11.2.0/log/phars1/agent/crsd/oraagent_grid/oraagent_grid.log.
2016-07-13 19:39:53.743:
[/grid/11.2.0/bin/scriptagent.bin(1919)]CRS-5822:Agent '/grid/11.2.0/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:13:12} in /grid/11.2.0/log/phars1/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2016-07-13 19:39:53.745:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:55.153:
[crsd(16165)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.162:
[crsd(16165)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:55.774:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:57.201:
[crsd(16185)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.210:
[crsd(16185)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:57.814:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:39:59.206:
[crsd(16210)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.214:
[crsd(16210)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:39:59.843:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:01.237:
[crsd(16223)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.245:
[crsd(16223)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:01.872:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:03.263:
[crsd(16238)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.273:
[crsd(16238)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:03.900:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:05.293:
[crsd(16254)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.302:
[crsd(16254)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:05.929:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:07.325:
[crsd(16271)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.335:
[crsd(16271)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:07.956:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:09.346:
[crsd(16290)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.355:
[crsd(16290)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:09.985:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:11.376:
[crsd(16327)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:11.386:
[crsd(16327)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:12.013:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:13.401:
[crsd(16340)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:13.411:
[crsd(16340)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /grid/11.2.0/log/phars1/crsd/crsd.log.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2765:Resource 'ora.crsd' has failed on server 'phars1'.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2016-07-13 19:40:14.053:
[ohasd(20149)]CRS-2769:Unable to failover resource 'ora.crsd'.

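Reading this log, the sequence of events is: resource 'ora.VOTDG.dg' failed => Clusterware tried to restart it => the restart failed => the OCR location +VOTDG became inaccessible => crsd aborted because it could not access the physical storage => ohasd kept restarting crsd until the maximum number of restart attempts was reached, then gave up => crsd stayed down.

Taken together, this shows that the CRS failure was caused by the VOTDG disk group being inaccessible.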
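
The same picture can be pulled out of the alert log mechanically. The following is only a minimal sketch, not part of the original troubleshooting: the function names count_crs_codes and crsd_restart_abandoned are hypothetical, and the script assumes the alert log format shown above, where every message carries a CRS-nnnn code.

```shell
#!/bin/sh
# Hypothetical helpers for skimming a Clusterware alert log.

# Print each CRS-nnnn message code with the number of times it occurs.
# $1 is the path to the alert log.
count_crs_codes() {
    grep -oE 'CRS-[0-9]+' "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}

# Succeed if ohasd logged CRS-2771 for 'ora.crsd', i.e. it stopped
# trying to restart crsd after reaching the maximum restart attempts.
crsd_restart_abandoned() {
    grep -q "CRS-2771:.*'ora.crsd'" "$1"
}

# Example usage with the log path from this post:
# count_crs_codes /grid/11.2.0/log/phars1/alertphars1.log
# crsd_restart_abandoned /grid/11.2.0/log/phars1/alertphars1.log && echo "crsd restart abandoned"
```

A high count of CRS-0804 and CRS-1013 followed by a single CRS-2771 matches the restart loop seen above.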

Next, look at the crsd log, /grid/11.2.0/log/phars1/crsd/crsd.log:

2016-07-13 16:04:49.615: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.615: [    AGFW][4118722304]{0:5:6} Verifying msg rid = ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.615: [    AGFW][4118722304]{0:5:6} Received state change for ora.VOTDG.dg phars1 1 [old state = ONLINE, new state = OFFLINE]  -- here ora.VOTDG.dg is reported as going OFFLINE
2016-07-13 16:04:49.615: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:287142:Ver:2]
2016-07-13 16:04:49.615: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:162956
2016-07-13 16:04:49.616: [   CRSPE][4108216064]{0:5:6} State change received from phars1 for ora.VOTDG.dg phars1 1
2016-07-13 16:04:49.616: [   CRSPE][4108216064]{0:5:6} Processing PE command id=13336. Description: [Resource State Change (ora.VOTDG.dg phars1 1) : 0x7fb470104850]
2016-07-13 16:04:49.616: [   CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new external state [OFFLINE] old value: [ONLINE] on phars1 label = []
2016-07-13 16:04:49.616: [    CRSD][4108216064]{0:5:6} {0:5:6} Resource Resource Instance ID[ora.VOTDG.dg phars1 1]. Values:
STATE=OFFLINE
TARGET=ONLINE
LAST_SERVER=phars1
CURRENT_RCOUNT=0
LAST_RESTART=0
FAILURE_COUNT=0
FAILURE_HISTORY=
STATE_DETAILS=
INCARNATION=0
STATE_CHANGE_VERS=0
LAST_FAULT=0
LAST_STATE_CHANGE=1468397089
INTERNAL_STATE=0
DEGREE_ID=1
ID=ora.VOTDG.dg phars1 1
Lock Info:
Write Locks:none
ReadLocks:|STATE INITED||ONLINE STATERECOVERED| has failed!
2016-07-13 16:04:49.616: [   CRSPE][4108216064]{0:5:6} Processing unplanned state change for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [   CRSPE][4108216064]{0:5:6} Scheduled local recovery for [ora.VOTDG.dg phars1 1]
2016-07-13 16:04:49.617: [  CRSRPT][4106114816]{0:5:6} Published to EVM CRS_RESOURCE_STATE_CHANGE for ora.VOTDG.dg
2016-07-13 16:04:49.617: [   CRSPE][4108216064]{0:5:6} Op 0x7fb4700c89d0 has 5 WOs
2016-07-13 16:04:49.618: [   CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STARTING] old value: [STABLE]
2016-07-13 16:04:49.618: [   CRSPE][4108216064]{0:5:6} Sending message to agfw: id = 287144
2016-07-13 16:04:49.618: [   CRSPE][4108216064]{0:5:6} CRS-2672: Attempting to start 'ora.VOTDG.dg' on 'phars1'

2016-07-13 16:04:49.618: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server received the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.619: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server forwarding the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144 to the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [    AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.673: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.673: [   CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [    AGFW][4118722304]{0:5:6} Received the reply to the message: RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287145 from the agent /grid/11.2.0/bin/oraagent_grid
2016-07-13 16:04:49.701: [    AGFW][4118722304]{0:5:6} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.VOTDG.dg phars1 1] ID 4098:287144
2016-07-13 16:04:49.701: [   CRSPE][4108216064]{0:5:6} Received reply to action [Start] message ID: 287144
2016-07-13 16:04:49.701: [   CRSPE][4108216064]{0:5:6} RI [ora.VOTDG.dg phars1 1] new internal state: [STABLE] old value: [STARTING]
2016-07-13 16:04:49.701: [   CRSPE][4108216064]{0:5:6} CRS-2674: Start of 'ora.VOTDG.dg' on 'phars1' failed

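This log tells the same story: 'ora.VOTDG.dg' went OFFLINE, the attempt to restart it failed, and that failure is what led to the CRS failure.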
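
Since crsd.log is very verbose, it can help to filter out just the resource state transitions. The sketch below is my own illustration (the function name state_changes is hypothetical) and assumes the "Received state change for ..." line format seen in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: list resource state transitions found in a crsd.log.
# Output format: <date> <time> <resource> <old state>-><new state>
# $1 is the path to crsd.log.
state_changes() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*Received state change for \([^ ]*\) .*\[old state = \([A-Z]*\), new state = \([A-Z]*\)].*/\1 \2 \3->\4/p' "$1"
}

# Example usage with the log path from this post:
# state_changes /grid/11.2.0/log/phars1/crsd/crsd.log
```

For the excerpt above this reduces the whole block to a single line showing ora.VOTDG.dg going from ONLINE to OFFLINE.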

3. Resolution:

① The error says the CRS daemon cannot be reached, so the first step was to check the alert log and crsd.log, as above.

② crsd.log also contains this line:

2016-07-15 10:17:24.000: [  OCRASM][992749344]proprasmo: The ASM disk group VOTDG is not found or not mounted

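This says that the VOTDG disk group, which holds the voting disk and the OCR, was not found or is not mounted.

③ Since the database is still healthy, I checked the disk group states from SQL*Plus: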

SQL> select name,state from v$asm_diskgroup;

NAME                   STATE
------------------------------ -----------
BACKUPDG               CONNECTED
DATADG                   CONNECTED
SYSDG                   CONNECTED
VOTDG                   DISMOUNTED

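VOTDG is indeed DISMOUNTED; it has to be mounted for CRS to work. (The other disk groups show CONNECTED because the query was run from the database instance, which reports that state for disk groups it is using.)

Mount the disk group manually: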
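
To spot dismounted disk groups without reading the table by eye, the query output can be piped through a small filter. This is only a sketch under my own assumptions: the function name dismounted_diskgroups is hypothetical, and it expects two whitespace-separated columns (name, state) on stdin as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME STATE" rows on stdin and print the names
# of disk groups whose state is DISMOUNTED. Header and separator lines are
# ignored because their second column is not the word DISMOUNTED.
dismounted_diskgroups() {
    awk '$2 == "DISMOUNTED" { print $1 }'
}

# Example usage (assumes sqlplus is in PATH and the environment points at
# the local instance):
# sqlplus -S "/ as sysasm" <<'EOF' | dismounted_diskgroups
# set heading off feedback off pagesize 0
# select name, state from v$asm_diskgroup;
# EOF
```

Fed the output shown above, the filter prints only VOTDG.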

grid@phars1: /home/grid> sqlplus / as sysasm   -- note: log in as the grid user with SYSASM

SQL*Plus: Release 11.2.0.4.0 Production on Fri Jul 15 11:38:40 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup VOTDG mount;  -- manually mount the VOTDG disk group

Diskgroup altered.

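This has to be done on both nodes.

Then restart the cluster services and everything recovers. Note that restarting before the disk group is mounted has no effect; the stack can only be stopped and started properly once VOTDG is mounted. In the session below, the stop reports errors because crsd is not running yet, and the start that follows succeeds: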

[root@phaws1 ~]# crsctl stop cluster -all
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4704: Shutdown of Clusterware failed on node phaws1.
CRS-4704: Shutdown of Clusterware failed on node phaws2.
CRS-4000: Command Stop failed, or completed with errors.
[root@phaws1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.crsd' on 'phaws1'
CRS-2672: Attempting to start 'ora.crsd' on 'phaws2'
CRS-2676: Start of 'ora.crsd' on 'phaws1' succeeded
CRS-2676: Start of 'ora.crsd' on 'phaws2' succeeded
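
After the restart it is worth confirming that the stack is really healthy. The wrapper below is a minimal sketch of my own (the function name verify_cluster is hypothetical); it only chains standard 11.2 Clusterware commands and assumes the Grid home's bin directory is in PATH. It uses crsctl stat res -t rather than crs_stat -t, since crs_stat is deprecated in 11.2.

```shell
#!/bin/sh
# Hypothetical helper: run basic health checks after restarting the stack.
# Returns non-zero if crsctl is unavailable or any check fails.
verify_cluster() {
    if ! command -v crsctl >/dev/null 2>&1; then
        echo "crsctl not found in PATH; run this on a cluster node with the Grid home in PATH" >&2
        return 1
    fi
    crsctl check crs &&      # are CRS, CSS and EVM online on this node?
    crsctl stat res -t &&    # resource status table (replaces crs_stat -t)
    ocrcheck                 # is the OCR location accessible and intact?
}

# Example usage:
# verify_cluster
```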

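Summary: the CRS failure was caused by the voting-disk disk group (VOTDG) being inaccessible. The key is to read the logs carefully and let them lead you to the right fix.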

05-11 00:43