Oracle 10.2.0.4 RAC on AIX 7.1: after the hosts were rebooted, one node did not start properly, and ocssd.log showed the following:

[    CSSD]2020-11-22 02:50:21.750 [1544] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2758) LATS(7664721) Disk lastSeqNo(2758)
[    CSSD]2020-11-22 02:50:22.752 [1030] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2759) LATS(7665724) Disk lastSeqNo(2759)
[    CSSD]2020-11-22 02:50:22.752 [1287] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2759) LATS(7665724) Disk lastSeqNo(2759)
[    CSSD]2020-11-22 02:50:22.753 [1544] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2759) LATS(7665724) Disk lastSeqNo(2759)
[    CSSD]2020-11-22 02:50:23.527 [4628] >TRACE:   clssnmRcfgMgrThread: Local Join
[    CSSD]2020-11-22 02:50:23.527 [4628] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2020-11-22 02:50:23.755 [1287] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2760) LATS(7666726) Disk lastSeqNo(2760)
[    CSSD]2020-11-22 02:50:23.755 [1030] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2760) LATS(7666727) Disk lastSeqNo(2760)
[    CSSD]2020-11-22 02:50:23.755 [1544] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2760) LATS(7666727) Disk lastSeqNo(2760)
[    CSSD]2020-11-22 02:50:24.757 [1030] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2761) LATS(7667729) Disk lastSeqNo(2761)
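In the log above, the wrtcnt and Disk lastSeqNo values for node 1 rise by one roughly every second. That suggests node 1 is still writing its disk heartbeat to the voting disk, which fits the WARNING that takeover was aborted because of an ALIVE node on disk. The check can be done mechanically with a minimal POSIX shell sketch; the function name wrtcnt_increasing is hypothetical (not an Oracle tool), and it assumes the log line format shown above.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not an Oracle tool): decide whether a
# peer node's disk-heartbeat write count keeps rising in ocssd.log.
# Assumes lines in the format shown above, e.g.
#   ... clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(2758) ...
# Usage: wrtcnt_increasing <ocssd.log> <node_number>
wrtcnt_increasing() {
    log=$1
    node=$2
    # Keep only the disk-heartbeat lines for the requested node,
    # then extract the number inside wrtcnt(...).
    grep "clssnmReadDskHeartbeat: node($node) " "$log" |
        sed -n 's/.*wrtcnt(\([0-9]*\)).*/\1/p' |
        awk 'NR == 1 { first = $1 + 0 }
             { last = $1 + 0 }
             # "increasing" means the peer is still writing to the voting disk
             END { if (NR >= 2 && last > first) print "increasing"; else print "not-increasing" }'
}
```

Run against the excerpt above, `wrtcnt_increasing ocssd.log 1` prints increasing; a node with no heartbeat lines, or a count that stays flat, gives not-increasing.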

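The hosts were only rebooted and no configuration had been changed, but at the time the private network had a problem and the nodes could not reach each other over it.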

Reference:
CRS can not Start After Node Reboot (Doc ID 733260.1)

CHANGES

This can happen in an environment where a node is shutdown for various reasons, then restarted.

CAUSE


1. During reboot, CRS is started automatically before the network interface is ready.
2. /etc/hosts mismatch: wrong definition for the problem node.
3. The private network IP has been changed, but /etc/hosts does not reflect the change correctly.
4. The private network is not pingable, ping responses are slow, or ping shows packet loss.
5. Different clusterware is used on different nodes.
6. If /etc/init.d/init.cssd startcheck does not complete, the /tmp/crsctl.xxx file usually gives a clue as to why. In case there is no /tmp/crsctl.xxx file generated
7. OCR is pointing to a wrong device.
8. localconfig has been run on a cluster node accidentally.
9. If CRS does not start automatically after a node reboot, check whether auto start is disabled:
cat /etc/oracle/scls_scr//root/crsstart
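Causes 2 to 4 can be pre-checked from the shell before restarting anything. This is a minimal sketch, assuming a POSIX shell and a ping that accepts -c and prints an "N% packet loss" summary line, as AIX and Linux ping typically do. The function names and the private host name rac2-priv are hypothetical; substitute your own.

```shell
#!/bin/sh
# Sketch of pre-checks for causes 2-4 above. Function names and the example
# private host name (rac2-priv) are hypothetical.

# Print the IP address(es) a hosts file maps to a given name, ignoring comments.
hosts_ip() {
    hosts_file=$1
    name=$2
    sed 's/#.*//' "$hosts_file" |
        awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) print $1 }'
}

# Read ping output on stdin and print the packet-loss percentage (e.g. "0").
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Ping a private host 5 times and report whether the link looks clean.
check_private_link() {
    host=$1
    loss=$(ping -c 5 "$host" 2>/dev/null | packet_loss)
    if [ -n "$loss" ] && awk -v l="$loss" 'BEGIN { exit !(l + 0 == 0) }'; then
        echo "$host: OK (0% packet loss)"
    else
        echo "$host: PROBLEM (packet loss: ${loss:-no reply})"
        return 1
    fi
}
```

Running `hosts_ip /etc/hosts rac2-priv` on every node and comparing the output covers causes 2 and 3; `check_private_link rac2-priv` covers cause 4.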

Once the private network was working again, restarting CRS on the node fixed the problem:
$CRS_HOME/bin/crsctl stop crs
$CRS_HOME/bin/crsctl start crs
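The restart above can be gated so that CRS is only bounced once the interconnect is clean again. This is a minimal sketch to run as root, assuming CRS_HOME is set; the function names, the peer host name and the CRSCTL/WAIT_SECS overrides are hypothetical. It only calls the crsctl stop crs and crsctl start crs commands already used above.

```shell
#!/bin/sh
# Sketch: restart CRS on this node only after the private interconnect is clean.
# Run as root. Function names, the peer name and the CRSCTL / WAIT_SECS
# overrides are hypothetical; CRS_HOME must point to the Clusterware home.
CRSCTL=${CRSCTL:-$CRS_HOME/bin/crsctl}

# Succeed only if a 5-packet ping to the host reports 0% packet loss.
link_clean() {
    ping -c 5 "$1" 2>/dev/null | grep -q ' 0% packet loss'
}

# Re-test the link up to $2 times (default 10), WAIT_SECS apart (default 30);
# restart CRS as soon as the link is clean, otherwise give up with status 1.
restart_crs_when_link_ok() {
    peer=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if link_clean "$peer"; then
            "$CRSCTL" stop crs
            "$CRSCTL" start crs
            return 0
        fi
        tries=$((tries - 1))
        sleep "${WAIT_SECS:-30}"
    done
    echo "private link to $peer still failing; CRS not restarted" >&2
    return 1
}
```

After a successful restart, `$CRS_HOME/bin/crsctl check crs` shows whether the CSS, CRS and EVM daemons came back up.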
12-07 02:35