1. What is ONS

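ONS (Oracle Notification Service) is the foundation on which Oracle Clusterware implements the FAN event push model.
In the traditional model, a client has to poll the server periodically to learn the server's state, which is essentially a PULL model. Oracle 10g introduced a new PUSH mechanism, FAN (Fast Application Notification): when certain events occur on the server side, the server proactively notifies clients of the change, so clients learn about server-side changes as early as possible. This mechanism relies on ONS.
ONS is usually managed and configured with the onsctl command. Before onsctl can be used, the ONS service must be configured.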

2. ONS configuration contents

Note that a RAC environment uses the ONS under $CRS_HOME, not the one under $ORACLE_HOME.
The configuration file is $CRS_HOME/opmn/conf/ons.config.

[root@rac3 conf]# pwd
/opt/ora10g/product/10.2./crs_1/opmn/conf
[root@rac3 conf]# ls
ons.config
[root@rac3 conf]# cat ons.config
localport=
remoteport=
loglevel=
useocr=on

The parameters in this file are as follows:

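<1> localport: the local listening port. "Local" here means specifically the loopback address 127.0.0.1; this port is used to communicate with clients running on the same host.
<2> remoteport: the remote listening port, on every local IP address other than 127.0.0.1; it is used to communicate with remote clients.
<3> loglevel: Oracle can trace the ONS process and write the trace to a local log file. This parameter sets the level of detail that the ONS process logs, from 1 to 9; the default is 3.
<4> logfile: used together with loglevel, this parameter sets the location of the ONS log file; the default is $CRS_HOME/opmn/logs/opmn.log.
<5> nodes and useocr: together, these two parameters determine which nodes' ONS daemons the local ONS daemon communicates with.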
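
The parameters above can be read with a few lines of Python. This is a minimal sketch, not Oracle's own parser: the function names are hypothetical, skipping lines that start with "#" is an assumption, and treating a missing useocr as "off" is also an assumption. The sample values 6100 and 6200 are the ports discussed in this article. The check for localport and remoteport reflects the article's statement that these two parameters are mandatory.

```python
REQUIRED_KEYS = ("localport", "remoteport")  # the article states these two are mandatory


def parse_ons_config(text):
    """Parse ons.config-style key=value lines into a dict of strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' lines carry no settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        settings[key.strip().lower()] = value.strip()
    # An empty value counts as missing.
    missing = [k for k in REQUIRED_KEYS if not settings.get(k)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return settings


def node_source(settings):
    """Return 'ocr' when useocr is on, otherwise 'nodes'.

    Assumption: an absent useocr is treated as off.
    """
    return "ocr" if settings.get("useocr", "off").lower() == "on" else "nodes"


if __name__ == "__main__":
    sample = "localport=6100\nremoteport=6200\nloglevel=3\nuseocr=on\n"
    cfg = parse_ons_config(sample)
    print(cfg, node_source(cfg))
```

Returning the values as strings keeps the sketch neutral about which settings are numeric; a caller that needs port numbers can convert them with int().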

Of these parameters, localport and remoteport are mandatory. The netstat command shows how the two ports are used:

[root@rac3 bin]# netstat -ano|grep
tcp 127.0.0.1: 0.0.0.0:* LISTEN off (0.00//)
tcp 127.0.0.1: 127.0.0.1: ESTABLISHED off (0.00//)
tcp 127.0.0.1: 127.0.0.1: ESTABLISHED keepalive (7063.32//)
tcp 127.0.0.1: 127.0.0.1: ESTABLISHED keepalive (7188.42//)
tcp 127.0.0.1: 127.0.0.1: ESTABLISHED off (0.00//)
udp 192.168.2.103: 0.0.0.0:* off (0.00//)
[root@rac3 bin]# netstat -ano|grep
tcp 0.0.0.0: 0.0.0.0:* LISTEN off (0.00//)
tcp 192.168.1.103: 192.168.1.104: ESTABLISHED off (0.00//)

Comparing the two outputs, Oracle listens on port 6100 on 127.0.0.1 and on port 6200 on 0.0.0.0 (that is, all other addresses), which matches the settings in /opt/ora10g/product/10.2.0/crs_1/opmn/conf/ons.config.

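The useocr parameter also deserves attention. It takes the value ON or OFF. If useocr is ON, the list of remote nodes that ONS communicates with is stored in the OCR; if it is OFF, the list is taken from the nodes parameter.
   Format of the nodes value: hostname/ip:port[,hostname/ip:port]  For example: nodes=dbs:6200,dbp:6200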
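
To make the nodes format concrete, here is a small sketch that splits such a value into host and port pairs. The function name parse_nodes is hypothetical, and the validation rules (numeric port, non-empty host) are assumptions rather than documented ONS behavior.

```python
def parse_nodes(value):
    """Split 'host:port[,host:port]' into a list of (host, port) tuples."""
    peers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ':' so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        peers.append((host, int(port)))
    return peers


if __name__ == "__main__":
    print(parse_nodes("dbs:6200,dbp:6200"))
```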
  
When useocr is ON, this remote node information is kept under the OCR key DATABASE.ONS_HOSTS.

We can export this key:

[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info.xml
<OCRDUMP> <TIMESTAMP>// ::</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info.xml -keyname DATABASE.ONS_HOSTS </COMMAND> <KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME> --node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME> --port for this node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME> --node
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME> --port
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> </KEY> </OCRDUMP>
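
Reading the node and port pairs out of such a dump by eye is tedious, so the following sketch extracts them with Python's standard xml.etree.ElementTree module. It assumes the export is well-formed XML shaped like the dump above (nested KEY elements with NAME and VALUE children). The function name and the trimmed sample document are hypothetical, and the port 6200 in the sample is an illustrative value, not taken from the dump. Host names that themselves contain dots are not handled.

```python
import xml.etree.ElementTree as ET

ROOT_KEY = "DATABASE.ONS_HOSTS"


def ons_hosts_from_dump(xml_text):
    """Map each node name to its port string (or None) from an ocrdump -xml export."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for key in root.iter("KEY"):
        # findtext only looks at direct children, so nested keys do not interfere.
        name = (key.findtext("NAME") or "").strip()
        value = (key.findtext("VALUE") or "").strip()
        if not name.startswith(ROOT_KEY + "."):
            continue
        parts = name[len(ROOT_KEY) + 1:].split(".")
        if len(parts) == 1:
            hosts.setdefault(parts[0], None)      # DATABASE.ONS_HOSTS.<node>
        elif len(parts) == 2 and parts[1] == "PORT":
            hosts[parts[0]] = value or None       # DATABASE.ONS_HOSTS.<node>.PORT
    return hosts


# Hypothetical, trimmed sample with the same nesting as the dump above.
SAMPLE = """<OCRDUMP>
<KEY><NAME>DATABASE.ONS_HOSTS</NAME><VALUE><![CDATA[]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac3</NAME><VALUE><![CDATA[rac3]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME><VALUE><![CDATA[6200]]></VALUE></KEY></KEY>
<KEY><NAME>DATABASE.ONS_HOSTS.rac4</NAME><VALUE><![CDATA[rac4]]></VALUE>
<KEY><NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME><VALUE><![CDATA[6200]]></VALUE></KEY></KEY>
</KEY></OCRDUMP>"""

if __name__ == "__main__":
    print(ons_hosts_from_dump(SAMPLE))
```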

3. Configuring ONS

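When useocr=OFF, ONS can be configured by editing its configuration file directly. When the ONS node communication settings are stored in the OCR (useocr=ON), run the racgons command as root to configure them.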

Note: racgons must be run as root. If it is run as the oracle user, it reports no error, but it also changes nothing.
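
Because racgons fails silently for non-root users, a wrapper script may want to check for root before calling it. The sketch below is one possible way to do that; the helper names are hypothetical, and only the add_config and remove_config subcommands with host:port arguments, as used in this article, are assumed. The port range check is an added safeguard, not something racgons is known to enforce.

```python
import os


def racgons_argv(action, peers, racgons_path="./racgons"):
    """Build the argument list for 'racgons add_config|remove_config host:port ...'."""
    if action not in ("add_config", "remove_config"):
        raise ValueError(f"unsupported action: {action!r}")
    if not peers:
        raise ValueError("at least one (host, port) pair is required")
    argv = [racgons_path, action]
    for host, port in peers:
        port = int(port)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        argv.append(f"{host}:{port}")
    return argv


def require_root(euid=None):
    """Raise PermissionError unless the effective user is root (Unix only)."""
    euid = os.geteuid() if euid is None else euid
    if euid != 0:
        raise PermissionError("racgons changes nothing unless it is run as root")


if __name__ == "__main__":
    print(racgons_argv("add_config", [("rac3", 6300), ("rac4", 6300)]))
```

A caller would invoke require_root() first and then pass the returned list to subprocess.run(argv, check=True) from the directory that contains racgons.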

---Add configuration:

[root@rac3 bin]# ./racgons add_config rac3: rac4:
[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info2.xml
<OCRDUMP> <TIMESTAMP>// ::</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info2.xml -keyname DATABASE.ONS_HOSTS </COMMAND> <KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[ ]]></VALUE> --port 6300 has been added
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[ ]]></VALUE> --port 6300 has been added
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> </KEY> </OCRDUMP>

----Remove configuration:

[root@rac3 bin]# ./racgons remove_config rac3: rac4:
racgons: Existing key value on rac3 = .
racgons: rac3: removed from OCR.
racgons: Existing key value on rac4 = .
racgons: rac4: removed from OCR.
[root@rac3 bin]# ./ocrdump -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS
[root@rac3 bin]# cat /home/oracle/ons_info3.xml
<OCRDUMP> <TIMESTAMP>// ::</TIMESTAMP>
<COMMAND>./ocrdump.bin -xml /home/oracle/ons_info3.xml -keyname DATABASE.ONS_HOSTS </COMMAND> <KEY>
<NAME>DATABASE.ONS_HOSTS</NAME>
<VALUE_TYPE>UNDEF</VALUE_TYPE>
<VALUE><![CDATA[]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac3]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac3.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[ ]]></VALUE> --port 6300 has been removed
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[rac4]]></VALUE>
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> <KEY>
<NAME>DATABASE.ONS_HOSTS.rac4.PORT</NAME>
<VALUE_TYPE>ORATEXT</VALUE_TYPE>
<VALUE><![CDATA[ ]]></VALUE> --port 6300 has been removed
<USER_PERMISSION>PROCR_ALL_ACCESS</USER_PERMISSION>
<GROUP_PERMISSION>PROCR_READ</GROUP_PERMISSION>
<OTHER_PERMISSION>PROCR_READ</OTHER_PERMISSION>
<USER_NAME>oracle</USER_NAME>
<GROUP_NAME>oinstall</GROUP_NAME> </KEY> </KEY> </KEY> </OCRDUMP>

4. The onsctl command

The onsctl command starts, stops, and debugs ONS, and reloads its configuration file. Its usage is as follows:

[root@rac3 bin]# ./onsctl -help
usage: ./onsctl start|stop|ping|reconfig|debug
start - Start opmn only.
stop - Stop ons daemon
ping - Test to see if ons daemon is running
debug - Display debug information for the ons daemon
reconfig - Reload the ons configuration
help - Print a short syntax description (this).
detailed - Print a verbose syntax description.

Note: a running ONS process does not necessarily mean that ONS is working properly; use the ping command to confirm.
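
A monitoring script can apply the same rule by looking at the status line that onsctl ping prints instead of at the process list. The sketch below only interprets text that has already been captured; the function name is hypothetical, and the two status strings are the ones that appear in the transcripts in this article.

```python
def ons_is_running(ping_output):
    """Interpret 'onsctl ping' output.

    Returns True for 'ons is running', False for 'ons is not running',
    and None when neither status line is present.
    """
    for line in ping_output.splitlines():
        text = line.strip().lower()
        # Test the negative form first; both forms start with 'ons is'.
        if text.startswith("ons is not running"):
            return False
        if text.startswith("ons is running"):
            return True
    return None


if __name__ == "__main__":
    print(ons_is_running("Adding remote host rac4:\nons is running ...\n"))
```

Returning None for unrecognized output lets the caller distinguish "ONS is down" from "the check itself failed".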

<1> Check the process state at the OS level

 [root@rac3 bin]# ps -ef|grep ons |grep -v grep
oracle : ? :: /opt/ora10g/product/10.2./crs_1/opmn/bin/ons -d
oracle : ? :: /opt/ora10g/product/10.2./crs_1/opmn/bin/ons -d

The output shows that the ONS processes are running.

<2> Confirm the state of the ONS service

 [root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
ons is running ...

The output confirms that the ONS service is up and responding.

<3> Stop the ons service

 [root@rac3 bin]# ./onsctl stop
onsctl: shutting down ons daemon ...
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
[root@rac3 bin]#
[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
ons is not running ... ---this confirms that the stop succeeded

<4> Start the ons service

[root@rac3 bin]# ./onsctl start
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
onsctl: ons started --started successfully
[root@rac3 bin]#
[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
ons is running ... --this confirms that the start succeeded

<5> Use the debug option to view detailed information

[root@rac3 bin]# ./onsctl debug
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port = }
Adding remote host rac4:
HTTP/1.1 OK
Content-Length:
Content-Type: text/html
Response:
======== ONS ========
Listeners:
NAME BIND ADDRESS PORT FLAGS SOCKET
------- --------------- ----- -------- ------
Local 127.000.000.001
Remote 192.168.001.103
Request No listener
Server connections: -----the most useful part of this command is that it lists every connection

ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
192.168.001.104
Client connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
Pending connections:
ID IP PORT FLAGS SENDQ WORKER BUSY SUBS
---------- --------------- ----- -------- ---------- -------- ------ -----
127.000.000.001
127.000.000.001
127.000.000.001
Worker Ticket: /, Idle:
THREAD FLAGS
-------- --------
f7f86ba0
f6dd1ba0
f63d0ba0
Resources:
Notifications:
Received: , in Receive Q: , Processed: , in Process Q:
Pools:
Message: / (), Link: / (), Subscription: / ()

##===========================================================

Further notes:

After the ONS configuration tests above, crs_stat -t showed that ons would not start on one node of the cluster.

[oracle@rac3 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE rac3
ora....C3.lsnr application ONLINE ONLINE rac3
ora.rac3.gsd application ONLINE ONLINE rac3
ora.rac3.ons application ONLINE OFFLINE
ora.rac3.vip application ONLINE ONLINE rac3
ora....SM2.asm application ONLINE ONLINE rac4
ora....C4.lsnr application ONLINE ONLINE rac4
ora.rac4.gsd application ONLINE ONLINE rac4
ora.rac4.ons application ONLINE ONLINE rac4
ora.rac4.vip application ONLINE ONLINE rac4
ora.racdb.db application ONLINE ONLINE rac4
ora....b1.inst application ONLINE ONLINE rac3
ora....b2.inst application ONLINE ONLINE rac4
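
In a longer listing, a row like ora.rac3.ons is easy to miss. The sketch below picks out every row whose State differs from its Target in captured crs_stat -t output. The function name is hypothetical, and the parser assumes the column layout shown above, including the Type column reading "application" for every resource.

```python
def mismatched_resources(crs_stat_output):
    """Return (name, target, state) for rows whose State differs from Target."""
    rows = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows: Name Type Target State [Host]; Host is blank when OFFLINE.
        # The header and the dashed separator fail this test and are skipped.
        if len(fields) < 4 or fields[1] != "application":
            continue
        name, _type, target, state = fields[:4]
        if target != state:
            rows.append((name, target, state))
    return rows


if __name__ == "__main__":
    sample = (
        "Name Type Target State Host\n"
        "------------------------------------------------------------\n"
        "ora.rac3.gsd application ONLINE ONLINE rac3\n"
        "ora.rac3.ons application ONLINE OFFLINE\n"
    )
    print(mismatched_resources(sample))
```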

--Check the log

[oracle@rac3 racg]$ tail -f ora.rac3.ons.log

..........................................
RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files RCV: Permission denied
Communication error with the OPMN server loca
-- ::25.867: [ RACG][] [][][ora.rac3.ons]: l port.
Check the OPMN log files RCV: Permission denied -----permission denied is reported repeatedly
Communication error with the OPMN server local port.
Check the OPMN log files Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
o
-- ::25.867: [ RACG][] [][][ora.rac3.ons]: nscfg[]
{node = rac4, port = }
Adding remote host rac4:
onsctl: ons failed to start --ons fails to start, yet onsctl ping reports that ons is running
-- ::26.077: [ RACG][] [][][ora.rac3.ons]: RCV: Permission denied
Communication error with the OPMN server local port.
Check the OPMN log files

--Yet the ons service is confirmed to be running

[root@rac3 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg =
onscfg[]
{node = rac3, port = }
Adding remote host rac3:
onscfg[]
{node = rac4, port =
-- ::26.077: [ RACG][] [][][ora.rac3.ons]: }
Adding remote host rac4:
ons is not running ...

Running ./onsctl stop followed by ./onsctl start also stops and starts ONS normally, but the log still shows only startup failures.

--Starting the resource on its own

[oracle@rac3 ~]$ crs_start ora.rac3.ons
Attempting to start `ora.rac3.ons` on member `rac3`
Start of `ora.rac3.ons` on member `rac3` failed.
rac4 : CRS-: Resource ora.rac3.ons (application) cannot run on rac4

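I checked the permissions on the ons configuration and found nothing wrong. After rebooting the virtual machines, ons started normally on both nodes and the problem was gone.
My current suspicion is that either a permission problem went undetected or the ons process had hung: a new process could be started, but the log kept reporting errors.
(In general, briefly stopping and starting the ons resource has little impact on the system, because the resource is mainly involved in load balancing and failover.)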

[oracle@rac3 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE rac3
ora....C3.lsnr application ONLINE ONLINE rac3
ora.rac3.gsd application ONLINE ONLINE rac3
ora.rac3.ons application ONLINE ONLINE rac3
ora.rac3.vip application ONLINE ONLINE rac3
ora....SM2.asm application ONLINE ONLINE rac4
ora....C4.lsnr application ONLINE ONLINE rac4
ora.rac4.gsd application ONLINE ONLINE rac4
ora.rac4.ons application ONLINE ONLINE rac4
ora.rac4.vip application ONLINE ONLINE rac4
ora.racdb.db application ONLINE ONLINE rac4
ora....b1.inst application ONLINE ONLINE rac3
ora....b2.inst application ONLINE ONLINE rac4

A thread on itpub about a similar problem: http://www.itpub.net/thread-1283253-1-1.html

ps -ef|grep ons

Acknowledgement: this document draws on Zhang Xiaoming's book <<大话Oracle RAC>>.

05-02 21:30