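Overview
When configuring OGG in a RAC environment, two conditions must be met for OGG to fail over automatically to a healthy node when a RAC node fails:
1. The OGG checkpoint, trail, and BR (Bounded Recovery) files are placed on a shared cluster file system that every RAC node can access.
2. Cluster software monitors the OGG processes and, when a failure occurs, automatically restarts OGG on a healthy node (failover).
Oracle Grid Infrastructure Standalone Agents (XAG), combined with an Oracle-supported cluster file system, can provide automatic OGG failover. This article describes the configuration steps.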
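Components and Version Requirements
To use XAG for automatic failover, the versions of the software involved must meet the requirements.
As for the cluster file system, the official Oracle documentation recommends ACFS, DBFS, and OCFS. I believe other cluster file systems, such as the Veritas cluster file system, should work as well.
The examples in this article use ACFS.
Software Versions in the Test Environment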
Source database: 11.2.0.4 RAC (ASM)
Target database: 12.1.0.2 RAC (ASM)
GoldenGate: 12.2.0.1.1
Operating system: Oracle Enterprise Linux 6.5 (64-bit) on both source and target
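Configuration Steps
Installing GI XAG
XAG must be downloaded separately from the Oracle website and installed. The download location is: http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
The current version is 7, and the file is xagpack_7b.zip.
Unzip the file, then run xagsetup.sh as the GI installation user (usually "grid") to install it: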
[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1
Installing Oracle Grid Infrastructure Agents on: rac2
Done.
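Install XAG on the target as well, using the same method as on the source.
Creating ACFS on the Source (11.2)
To use ACFS with 11.2.0.4 on OEL, the PSU level must be 11.2.0.4.4 or later. The patching procedure is omitted here.
The COMPATIBLE.ASM and COMPATIBLE.ADVM attributes of the disk group used for ACFS must be set to 11.2.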
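The attribute change can be sketched with ASMCMD as below. This is only an illustration under assumptions: the disk group name DATA is taken from later output in this article, the commands are meant to be run as the grid user with the ASM environment set, and the run helper with its DRY_RUN switch is a hypothetical wrapper that only prints the commands by default.

```shell
#!/bin/sh
# Sketch: set the disk group compatibility attributes needed for ACFS.
# Assumption: disk group DATA; run as the grid user with the ASM environment set.
# DRY_RUN=1 (the default) only prints the commands instead of executing them.

DG=${DG:-DATA}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

set_acfs_compat() {
    # compatible.asm has to be raised before compatible.advm
    run asmcmd setattr -G "$DG" compatible.asm 11.2
    run asmcmd setattr -G "$DG" compatible.advm 11.2
    # list the attributes to confirm the new values
    run asmcmd lsattr -l -G "$DG"
}

set_acfs_compat
```

Keep in mind that these compatibility attributes can be advanced but not lowered again, so it is worth reviewing the printed commands before running with DRY_RUN=0.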
Use ASMCMD or ASMCA to create the ACFS volume:
Create a general-purpose ACFS
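For readers who prefer the command line to ASMCA, the creation of a general-purpose ACFS might look roughly like the sketch below. The names and the size are taken from the volinfo output shown in this article; the device suffix (-426) is assigned by ADVM and will differ on other systems. The run helper and its DRY_RUN switch are hypothetical and only print the commands by default.

```shell
#!/bin/sh
# Sketch: command-line creation of a general-purpose ACFS.
# Assumptions: names and size come from this article's volinfo output; the
# device name must be read from volinfo on your own system.
# DRY_RUN=1 (the default) only prints the commands instead of executing them.

DG=${DG:-DATA}
VOL=${VOL:-VOLOGG1}
SIZE=${SIZE:-3G}
DEVICE=${DEVICE:-/dev/asm/vologg1-426}
MOUNTPOINT=${MOUNTPOINT:-/u01/app/grid/acfsmounts/data_vol1}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_acfs() {
    # 1. create the ADVM volume in the disk group (grid user)
    run asmcmd volcreate -G "$DG" -s "$SIZE" "$VOL"
    # 2. look up the generated volume device name (grid user)
    run asmcmd volinfo -G "$DG" "$VOL"
    # 3. format the volume as ACFS (root)
    run /sbin/mkfs -t acfs "$DEVICE"
    # 4. create the mount point (root; needed on every node)
    run mkdir -p "$MOUNTPOINT"
    # 5. add the file system to the ACFS mount registry (root)
    run /sbin/acfsutil registry -a "$DEVICE" "$MOUNTPOINT"
}

create_acfs
```

The steps mix grid and root privileges, so in practice they would be run one at a time rather than as a single script.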
At this point the ACFS is not yet managed by CRS. You can view the ACFS information with the ASMCMD volinfo command or with /sbin/acfsutil registry:
ASMCMD> volinfo -a
Diskgroup Name: DATA
Volume Name: VOLOGG1
Volume Device: /dev/asm/vologg1-426
State: ENABLED
Size (MB): 3072
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# /sbin/acfsutil registry
Mount Object:
Device: /dev/asm/vologg1-426
Mount Point: /u01/app/grid/acfsmounts/data_vol1
Disk Group: DATA
Volume: VOLOGG1
Options: none
Nodes: all
Registering the ACFS with CRS on the Source (11.2)
First, delete the entry for the ACFS we just created from the general-purpose ACFS registry:
[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1
acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry
Then register it as a CRS resource with the SRVCTL tool:
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Start the resource manually (this mounts the ACFS):
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
ONLINE ONLINE rac1 mounted on /u01/app/grid/acfsmounts/data_vol1
ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /
tmpfs 2.0G 437M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
[root@rac2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /
tmpfs 2.0G 440M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
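Creating and Registering ACFS on the Target (12.1)
The main difference between creating ACFS in 12c and in 11g is that 12c no longer offers a choice between a general-purpose file system and a database home file system. After creation, a script is generated that registers the file system with CRS.
Run the generated script to complete registration and mounting: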
[root@oel65vm11 scripts]# ./acfs_script.sh
ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12
Check the resource information:
[root@oel65vm11 bin]# ./crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.VOLOGG2.advm
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.DATA.dg
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.asm
ONLINE ONLINE oel65vm11 Started,STABLE
ONLINE ONLINE oel65vm12 Started,STABLE
ora.data.vologg2.acfs
ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ora.net1.network
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.ons
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
Note: SELinux must be turned off on all nodes; otherwise ACFS reports errors about missing write permission.
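A quick way to confirm the persistent SELinux setting on each node is sketched below. The function names are hypothetical, and /etc/selinux/config is assumed as the configuration file (its usual location on Oracle Linux 6). Since the requirement here is that SELinux be off, the sketch accepts only "disabled"; whether permissive mode would also be sufficient is not covered by this article.

```shell
#!/bin/sh
# Sketch: report the SELinux mode configured for the next boot.
# Assumption: the configuration lives in /etc/selinux/config.

# Print the value of the last uncommented SELINUX= line in the given file.
# SELINUXTYPE= lines do not match because the pattern requires "SELINUX=".
selinux_config_mode() {
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$1" | tail -n 1
}

# Return 0 only when the file says SELINUX=disabled.
check_selinux() {
    cfg=${1:-/etc/selinux/config}
    mode=$(selinux_config_mode "$cfg")
    if [ "$mode" = "disabled" ]; then
        echo "OK: SELINUX=disabled in $cfg"
    else
        echo "WARNING: SELINUX=${mode:-unset} in $cfg"
        return 1
    fi
}
```

Calling check_selinux without arguments on every node checks the default file; the getenforce command shows the mode that is active right now, which can differ from the file until the next reboot.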
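Installing Oracle GoldenGate
This OGG release supports both 11g and 12c databases; in the graphical installer, you choose the OGG build that matches the database version.
Install OGG on the ACFS created earlier:
Source installation location: /u01/app/grid/acfsmounts/data_vol1/ogg12
Target installation location: /u01/app/grid/acfsmounts/ogg_vol1/ogg12
Choose to start the Manager process automatically.
Database Preparation
- Switch the source database to archivelog mode (procedure omitted).
- Enable the required logging on the source database and set the replication parameter: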
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- ---------------------------------------
YES YES
SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.
SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;
System altered.
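- Create the OGG database user on both source and target and grant it privileges. In my example the user is GGADM.
For the privileges the OGG user needs, see section 4.1.4.1 "Oracle 11.2.0.4 or Later Database Privileges" in the online documentation "Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)". For convenience, this test grants the user the DBA role plus the privileges handed out through the dedicated system package: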
SQL> BEGIN
  dbms_goldengate_auth.grant_admin_privilege
  (
    grantee => 'GGADM',
    privilege_type => 'CAPTURE',
    grant_select_privileges => TRUE
  );
END;
/
PL/SQL procedure successfully completed.
Source OGG Setup
- Log in to the database:
GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm
Successfully logged into database.
- Register the integrated Extract
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;
2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.
- Add the Extract process
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1
EXTTRAIL added.
- Add the data pump process
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
EXTRACT added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6> EDIT PARAMS EXT1
Add the following content:
EXTRACT ext1
USERID ggadm, PASSWORD ggadm
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 100)
EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
TABLE test.*;
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7> EDIT PARAMS PUMP1
Add the following content:
EXTRACT pump1
USERID ggadm, PASSWORD ggadm
RMTHOST 192.168.0.11, MGRPORT 7809
RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt
TABLE TEST.*;
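Then start all processes.
On 11.2.0.4, when integrated capture mode is used, starting the Extract process reports that patch 17030189 is required, mainly because integrated capture needs changes to data dictionary tables.
However, once a PSU has been installed, this patch sometimes conflicts with other patches. In that case, running prvtlmpg.plb manually also resolves the problem.
(EXTRACT Abending With OGG-02912 (Doc ID 2091679.1))
Target OGG Setup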
GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm
Successfully logged into database.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9> ADD CHECKPOINTTABLE ggadm.checkpointtab
Successfully created checkpoint table ggadm.checkpointtab
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab
REPLICAT added.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11> EDIT PARAMS REP1
Add the following content:
REPLICAT rep1
USERID ggadm, PASSWORD ggadm
ASSUMETARGETDEFS
DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirrpt/rep1.dsc, PURGE
MAP TEST.* TARGET TEST.*;
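Then start the process and test whether OGG replication works correctly.
Modifying the OGG MGR Parameters
So that the OGG Manager process can start the replication processes automatically, add the following settings to the Manager parameter file: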
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
Restart the Manager process for the change to take effect.
Make this change on both source and target.
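Because the same two lines have to be added on both sides, a small idempotent helper can be convenient. The sketch below appends each parameter only if the exact line is not yet in the file. The function names ensure_line and add_mgr_autostart are hypothetical, and the file is assumed to be dirprm/mgr.prm under the OGG home.

```shell
#!/bin/sh
# Sketch: add the Manager parameters from this article to mgr.prm if missing.
# Assumption: the Manager parameter file is <OGG home>/dirprm/mgr.prm.

# $1 = file, $2 = exact line to append when it is not already present
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# $1 = path to mgr.prm
add_mgr_autostart() {
    ensure_line "$1" 'AUTOSTART ER *'
    ensure_line "$1" 'AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60'
}
```

On the source this would be called as add_mgr_autostart /u01/app/grid/acfsmounts/data_vol1/ogg12/dirprm/mgr.prm, and on the target with the ogg_vol1 path. Running it twice leaves the file unchanged the second time.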
Configuring XAG on the Source
- Add the application VIP (as root)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle
- Allow the grid user to start the resource (as root)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
- Start the VIP (as grid)
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded
- Check the status
[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rac1
- Create the CRS resource for OGG (as root)
[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1
[root@rac1 ~]# cd /u01/app/grid/xaghome/bin
[root@rac1 bin]# ./agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
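- Grant grid permission to start the resource
The agctl add command above automatically creates a CRS resource for OGG; grid must be granted permission to manage it:
[root@rac1 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x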
Configuring XAG on the Target
The procedure is similar to that on the source.
- Create the VIP resource:
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11
CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded
- Create the CRS resource for OGG
[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1
- Grant permission
[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
Modifying the PUMP Process
Change the target address in the PUMP process parameters to the VIP we just created on the target:
RMTHOST 192.168.0.26, MGRPORT 7809
Restart the PUMP process.
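The parameter change can also be scripted. The sketch below rewrites only the host part of the RMTHOST line and leaves the rest of the file untouched. The function name set_rmthost is hypothetical, and the pump parameter file is assumed to be dirprm/pump1.prm under the source OGG home.

```shell
#!/bin/sh
# Sketch: point the RMTHOST line of a pump parameter file at a new address.
# Assumption: the keyword is written in upper case, as in this article.

# $1 = parameter file, $2 = new host name or VIP address
set_rmthost() {
    tmp=$(mktemp) || return 1
    # keep "RMTHOST " and everything after the host (", MGRPORT ...") as is
    sed "s/^\([[:space:]]*RMTHOST[[:space:]][[:space:]]*\)[^,[:space:]]*/\1$2/" "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For this article's setup the call would be set_rmthost /u01/app/grid/acfsmounts/data_vol1/ogg12/dirprm/pump1.prm 192.168.0.26, followed by stopping and starting pump1 in GGSCI.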
Starting the CRS OGG Resources
Enter the GGSCI command line and stop all processes on both source and target.
- Start the target resource
[grid@oel65vm11 ~]$ cd $ORACLE_BASE
[grid@oel65vm11 grid]$ cd xaghome/bin
[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11
[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate
NAME=xag.gg_2.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on oel65vm11
- Start the source resource
[grid@rac1 bin]$ cd $ORACLE_BASE
[grid@rac1 grid]$ cd xaghome/bin
[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1
[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate
NAME=xag.gg_1.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on rac1
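After startup, enter the GGSCI command line and check the process status. If all processes were started automatically, the configuration is working.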
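The status check can be automated by piping "info all" into ggsci and filtering the result. The sketch below is a minimal filter that lists every process whose status is not RUNNING; it assumes the usual column layout of the "info all" output (Program, Status, Group), and the function name not_running is hypothetical.

```shell
#!/bin/sh
# Sketch: list OGG processes that are not RUNNING.
# Reads the output of the GGSCI command "info all" on stdin.
# Assumption: columns are Program, Status, Group (the Manager line has no group).

not_running() {
    awk '$1 ~ /^(MANAGER|EXTRACT|REPLICAT)$/ && $2 != "RUNNING" {
             name = ($1 == "MANAGER") ? "MGR" : $3
             print $1, name, $2
         }'
}
```

On the source, a call might look like echo "info all" | /u01/app/grid/acfsmounts/data_vol1/ogg12/ggsci | not_running. Empty output means the Manager and every Extract or Replicat listed are running.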
Switchover Test
Test a switchover on the source with the following command:
[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2
[grid@rac1 bin]$ crsctl status resource -t
......
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
xag.gg_1-vip.vip
1 ONLINE ONLINE rac2
xag.gg_1.goldengate
1 ONLINE ONLINE rac2
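When the test is repeated, it can help to wait for the resource to come up on the expected node instead of reading the table by eye. The sketch below parses the NAME=/STATE= style output of "crsctl status resource <name>" shown earlier in this article. The function names and the 5-second polling interval are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: find out on which node a CRS resource is online.
# Input format assumed: "STATE=ONLINE on <node>", as printed by
# "crsctl status resource <name>" earlier in this article.

# Reads crsctl output on stdin and prints the node name, or nothing if the
# resource is not online.
resource_node() {
    sed -n 's/^STATE=ONLINE on \([^,[:space:]]*\).*$/\1/p'
}

# $1 = resource name, $2 = expected node, $3 = timeout in seconds (default 120)
wait_for_node() {
    res=$1
    want=$2
    left=${3:-120}
    while [ "$left" -gt 0 ]; do
        node=$(crsctl status resource "$res" | resource_node)
        [ "$node" = "$want" ] && return 0
        sleep 5
        left=$((left - 5))
    done
    return 1
}
```

After the relocate command above, wait_for_node xag.gg_1.goldengate rac2 would return 0 once the resource reports ONLINE on rac2, and a non-zero status if that does not happen within the timeout.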
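Next, a power-loss test: we shut down the target host oel65vm11 by cutting its power.
On host oel65vm12, we can see that the RAC VIP has failed over to this node, and that the OGG VIP and the gg_2 resource have automatically failed over to this node as well: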
[grid@oel65vm12 ~]$ crsctl status resource -t
......
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
......
ora.oel65vm11.vip
1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE
ora.oel65vm12.vip
1 ONLINE ONLINE oel65vm12 STABLE
ora.racdb.db
1 ONLINE OFFLINE STABLE
2 ONLINE ONLINE oel65vm12 Open,STABLE
ora.scan1.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_2.goldengate
1 ONLINE ONLINE oel65vm12 STABLE
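The above is only the simplest possible example and does not cover more complex situations, such as a monitoring jagent deployed alongside OGG, or downstream replication. Real production environments are often far more complex than this example.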