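Overview

To configure OGG (Oracle GoldenGate) in a RAC environment so that it automatically fails over to a surviving node when a RAC node fails, two conditions must be met:

1. The OGG checkpoint, trail, and BR (Bounded Recovery) files are placed on a shared cluster file system that every RAC node can access.

2. Cluster software monitors the OGG processes and, when a failure occurs, automatically restarts OGG on a surviving node (failover).

Oracle Grid Infrastructure Standalone Agents (XAG), combined with an Oracle-supported cluster file system, can provide automatic OGG failover. This article walks through the configuration steps.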
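The first requirement can be sanity-checked with a small shell helper. The sketch below is only an illustration: the function name is_under_mount is hypothetical, the paths are the source-side ones used later in this article, and the check is a plain string-prefix comparison that does not prove the file system is actually mounted or shared.

```shell
# Hypothetical helper: succeeds if path $1 lies at or below mount point $2.
# Pure string comparison; it does not touch the file system.
is_under_mount() {
  _path="${1%/}"
  _mount="${2%/}"
  case "$_path/" in
    "$_mount"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with the source-side paths from this article: the checkpoint (dirchk),
# trail (dirdat) and Bounded Recovery (BR) directories should all sit on the ACFS mount.
OGG_HOME=/u01/app/grid/acfsmounts/data_vol1/ogg12
ACFS_MOUNT=/u01/app/grid/acfsmounts/data_vol1
for dir in dirchk dirdat BR; do
  if is_under_mount "$OGG_HOME/$dir" "$ACFS_MOUNT"; then
    echo "$dir: on shared mount"
  else
    echo "$dir: NOT on shared mount"
  fi
done
```

Because OGG is installed directly on the ACFS mount in this article, all three directories pass. The check becomes more useful when the OGG home is local and only selected directories are relocated or symlinked to shared storage.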

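Components and Version Requirements

To use XAG for automatic failover, the versions of the software involved must meet the following requirements:

[Figure: version requirements for the related components]

As for the cluster file system, the Oracle documentation recommends ACFS, DBFS, and OCFS. In my opinion other cluster file systems, such as the Veritas cluster file system, should also work.

The examples in this article use ACFS.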

Test Environment Software Versions

Source database: 11.2.0.4 RAC (ASM)

Target database: 12.1.0.2 RAC (ASM)

GoldenGate: 12.2.0.1.1

Operating system: Oracle Enterprise Linux 6.5 (64-bit) on both source and target

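Configuration Steps

Installing GI XAG

XAG must be downloaded separately from the Oracle website and installed. The download location is: http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html

The current version is 7, and the file is xagpack_7b.zip.

Unzip the file, then run xagsetup.sh as the GI installation user (usually "grid") to install: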

[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes

Installing Oracle Grid Infrastructure Agents on: rac1

Installing Oracle Grid Infrastructure Agents on: rac2

Done.

Install XAG on the target side as well, using the same procedure as on the source.

Creating ACFS on the Source (11.2)

To use ACFS with 11.2.0.4 on OEL, the PSU level must be 11.2.0.4.4 or later. The patching procedure is omitted here.

The COMPATIBLE.ASM and COMPATIBLE.ADVM attributes of the disk group that hosts ACFS must be set to 11.2 or higher.

Create the ACFS volume with ASMCMD or ASMCA.

Then create a general-purpose ACFS file system on the volume.

At this point the ACFS is not yet managed by CRS. Its details can be viewed with the ASMCMD volinfo command or with /sbin/acfsutil registry:

ASMCMD> volinfo -a

Diskgroup Name: DATA

Volume Name: VOLOGG1

Volume Device: /dev/asm/vologg1-426

State: ENABLED

Size (MB): 3072

Resize Unit (MB): 32

Redundancy: UNPROT

Stripe Columns: 4

Stripe Width (K): 128

Usage: ACFS

Mountpath: /u01/app/grid/acfsmounts/data_vol1

[root@rac1 ~]# /sbin/acfsutil registry

Mount Object:

Device: /dev/asm/vologg1-426

Mount Point: /u01/app/grid/acfsmounts/data_vol1

Disk Group: DATA

Volume: VOLOGG1

Options: none

Nodes: all

Registering the ACFS with CRS on the Source (11.2)

First, remove the entry for the ACFS we just created from the general-purpose ACFS registry:

[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1

acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry

Then register it as a CRS resource with SRVCTL:

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t

--------------------------------------------------------------------------------

NAME TARGET STATE SERVER STATE_DETAILS

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATA.dg

ONLINE ONLINE rac1

ONLINE ONLINE rac2

ora.LISTENER.lsnr

ONLINE ONLINE rac1

ONLINE ONLINE rac2

ora.asm

ONLINE ONLINE rac1 Started

ONLINE ONLINE rac2 Started

ora.data.vologg1.acfs

OFFLINE OFFLINE rac1

OFFLINE OFFLINE rac2

ora.gsd

OFFLINE OFFLINE rac1

OFFLINE OFFLINE rac2

ora.net1.network

ONLINE ONLINE rac1

ONLINE ONLINE rac2

ora.ons

ONLINE ONLINE rac1

ONLINE ONLINE rac2

--------------------------------------------------------------------------------
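The file system resource registered above is named ora.data.vologg1.acfs, which looks like the disk group and volume names in lower case wrapped in ora.<diskgroup>.<volume>.acfs. A minimal sketch of that naming, assuming the pattern holds beyond the resources shown in this article (the function name acfs_resource_name is hypothetical):

```shell
# Hypothetical helper: derive the CRS resource name of an ACFS file system
# from its disk group ($1) and volume ($2), following the pattern seen in
# this article's crsctl output. The pattern itself is an assumption.
acfs_resource_name() {
  printf 'ora.%s.%s.acfs\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

acfs_resource_name DATA VOLOGG1   # prints ora.data.vologg1.acfs
```

The derived name is the one passed later to agctl through --filesystems, so it is worth confirming against the real crsctl output rather than relying on the pattern alone.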

Start the resource manually (this mounts the ACFS):

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426

[root@rac1 ~]#

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t

--------------------------------------------------------------------------------

NAME TARGET STATE SERVER STATE_DETAILS

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATA.dg

ONLINE ONLINE rac1

ONLINE ONLINE rac2

ora.LISTENER.lsnr

ONLINE ONLINE rac1

ONLINE ONLINE rac2

ora.asm

ONLINE ONLINE rac1 Started

ONLINE ONLINE rac2 Started

ora.data.vologg1.acfs

ONLINE ONLINE rac1 mounted on /u01/app/grid/acfsmounts/data_vol1

ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/data_vol1

[root@rac1 ~]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /

tmpfs 2.0G 437M 1.6G 23% /dev/shm

/dev/sda1 477M 55M 397M 13% /boot

/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1

[root@rac2 ~]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /

tmpfs 2.0G 440M 1.6G 23% /dev/shm

/dev/sda1 477M 55M 397M 13% /boot

/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
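Checking df by eye on every node gets tedious, so the same check can be scripted. The following is a rough sketch, not part of the original procedure: the function name acfs_mounted is hypothetical, it expects df-style text on standard input, and it assumes mount points contain no spaces.

```shell
# Hypothetical helper: reads df-style output on stdin and succeeds if mount
# point $1 is backed by an ADVM device (/dev/asm/...).
acfs_mounted() {
  awk -v mp="$1" '
    index($1, "/dev/asm/") == 1 && $NF == mp { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Demonstration on the output shown above (on a real node: df -P | acfs_mounted <mount point>).
sample_df='Filesystem Size Used Avail Use% Mounted on
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1'

if printf '%s\n' "$sample_df" | acfs_mounted /u01/app/grid/acfsmounts/data_vol1; then
  echo "ACFS is mounted"
else
  echo "ACFS is not mounted"
fi
```

Matching on the device prefix as well as the mount point avoids a false positive when the directory exists but is only part of the local root file system.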

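Creating and Registering ACFS on the Target (12.1)

The main difference between creating ACFS in 12c and in 11g is that the choice between a general-purpose file system and a database home file system is gone, and after creation a script is generated that registers the file system with CRS.

Run the generated script to complete registration and mounting: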

[root@oel65vm11 scripts]# ./acfs_script.sh

ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12

Check the resource information:

[root@oel65vm11 bin]# ./crsctl status resource -t

--------------------------------------------------------------------------------

Name Target State Server State details

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DATA.VOLOGG2.advm

ONLINE ONLINE oel65vm11 STABLE

ONLINE ONLINE oel65vm12 STABLE

ora.DATA.dg

ONLINE ONLINE oel65vm11 STABLE

ONLINE ONLINE oel65vm12 STABLE

ora.LISTENER.lsnr

ONLINE ONLINE oel65vm11 STABLE

ONLINE ONLINE oel65vm12 STABLE

ora.asm

ONLINE ONLINE oel65vm11 Started,STABLE

ONLINE ONLINE oel65vm12 Started,STABLE

ora.data.vologg2.acfs

ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE

ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE

ora.net1.network

ONLINE ONLINE oel65vm11 STABLE

ONLINE ONLINE oel65vm12 STABLE

ora.ons

ONLINE ONLINE oel65vm11 STABLE

ONLINE ONLINE oel65vm12 STABLE

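Note: SELinux must be disabled on all nodes; otherwise ACFS reports permission errors on write.

Installing Oracle GoldenGate

This OGG release supports both 11g and 12c databases. In the graphical installer, the user selects the OGG build that matches the database version.

Install OGG on the ACFS created earlier:

Source installation location: /u01/app/grid/acfsmounts/data_vol1/ogg12

Target installation location: /u01/app/grid/acfsmounts/ogg_vol1/ogg12

Choose to start the Manager process automatically.

Database Preparation

• Switch the source database to archivelog mode (steps omitted).

• On the source database, add the required logging and set the replication parameter: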

SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Database altered.

SQL> ALTER DATABASE FORCE LOGGING;

Database altered.

SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;

SUPPLEME FORCE_LOGGING

-------- ---------------------------------------

YES YES

SQL> ALTER SYSTEM SWITCH LOGFILE;

System altered.

SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;

System altered.

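• Create the OGG database user on both the source and the target and grant it privileges. In my example the user is GGADM.

The privileges the OGG user needs are described in section 4.1.4.1, "Oracle 11.2.0.4 or Later Database Privileges", of the online manual "Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)". For convenience, this test grants the user the DBA role plus the privileges provided through the dedicated system package:

SQL> BEGIN

dbms_goldengate_auth.grant_admin_privilege

(

grantee => 'GGADM',

privilege_type => 'CAPTURE',

grant_select_privileges => TRUE

);

END;

/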

PL/SQL procedure successfully completed.

Source OGG Configuration

• Log in to the database:

GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm

Successfully logged into database.

• Register the integrated extract:

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;

2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.

• Add the extract process:

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW

EXTRACT (Integrated) added.

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1

EXTTRAIL added.

• Add the data pump process:

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et

EXTRACT added.

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6>EDIT PARAMS EXT1

Add the following:

EXTRACT ext1

USERID ggadm, PASSWORD ggadm

TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)

EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et

TABLE test.*;

GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7>EDIT PARAMS PUMP1

Add the following:

EXTRACT pump1

USERID ggadm, PASSWORD ggadm

RMTHOST 192.168.0.11, MGRPORT 7809

RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt

TABLE TEST.*;

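Then start all processes.

On 11.2.0.4, when integrated capture is used, starting the extract process reports that patch 17030189 is required, mainly because integrated capture needs changes to data dictionary tables.

After a PSU has been applied, however, this patch sometimes conflicts with other patches. In that case the problem can also be resolved by running prvtlmpg.plb manually.

(See EXTRACT Abending With OGG-02912 (Doc ID 2091679.1).)

Target OGG Configuration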

GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm

Successfully logged into database.

GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9>ADD CHECKPOINTTABLE ggadm.checkpointtab

Successfully created checkpoint table ggadm.checkpointtab

GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab

REPLICAT added.

GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11>EDIT PARAMS REP1

Add the following:

REPLICAT rep1

USERID ggadm, PASSWORD ggadm

ASSUMETARGETDEFS

DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt, PURGE

MAP TEST.* TARGET TEST.*;

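Then start the process and verify that OGG replication works.

Modifying the OGG MGR Parameters

So that the OGG Manager process can start the replication processes automatically, add the following to the Manager parameter file: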

AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60

AUTOSTART ER *

Restart the Manager process for the change to take effect.

Make this change on both the source and the target.
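Since the same two lines have to be added on both sides, a small idempotent helper avoids duplicate entries when the step is repeated. This is a sketch under assumptions: ensure_line is a hypothetical name, the Manager parameter file is assumed to be dirprm/mgr.prm under the OGG home, and the file is assumed to end with a newline.

```shell
# Hypothetical helper: append line $2 to file $1 unless an identical line is
# already present. grep -F treats the line as a fixed string, so the "*" in
# "ER *" is matched literally.
ensure_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage on the source side (path assumed from this article's install location):
# MGR_PRM=/u01/app/grid/acfsmounts/data_vol1/ogg12/dirprm/mgr.prm
# ensure_line "$MGR_PRM" 'AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60'
# ensure_line "$MGR_PRM" 'AUTOSTART ER *'
```

The helper only edits the file; Manager still has to be restarted afterwards, as described above.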

Configuring XAG on the Source

• Add an application VIP (as root)

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle

• Allow the grid user to start the resource (as root)

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x

• Start the VIP (as grid)

[root@rac1 ~]# su - grid

[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip

CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'

CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded

• Check the status

[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip

NAME=xag.gg_1-vip.vip

TYPE=app.appvip_net1.type

TARGET=ONLINE

STATE=ONLINE on rac1

• Create the CRS resource for OGG (as root)

[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1

[root@rac1 ~]# cd /u01/app/grid/xaghome/bin

[root@rac1 bin]# ./agctl status goldengate gg_1

Goldengate instance 'gg_1' is not running

• Grant grid permission to start the resource

The command above automatically creates a CRS resource for OGG, and grid must be granted permission to manage it:

[root@rac1 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x
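The agctl add goldengate command in this section is long and easy to mistype. One option is to assemble it from named variables and print it for review before running it as root. The sketch below is a dry run only: the variable names and the function agctl_add_cmd are hypothetical, and it uses no agctl options beyond those already shown in this article.

```shell
# Source-side values taken from this article; adjust for the target side.
AGCTL=/u01/app/grid/xaghome/bin/agctl
GG_INSTANCE=gg_1
GG_HOME=/u01/app/grid/acfsmounts/data_vol1/ogg12
GG_TYPE=source
GG_NODES=rac1,rac2
GG_VIP=xag.gg_1-vip.vip
GG_FS=ora.data.vologg1.acfs
GG_DB=ora.tdb.db
GG_ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
GG_MONITOR="--monitor_extracts ext1,pump1"

# Hypothetical helper: print (do not execute) the agctl command line.
agctl_add_cmd() {
  printf '%s add goldengate %s' "$AGCTL" "$GG_INSTANCE"
  printf ' --gg_home %s --instance_type %s' "$GG_HOME" "$GG_TYPE"
  printf ' --nodes %s --vip_name %s' "$GG_NODES" "$GG_VIP"
  printf ' --filesystems %s --databases %s' "$GG_FS" "$GG_DB"
  printf ' --oracle_home %s %s\n' "$GG_ORACLE_HOME" "$GG_MONITOR"
}

agctl_add_cmd
```

Printing instead of executing keeps the sketch safe to run anywhere; once the printed line has been checked against the environment, it can be pasted into a root shell.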

Configuring XAG on the Target

The procedure is similar to the source.

• Create the VIP resource:

[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle

[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x

[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip

CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'

CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded

[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11

CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'

CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded

CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'

CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded

• Create the CRS resource for OGG

[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1

• Grant permission

[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x

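Modifying the PUMP Process

Change the remote host address (RMTHOST) in the PUMP parameters to the target-side VIP created above, so that the pump follows the target OGG instance when it fails over:

RMTHOST 192.168.0.26, MGRPORT 7809

Restart the PUMP process.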
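The edit can also be scripted instead of made in EDIT PARAMS. The following is a minimal sketch, not the method used in this article: set_rmthost is a hypothetical name, it assumes the keyword is written in upper case at the start of a line, and it assumes the new host contains no "/" or "&" characters.

```shell
# Hypothetical helper: replace the host on the RMTHOST line of parameter
# file $1 with $2, leaving the rest of the line (e.g. MGRPORT) untouched.
set_rmthost() {
  sed "s/^\(RMTHOST[[:space:]][[:space:]]*\)[^,[:space:]]*/\1$2/" "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Usage (path assumed from the source-side install location in this article):
# set_rmthost /u01/app/grid/acfsmounts/data_vol1/ogg12/dirprm/pump1.prm 192.168.0.26
```

Writing to a temporary file and renaming it avoids leaving a half-written parameter file behind if sed fails.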

Starting the CRS OGG Resources

In GGSCI, stop all OGG processes on both the source and the target.

• Start the target resource

[grid@oel65vm11 ~]$ cd $ORACLE_BASE

[grid@oel65vm11 grid]$ cd xaghome/bin

[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11

[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate

NAME=xag.gg_2.goldengate

TYPE=xag.goldengate.type

TARGET=ONLINE

STATE=ONLINE on oel65vm11

• Start the source resource

[grid@rac1 bin]$ cd $ORACLE_BASE

[grid@rac1 grid]$ cd xaghome/bin

[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1

[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate

NAME=xag.gg_1.goldengate

TYPE=xag.goldengate.type

TARGET=ONLINE

STATE=ONLINE on rac1
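For scripted checks, the node hosting a resource can be pulled out of the NAME=/STATE= output shown above. A small sketch, assuming the output keeps the "STATE=ONLINE on <node>" form seen in this article (the function name resource_online_node is hypothetical):

```shell
# Hypothetical helper: reads "crsctl status resource <name>" output on stdin
# and prints the node on which the resource is ONLINE (nothing if it is not).
resource_online_node() {
  sed -n 's/^STATE=ONLINE on \(.*\)$/\1/p'
}

# Demonstration on the output shown above.
printf '%s\n' \
  'NAME=xag.gg_1.goldengate' \
  'TYPE=xag.goldengate.type' \
  'TARGET=ONLINE' \
  'STATE=ONLINE on rac1' | resource_online_node   # prints rac1

# On a cluster node:
# crsctl status resource xag.gg_1.goldengate | resource_online_node
```

An empty result means the resource is offline or in some other state, which is a useful signal for a monitoring script.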

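After startup, enter GGSCI and check the process status. If all processes started automatically, the configuration is working.

Switchover Test

Test a switchover of the source with this command: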

[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2

[grid@rac1 bin]$ crsctl status resource -t

......

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

......

xag.gg_1-vip.vip

1 ONLINE ONLINE rac2

xag.gg_1.goldengate

1 ONLINE ONLINE rac2
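To confirm a relocation without reading the whole table, the server column for a single resource can be extracted from the "crsctl status resource -t" output. This is a rough sketch that assumes the tabular layout shown above (resource name on its own line, followed by instance rows); the function name resource_server is hypothetical.

```shell
# Hypothetical helper: reads "crsctl status resource -t" output on stdin and
# prints the server of the first instance of resource $1 whose target and
# state are both ONLINE.
resource_server() {
  awk -v res="$1" '
    NF == 1 && $1 == res { in_res = 1; next }   # header line of our resource
    in_res && NF >= 4 && $2 == "ONLINE" && $3 == "ONLINE" { print $4; exit }
    in_res && NF == 1 { in_res = 0 }            # next resource or separator
  '
}

# Demonstration on the rows shown above.
printf '%s\n' \
  'xag.gg_1-vip.vip' \
  '      1        ONLINE  ONLINE       rac2' \
  'xag.gg_1.goldengate' \
  '      1        ONLINE  ONLINE       rac2' | resource_server xag.gg_1.goldengate   # prints rac2
```

After "agctl relocate goldengate gg_1 --node rac2", both the VIP and the GoldenGate resource should report rac2; a mismatch between the two would indicate that the dependency between them is not working as intended.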

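Next, a power-loss test: we shut down the target host oel65vm11 by cutting its power.

On host oel65vm12, the RAC VIP has failed over to this node, and the OGG VIP and the resource for gg_2 have also failed over to it automatically: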

[grid@oel65vm12 ~]$ crsctl status resource -t

......

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

......

ora.oel65vm11.vip

1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE

ora.oel65vm12.vip

1 ONLINE ONLINE oel65vm12 STABLE

ora.racdb.db

1 ONLINE OFFLINE STABLE

2 ONLINE ONLINE oel65vm12 Open,STABLE

ora.scan1.vip

1 ONLINE ONLINE oel65vm12 STABLE

xag.gg_1-vip.vip

1 ONLINE ONLINE oel65vm12 STABLE

xag.gg_2.goldengate

1 ONLINE ONLINE oel65vm12 STABLE

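The above is only the simplest possible example and does not account for more complex situations, such as a monitoring jagent deployed alongside OGG or downstream replication, so real production environments are often far more complex than this example.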

04-23 14:07