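This is my first blog post since I started working in operations. I usually keep records in my notes, but from now on I'll try recording them here instead, since organizing them properly should make them more useful.
A small goal for myself: update twice a week from now on~
My environment is Red Hat 7.2 + Oracle RAC 11.2.0.4. The system had already been running for some time. When I logged in today, I happened to notice that the instance on node 2 was down, even though all the CRS services were perfectly normal.
So I checked the alert log on node 2: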
vi /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/alert_rac1122.log
- Mon Jul 13 11:05:48 2020
- Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
- ORA-27157: OS post/wait facility removed
- ORA-27300: OS system dependent operation:semop failed with status: 43
- ORA-27301: OS failure message: Identifier removed
- ORA-27302: failure occurred at: sskgpwwait1
- Mon Jul 13 11:05:48 2020
- Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_o000_295057.trc:
- ORA-27157: OS post/wait facility removed
- ORA-27300: OS system dependent operation:semop failed with status: 43
- ORA-27301: OS failure message: Identifier removed
- ORA-27302: failure occurred at: sskgpwwait1
- DBW4 (ospid: 28544): terminating the instance due to error 27157
- Mon Jul 13 11:05:48 2020
- Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_j001_295484.trc:
- ORA-27157: OS post/wait facility removed
- ORA-27300: OS system dependent operation:semop failed with status: 43
- ORA-27301: OS failure message: Identifier removed
- ORA-27302: failure occurred at: sskgpwwait1
- Mon Jul 13 11:05:48 2020
- System state dump requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
- System State dumped to trace file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_diag_28495_20200713110548.trc
- Dumping diagnostic data in directory=[cdmp_20200713110548], requested by (instance=2, osid=28544 (DBW4)), summary=[abnormal instance termination].
- Instance terminated by DBW4, pid = 28544
- Errors in file /u01/app/oracle/diag/rdbms/rac112/rac1122/trace/rac1122_dbw4_28544.trc:
- ORA-27300: OS system dependent operation:semctl failed with status: 22
- ORA-27301: OS failure message: Invalid argument
- ORA-27302: failure occurred at: sskgpwrm1
- ORA-27157: OS post/wait facility removed
- ORA-27300: OS system dependent operation:semop failed with status: 43
- ORA-27301: OS failure message: Identifier removed
- ORA-27302: failure occurred at: sskgpwwait1
- Mon Jul 13 11:05:59 2020
- Starting ORACLE instance (normal)
- ************************ Large Pages Information *******************
- Per process system memlock (soft) limit = UNLIMITED
- Total Shared Global Region in Large Pages = 0 KB (0%)
- Large Pages used by this instance: 0 (0 KB)
- Large Pages unused system wide = 0 (0 KB)
- Large Pages configured system wide = 0 (0 KB)
- Large Page size = 2048 KB
- RECOMMENDATION:
- Total System Global Area size is 450 GB. For optimal performance,
- prior to the next instance restart:
- 1. Increase the number of unused large pages by
- at least 230401 (page size 2048 KB, total size 450 GB) system wide to
- get 100% of the System Global Area allocated with large pages
- ********************************************************************
- LICENSE_MAX_SESSION = 0
- LICENSE_SESSIONS_WARNING = 0
- Initial number of CPU is 96
- Number of processor cores in the system is 48
- Number of processor sockets in the system is 4
- Private Interface 'eno2:1' configured from GPnP for use as a private interconnect.
- [name='eno2:1', type=1, ip=xx.xx.xx.155, mac=xxxxxxxx, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
- Public Interface 'eno1' configured from GPnP for use as a public interface.
- [name='eno1', type=1, ip=xx.xx.xx.122, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=255.255.255.0, use=public/1]
- Public Interface 'eno1:1' configured from GPnP for use as a public interface.
- [name='eno1:1', type=1, ip=xx.xx.xx.124, mac=70-57-bf-39-1c-25, net=xx.xx.xx.0/24, mask=255.255.255.0, use=public/1]
- CELL communication is configured to use 0 interface(s):
- CELL IP affinity details:
- NUMA status: NUMA system w/ 4 process groups
- cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
- CELL communication will use 1 IP group(s):
- Grp 0:
- Picked latch-free SCN scheme 3
- Mon Jul 13 11:06:10 2020
- WARNING: db_recovery_file_dest is same as db_create_file_dest
- Autotune of undo retention is turned on.
- LICENSE_MAX_USERS = 0
- SYS auditing is disabled
- NUMA system with 4 nodes detected
- Starting up:
- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
- With the Partitioning, Real Application Clusters, OLAP, Data Mining
- and Real Application Testing options.
- ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
- System name: Linux
- Node name: rac2
- Release: 3.10.0-327.el7.x86_64
- Version: #1 SMP Thu Oct 29 17:29:29 EDT 2015
- Machine: x86_64
- Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/db_1/dbs/initrac1122.ora
- System parameters with non-default values:
- processes = 8192
- sessions = 12384
- spfile = "+DATA/rac112/spfilerac112.ora"
- nls_language = "AMERICAN"
- nls_territory = "CHINA"
- sga_target = 450G
- control_files = "+DATA/rac112/controlfile/current.261.1044461323"
- control_files = "+DATA/rac112/controlfile/current.260.1044461323"
- db_block_size = 8192
- compatible = "11.2.0.4.0"
- log_archive_dest_1 = "location=+DATA/RAC112/DBFRA"
- cluster_database = TRUE
- db_create_file_dest = "+DATA"
- db_recovery_file_dest = "+DATA"
- db_recovery_file_dest_size= 440700M
- thread = 2
- undo_tablespace = "UNDOTBS2"
- instance_number = 2
- remote_login_passwordfile= "EXCLUSIVE"
- db_domain = ""
- dispatchers = "(PROTOCOL=TCP) (SERVICE=rac112XDB)"
- remote_listener = "rac-scan:1521"
- audit_file_dest = "/u01/app/oracle/admin/rac112/adump"
- audit_trail = "DB"
- db_name = "rac112"
- open_cursors = 300
- pga_aggregate_target = 115200M
- diagnostic_dest = "/u01/app/oracle"
- Cluster communication is configured to use the following interface(s) for this instance
- xx.xx.xx.155
- cluster interconnect IPC version:Oracle UDP/IP (generic)
- IPC Vendor 1 proto 2
- Mon Jul 13 11:06:12 2020
- PMON started with pid=2, OS id=295770
- Error occured while spawning process PMON; error = 27153
- USER (ospid: 295705): terminating the instance due to error 27153
- Instance terminated by USER, pid = 295705
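An alert log like this one is long and repetitive. A quick way to see which errors recur and which error actually terminated the instance is to tally the ORA- codes. Below is a minimal Python sketch that assumes the alert-log line formats shown above. The function name summarize_alert_log is hypothetical and is not an Oracle tool.

```python
import re
import sys
from collections import Counter

# Oracle error codes such as "ORA-27157" anywhere in a line.
ORA_PATTERN = re.compile(r"\bORA-(\d{5})\b")
# The alert-log line naming the error that brought the instance down.
TERMINATE_PATTERN = re.compile(r"terminating the instance due to error (\d+)")


def summarize_alert_log(lines):
    """Tally ORA- codes and collect the errors that terminated the instance.

    `lines` is any iterable of alert-log lines, e.g. an open file object.
    Returns (Counter mapping "ORA-nnnnn" -> count, list of error numbers).
    """
    codes = Counter()
    terminations = []
    for line in lines:
        codes.update(f"ORA-{code}" for code in ORA_PATTERN.findall(line))
        match = TERMINATE_PATTERN.search(line)
        if match:
            terminations.append(int(match.group(1)))
    return codes, terminations


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 summarize_alert.py alert_rac1122.log
    with open(sys.argv[1], errors="replace") as log:
        codes, terminations = summarize_alert_log(log)
    for code, count in codes.most_common():
        print(f"{code}: {count}")
    print("instance terminated due to error(s):", terminations)
```

On the excerpt above, the script would tally ORA-27157 and ORA-27300 to ORA-27302. It would also list 27157 (raised by DBW4) and 27153 (raised during the automatic restart attempt) as the terminating errors.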
- [oracle@rac2 trace]$ oerr ora 27157
- 27157, 0000, "OS post/wait facility removed"
- // *Cause: the post/wait facility for which the calling process is awaiting
- // action is removed from the system
- // *Action: check errno and contact Oracle Support
- [oracle@rac2 trace]$ oerr ora 27300
- 27300, 00000, "OS system dependent operation:%s failed with status: %s"
- // *Cause: OS system call error
- // *Action: contact Oracle Support
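ORA-27300 passes the raw OS errno through as "status". You can decode such numbers on the database host with Python's errno and os modules. The helper name decode_status is hypothetical. Errno numbering is platform-specific, and the comments below assume Linux, which matches this RHEL 7.2 host.

```python
import errno
import os


def decode_status(status):
    """Return (symbolic errno name, OS error message) for a numeric status."""
    name = errno.errorcode.get(status, "UNKNOWN")
    return name, os.strerror(status)


if __name__ == "__main__":
    # The two statuses reported by ORA-27300 in the alert log above.
    # On Linux this prints:
    #   status 43: EIDRM (Identifier removed)
    #   status 22: EINVAL (Invalid argument)
    for status in (43, 22):
        name, message = decode_status(status)
        print(f"status {status}: {name} ({message})")
```

Status 43 (EIDRM) matches the "Identifier removed" text in ORA-27301. The semaphore set that the process was waiting on in semop had been removed from the system, which is what ORA-27157 describes.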
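The fix was to raise the process (nproc) and open-file (nofile) limits for the grid and oracle users. Before the change: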
- grid soft nproc 4096
- grid hard nproc 3088654
- grid soft nofile 1024
- grid hard nofile 65536
- oracle soft nproc 4096
- oracle hard nproc 3088654
- oracle soft nofile 1024
- oracle hard nofile 65536
After the change:
- grid soft nproc 9000
- grid hard nproc 3088654
- grid soft nofile 10240
- grid hard nofile 655360
- oracle soft nproc 9000
- oracle hard nproc 3088654
- oracle soft nofile 10240
- oracle hard nofile 655360
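limits.conf is applied by pam_limits when a session starts. The new values therefore only show up in new logins, not in sessions that are already open. Below is a small Python sketch for checking the limits of the current session. The helper name current_limits is hypothetical. The shell equivalents are ulimit -Su and ulimit -Sn.

```python
import resource

# These values describe the session this script runs in. Run it from a fresh
# login as the oracle or grid user to check that new soft/hard values applied.
LIMITS = {
    "nproc": resource.RLIMIT_NPROC,    # max user processes
    "nofile": resource.RLIMIT_NOFILE,  # max open file descriptors
}


def current_limits():
    """Return {name: (soft, hard)} for this process; None means unlimited."""
    result = {}
    for name, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        result[name] = tuple(
            None if value == resource.RLIM_INFINITY else value
            for value in (soft, hard)
        )
    return result


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

For an instance that is already running, the limits that actually apply are those of its processes. Linux exposes them in /proc/<pid>/limits, in the "Max processes" and "Max open files" rows.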
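The reason for raising nproc: the database's processes parameter is set to 8192, which is higher than the old soft nproc limit of 4096 for the oracle and grid users: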
- SQL> show parameter processes;
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- aq_tm_processes integer 1
- db_writer_processes integer 12
- gcs_server_processes integer 5
- global_txn_processes integer 1
- job_queue_processes integer 1000
- log_archive_max_processes integer 4
- processes integer 8192
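The check behind this change is a simple comparison: each database OS user's soft nproc should be at least the processes value (8192 here). Below is a small Python sketch of that check, assuming limits.conf-style input. soft_nproc_by_user and users_below are hypothetical helpers, and the sample text reuses the nproc lines shown earlier.

```python
def soft_nproc_by_user(limits_text):
    """Return {user: soft nproc} from limits.conf-style lines.

    Only "<user> <soft|hard|-> <item> <value>" lines are considered;
    None means no limit ("unlimited", "infinity" or -1).
    """
    result = {}
    for raw in limits_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) != 4:
            continue
        user, kind, item, value = fields
        # "-" sets both the soft and the hard limit.
        if kind in ("soft", "-") and item == "nproc":
            no_limit = value in ("unlimited", "infinity", "-1")
            result[user] = None if no_limit else int(value)
    return result


def users_below(limits_text, processes):
    """Sorted users whose soft nproc is lower than the PROCESSES parameter."""
    return sorted(
        user
        for user, soft in soft_nproc_by_user(limits_text).items()
        if soft is not None and soft < processes
    )


BEFORE = """
grid soft nproc 4096
grid hard nproc 3088654
oracle soft nproc 4096
oracle hard nproc 3088654
"""
AFTER = """
grid soft nproc 9000
grid hard nproc 3088654
oracle soft nproc 9000
oracle hard nproc 3088654
"""

if __name__ == "__main__":
    print(users_below(BEFORE, 8192))  # ['grid', 'oracle']
    print(users_below(AFTER, 8192))   # []
```

On Linux, the nproc limit counts every process and thread owned by the OS user, not only the instance's Oracle processes. Leaving some headroom above processes, as the 9000 setting does, is therefore sensible.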