案例:Standby RAC 遭遇 ORA-1157,1111,1110 导致实例 crash 处理
环境:RHEL 6.5 + Oracle RAC 11.2.0.4 + Dataguard
今天在实验环境的 Pirmary RAC 主库上做了一个增加表空间的操作,结果 Standby RAC 启动同步后直接 crash,具体报错如下:
- Fri Aug 11 19:14:20 2017
- Completed: ALTER DATABASE MOUNT /* db agent *//* {1:53752:2} */
- ALTER DATABASE OPEN /* db agent *//* {1:53752:2} */
- AUDIT_TRAIL initialization parameter is changed to OS, as DB is NOT compatible for database opened with read-only access
- This instance was first to open
- Fri Aug 11 19:14:23 2017
- Using STANDBY_ARCHIVE_DEST parameter default value as USE_DB_RECOVERY_FILE_DEST
- Beginning Standby Crash Recovery.
- Serial Media Recovery started
- Fri Aug 11 19:14:24 2017
- Managed Standby Recovery starting Real Time Apply
- Fri Aug 11 19:14:25 2017
- Warning: VKTM detected a time drift.
- Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
- Fri Aug 11 19:14:26 2017
- Primary database is in MAXIMUM PERFORMANCE mode
- Fri Aug 11 19:14:26 2017
- Primary database is in MAXIMUM PERFORMANCE mode
- RFS[1]: Assigned to RFS process 4031
- RFS[1]: Selected log 11 for thread 1 sequence 171 dbid -1785877518 branch 919999037
- Standby Crash Recovery aborted due to error 1111.
- Errors in file /opt/app/oracle/diag/rdbms/mynas/jyzhao1/trace/jyzhao1_ora_3923.trc:
- ORA-01111: name for data file 10 is unknown - rename to correct file
- ORA-01110: data file 10: '/opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010'
- ORA-01157: cannot identify/lock data file 10 - see DBWR trace file
- ORA-01111: name for data file 10 is unknown - rename to correct file
- ORA-01110: data file 10: '/opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010'
- Fri Aug 11 19:14:27 2017
- RFS[2]: Assigned to RFS process 4029
- RFS[2]: Selected log 13 for thread 1 sequence 170 dbid -1785877518 branch 919999037
- Fri Aug 11 19:14:28 2017
- RFS[3]: Assigned to RFS process 4043
- RFS[3]: Selected log 22 for thread 2 sequence 134 dbid -1785877518 branch 919999037
- RFS[4]: Assigned to RFS process 4041
- RFS[4]: Selected log 23 for thread 2 sequence 135 dbid -1785877518 branch 919999037
- Completed Standby Crash Recovery.
- Fri Aug 11 19:14:31 2017
- Abort recovery for domain 0
- Errors in file /opt/app/oracle/diag/rdbms/mynas/jyzhao1/trace/jyzhao1_ora_3923.trc:
- ORA-10458: standby database requires recovery
- ORA-01157: cannot identify/lock data file 10 - see DBWR trace file
- ORA-01111: name for data file 10 is unknown - rename to correct file
- ORA-01110: data file 10: '/opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010'
- ORA-10458 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:53752:2} */...
- Fri Aug 11 19:14:33 2017
- Shutting down instance (abort)
- License high water mark = 9
- USER (ospid: 4089): terminating the instance
- Instance terminated by USER, pid = 4089
- Fri Aug 11 19:14:34 2017
- Instance shutdown complete
根据报错查到 MOS 相关文档:
How to resolve ORA-01111 ORA-01110 ORA-01157 in a physical standby database (文档 ID 1416554.1)
这里应该就可以猜测出原因基本就是由于主库建立一个文件,备库同步日志,没有创建正确。
MOS 解决方案核心内容:
- SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
- SQL>ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=MANUAL SID='*';
- Rename the unknown datafile 97.
- SQL> ALTER DATABASE RENAME FILE '/u01/app/oracle/product/10.2.0.5/db_1/dbs/UNNAMED00097' to '</absolute path/real datafile name>';
- Create a empty datafile which same structure as the datafile 97 and it would need all archivelogs from time of creation for recovery
- Alter database create datafile '/u01/app/oracle/product/10.2.0.5/db_1/dbs/UNNAMED00097' as '+DATA_ACS4/' size <Actual size of datafile on Primary>
- Please note while giving the above command on ASM you also need to specify the action size of the datafile from the primary (Query v$datafiles for Bytes column)
- SQL>ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO SID='*';
- SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
- SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
看起来和 STANDBY_FILE_MANAGEMENT 有关系,进一步查看 show parameter STANDBY_FILE_MANAGEMENT
- On Primary:
- SYS@jyzhao1 > show parameter STANDBY_FILE_MANAGEMENT
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- standby_file_management string MANUAL
- SYS@jyzhao1 >
- On Standby:
- SQL> show parameter STANDBY_FILE_MANAGEMENT
- NAME TYPE VALUE
- ------------------------------------ ----------- ------------------------------
- standby_file_management string MANUAL
- SQL>
发现果然都是 MANUAL。
先处理当前问题,按照 MOS 的几种方法,选择适合我这里环境的方案:
创建和问题文件相同结构的空文件,然后备库开启应用所有归档文件即可完成恢复。
创建与错误数据文件结构相同的空文件,具体大小从主库查询,不加大小可能遭遇其他问题,详见上面 MOS 文档:
Alter database create datafile '/opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010' as '+DATA/' size 104857600;
创建后,可以发现 v$datafile 已经正确:
- SQL> select name from v$datafile;
- NAME
- --------------------------------------------------------------------------------
- +DATA/mynas/datafile/system.258.951608183
- +DATA/mynas/datafile/sysaux.257.951608183
- +DATA/mynas/datafile/undotbs1.259.951608185
- +DATA/mynas/datafile/users.265.951608205
- +DATA/mynas/datafile/undotbs2.261.951608185
- +DATA/mynas/datafile/dbs_d_jingyu.262.951608185
- +DATA/mynas/datafile/dbs_i_jingyu.263.951608185
- +DATA/mynas/datafile/test.264.951608185
- +DATA/mynas/datafile/test2.260.951608185
- /opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010
- 10 rows selected.
- SQL> Alter database create datafile '/opt/app/oracle/product/11.2.0/dbhome_1/dbs/UNNAMED00010' as '+DATA/' size 104857600;
- Database altered.
- SQL> select name from v$datafile;
- NAME
- --------------------------------------------------------------------------------
- +DATA/mynas/datafile/system.258.951608183
- +DATA/mynas/datafile/sysaux.257.951608183
- +DATA/mynas/datafile/undotbs1.259.951608185
- +DATA/mynas/datafile/users.265.951608205
- +DATA/mynas/datafile/undotbs2.261.951608185
- +DATA/mynas/datafile/dbs_d_jingyu.262.951608185
- +DATA/mynas/datafile/dbs_i_jingyu.263.951608185
- +DATA/mynas/datafile/test.264.951608185
- +DATA/mynas/datafile/test2.260.951608185
- +DATA/mynas/datafile/dbs_d_hank.273.951774293
- 10 rows selected.
然后修改 STANDBY_FILE_MANAGEMENT=AUTO, 正常开启数据库,开启实时应用:
- SQL > ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT = AUTO SID = '*';
- SQL > ALTER DATABASE OPEN;
- SQL > ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
再测试增加数据文件,删除数据文件,都可以正常同步:
- SYS@jyzhao1 > create tablespace dbs_only_test;
- SYS@jyzhao1 > drop tablespace dbs_only_test including contents and datafiles;
备库都会自动的正常添加删除,日志都会记录如下:
- Fri Aug 11 21:49:11 2017
- Media Recovery Waiting for thread 1 sequence 175 (in transit)
- Recovery of Online Redo Log: Thread 1 Group 11 Seq 175 Reading mem 0
- Mem# 0: +FRA/mynas/standbylog/standby_group_11.log
- Fri Aug 11 21:51:15 2017
- Successfully added datafile 11 to media recovery
- Datafile #11: '+DATA/mynas/datafile/dbs_only_test.275.951774665'
- Fri Aug 11 21:53:20 2017
- Recovery deleting file #11:'+DATA/mynas/datafile/dbs_only_test.275.951774665' from controlfile.
- Deleted Oracle managed file +DATA/mynas/datafile/dbs_only_test.275.951774665
- Recovery dropped tablespace 'DBS_ONLY_TEST'
总结:DG 环境,一定要注意参数 STANDBY_FILE_MANAGEMENT=AUTO,否则可能遇到类似情况。
来源: http://www.cnblogs.com/jyzhao/p/7348235.html