ORA-00600 Internal Error Code, Arguments [3020]
1265884.1] Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.
ORA-10567 Redo Is Inconsistent With Data Block (Standby)
Symptoms
Standby Redo Apply can terminate due to a failure of redo-data consistency checks, a problem called stuck recovery. Stuck recovery can occur when an underlying operating system or storage system loses a write issued by the Primary or Standby database during normal operation. Because there is an inconsistency between the information stored in the redo and the information stored in a database block being recovered, the database signals an internal error when applying the redo.
ORA-00600: internal error code, arguments: [3020], [2885689059], [1], [419819],[26750], [808], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 419819)
ORA-10564: tablespace USER1
ORA-01110: data file '/oracle/datafiles/user1.dbf'
Cause
The ORA-600 [3020] stuck recovery error could occur on the Standby database for several reasons including: a lost write on the Primary, a lost write on the Standby, missing redo, or logical corruption on the Primary resulting in an incomplete redo chain.
Note: With DB_LOST_WRITE_PROTECT enabled on the Primary and Standby, the Standby Redo Apply terminates with the ORA-752 error when a Primary lost write is detected.
ORA-752: recovery detected a lost write of a data block
This ORA-752 error indicates a lost write occurred on the Primary database. Oracle strongly recommends enabling DB_LOST_WRITE_PROTECT (and DB_BLOCK_CHECKSUM=FULL) for greater detection and protection from lost writes. Studies have shown the impact on the primary database is negligible.
Solution
In the majority of cases, Standby stuck recovery errors indicate a corruption of the Primary database. No errors may have been reported on the Primary.
WARNING: Do not repair the Standby by restorin
[], [], [], [], [], [], [] - Oracle 11gR2 RAC - Active Data Guard

Let me give some background. We have a banking database on a 2-node RAC in production, with Active Data Guard configured. The standby is fully utilized as a reports database every minute, and we cannot afford downtime for even a minute.

The steps I followed were inspired by "The Arup Nanda Blog: Resolving Gaps in Data Guard Apply Using Incremental RMAN Backup": http://arup.blogspot.com/2009/12/resolving-gaps-in-data-guard-apply.html
A big thanks to Arup (I never interacted with him).

I will narrate the chain of events as they happened. We use the following query to check for the sync:

Query-1

SQL> SELECT ARCH.THREAD# "Thread",
            ARCH.SEQUENCE# "Last Sequence Received",
            APPL.SEQUENCE# "Last Sequence Applied",
            (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
     FROM (SELECT THREAD#, SEQUENCE#
           FROM V$ARCHIVED_LOG
           WHERE (THREAD#, FIRST_TIME) IN (SELECT THREAD#, MAX(FIRST_TIME)
                                           FROM V$ARCHIVED_LOG
                                           GROUP BY THREAD#)) ARCH,
          (SELECT THREAD#, SEQUENCE#
           FROM V$LOG_HISTORY
           WHERE (THREAD#, FIRST_TIME) IN (SELECT THREAD#, MAX(FIRST_TIME)
                                           FROM V$LOG_HISTORY
                                           GROUP BY THREAD#)) APPL
     WHERE ARCH.THREAD# = APPL.THREAD#
     ORDER BY 1;

    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  44878                 44878          0
         2                  15739                 15739          0

My teammate noticed that the difference had risen to 7 and 8 in the respective threads. I started investigating, looked at the alert log, and found the ORA-600 error. I also ran the following query:

Query-2

SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;

(Sample query output, not the actual result)

PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH      CONNECTED             0          0          0          0
ARCH      CONNECTED             0          0          0          0
ARCH      CONNECTED             0          0          0          0
ARCH      CONNECTED             0          0          0          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                  1      44883      41973          1
RFS       IDLE                  0          0          0          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                  0          0          0          0
RFS       IDLE                  2      15743     271919          1
RFS       IDLE                  0          0          0          0
MRP0      WAIT_FOR_LOG          2      15743          0          0

13 rows selected.

I noticed that the MRP0 process was missing in the actual result. I also verified this in the alert log:

Errors in file /u01/app/oracle/diag/rdbms/flexprdb/FLEXDRDB1/trace/FLEXDRDB1_mrp0_24466.trc:
ORA-00600: internal error code, arguments: [3020], [3], [346523], [12929435], [], [], [], [], [], [
Cause:
======
1) A lost write may have happened (this could be a non-Oracle issue)
2) It could be a bug

What is a lost write
====================
Considering a single block:
Step 1) Block 1 had an SCN of 1395.
Step 2) Block 1 was updated and its SCN was incremented to 20000 in the buffer cache. So the change vector in the redo recorded the previous SCN as 1395 and the changed SCN as 20000.
Step 3) Block 1 was reported as flushed to disk, but due to an I/O issue it was never actually written to disk. So the SCN for the block on disk remains 1395.
Step 4) The same block gets updated again (starting from the stale on-disk copy) and the SCN gets incremented from 1395 to 50000. So the change vector in the redo recorded the previous SCN as 1395 and the changed SCN as 50000.
Step 5) The redo log gets shipped to the standby.
Step 6) Recovery applies the first redo change vector and changes the block SCN from 1395 to 20000.
Step 7) Recovery tries to apply the second change vector. It finds the block SCN to be 20000, whereas it expects 1395, since the previous SCN recorded for this change vector is 1395. Recovery stops with ORA-00600 internal error code [3020] because of the lost write that happened in step 3.
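The seven steps above can be replayed with a toy model. This is only a sketch of the idea, not Oracle's actual consistency check, which compares more than a single SCN. The names `ChangeVector`, `StuckRecovery`, and `apply_redo` are made up for illustration; the SCN values are the ones used in the steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeVector:
    """One redo change: the SCN the block must have, and the SCN it gets."""
    prev_scn: int
    new_scn: int


class StuckRecovery(Exception):
    """Raised when redo does not match the block (the ORA-600 [3020] case)."""


def apply_redo(block_scn: int, redo: List[ChangeVector]) -> int:
    """Apply change vectors in order, checking the block SCN before each one."""
    for cv in redo:
        if block_scn != cv.prev_scn:
            raise StuckRecovery(
                f"redo expects SCN {cv.prev_scn}, block has SCN {block_scn}"
            )
        block_scn = cv.new_scn
    return block_scn


# Redo generated on the primary in steps 2 and 4. The second vector was
# built from the stale on-disk copy, so it also records prev_scn=1395.
redo_stream = [ChangeVector(1395, 20000), ChangeVector(1395, 50000)]

try:
    apply_redo(1395, redo_stream)  # the standby block starts at SCN 1395
except StuckRecovery as err:
    print("recovery stopped:", err)
```

The first vector applies cleanly and moves the standby block to SCN 20000. The second vector then fails the check, which is the situation described in step 7.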
When this issue can occur:
=========================
1) During recovery this will be reported
-> It can be a normal hot backup and recovery
-> It can be an RMAN backup and recovery
-> It can be reported in standby recovery

Solution:
=========
If it is backup and recovery (including RMAN)
-> we will have to cancel the recovery and open the database as of that point
OR
-> if you cannot stop the recovery at that point and you want to recover further, then you will have to allow corruption into your database and perform recovery (as below)

SQL> recover database allow 1 corruption;

Note: recovering while allowing corruption may create many issues; you may get errors such as the following:
ORA-600 [4194]
ORA-600 [4193]
ORA-600 [2662]
ORA-600 [4663]
ORA-08103
ORA-08102

If you face this issue with a standby, then you may have to take a backup of the affected datafiles from the primary and restore them on the standby…
"Note: this decision has to be taken by you along with Oracle Support"

SQL> recover standby database test;
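Before backing up a datafile from the primary, it helps to know exactly which file and block recovery complained about. The following is a minimal sketch, assuming the alert log contains ORA-10567 lines in the same format as the excerpts in this document. The function name `affected_blocks` and the `sample` string are illustrative only.

```python
import re

# Matches the ORA-10567 text as it appears in the alert-log excerpts here.
ORA_10567 = re.compile(r"ORA-10567: .*?\(file# (\d+), block# (\d+)\)")


def affected_blocks(alert_log_text: str) -> list:
    """Return sorted, de-duplicated (file#, block#) pairs from ORA-10567 lines."""
    pairs = {(int(f), int(b)) for f, b in ORA_10567.findall(alert_log_text)}
    return sorted(pairs)


sample = """\
ORA-00600: internal error code, arguments: [3020], [31], [1424770], [131448194], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 31, block# 1424770)
ORA-10564: tablespace ARSYSTEM
ORA-10567: Redo is inconsistent with data block (file# 31, block# 1424770)
"""

print(affected_blocks(sample))  # [(31, 1424770)]
```

The sketch reads the ORA-10567 line rather than the ORA-00600 argument list, because the excerpts in this document do not place the file and block numbers in the same argument positions.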
Recovery Log /u03/fra/BMSPROD_PD/archivelog/2013_09_11/o1_mf_1_60283_931w1o8z_.arc
Wed Sep 11 16:27:34 2013
Errors in file /u01/app/oracle/admin/bmsprod/bdump/bmsprod_p011_7930.trc:
ORA-00600: internal error code, arguments: [3020], [31], [1424770], [131448194], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 31, block# 1424770)
ORA-10564: tablespace ARSYSTEM
ORA-01110: data file 31: '/u03/oradata/BMSPROD_PD/datafile/ARSYSTEM_31.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 212538
Wed Sep 11 16:27:36 2013
Errors in file /u01/app/oracle/admin/bmsprod/bdump/bmsprod_p011_7930.trc:
ORA-00600: internal error code, arguments: [3020], [31], [1424770], [131448194], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 31, block# 1424770)
ORA-10564: tablespace ARSYSTEM

Take an RMAN backup of the affected datafile.

On Primary -

$ rman target /

Recovery Manager: Release 10.2.0.3.0 - Production on Thu Sep 12 00:12:02 2013
Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: BMSPROD (DBID=3750301684)

RMAN> run {
2> allocate channel t1 device type disk;
3> allocate channel t2 device type disk;
4> backup datafile 31 FORMAT '/u04/backup/bmsprod/incr/datafile_31_%U';
5> }

using target database control file instead of recovery catalog
allocated channel: t1
channel t1: sid=402 instance=bmsprod1 devtype=DISK
allocated channel: t2
channel t2: sid=351 instance=bmsprod1 devtype=DISK

Starting backup at 12-SEP-13
channel t1: starting full datafile backupset
channel t1: specifying datafile(s) in backupset
input datafile fno=00031 name=+DATA/bmsprod_od/datafile/arsystem.724.808399285
channel t1: starting piece 1 at 12-SEP-13
channel t1: finished piece 1 at 12-SEP-13
piece handle=/u04/backup/bmsprod/incr/datafile_31_4iojkdb0_1_1 tag=TAG20130912T001216 comment=NONE
channel t1: backup set complete, elapsed time: 00:06:05
Finished backup at 12-SEP-13
released channel: t1
released channel: t2

RMAN> exit

Copy the b