ORA-00600 Internal Error Code 3020
Jackson

The Problem

We are running Windows Server 2003 SP2 on both the primary and standby DB servers, using Oracle 11.2.0.2. We are also using a Data Guard physical standby database. We have a large tablespace left over from a one-off migration that loaded data into the new database. I attempted to shrink some of the datafiles to make them more manageable and to free unused space. After making the changes, I checked whether the physical standby database had been updated properly and found these errors in the alert log:

ORA-00600: internal error code, arguments: [3020], [9], [490111], [38238847], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 9, block# 490111, file offset is 4014989312 bytes)
ORA-10564: tablespace CDC_PUBLISHER_X1M
ORA-01110: data file 9: 'D:\ORADATA\CTS\CDC_PUBLISHER_X1M02.DBF'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 1367456
Incident details in: D:\diag\rdbms\dg\incident\incdir_66377\db_pr02_4048_i66377.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Slave exiting with ORA-600 exception

The Cause

Oracle Support says that this is a bug which is fixed in 11.2.0.2 patchset 14 (which was not out for Windows at the time of writing).

The Solution

The first thing to do would be to apply the patch that fixes the issue. Failing that, or in addition to it if required, you can take the following steps to resolve the issue:

1. On the standby database, stop redo apply:
   ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
2. Shut down the standby database:
   SHUTDOWN IMMEDIATE;
3. Shut down the primary database:
   SHUTDOWN IMMEDIATE;
4. Copy the affected files from the primary to the standby database Co
1265884.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Symptoms

Standby Redo Apply can terminate due to a failure of redo-data consistency checks, a problem called stuck recovery. Stuck recovery can occur when an underlying operating system or storage system loses a write issued by the Primary or Standby database during normal operation. Because there is an inconsistency between the information stored in the redo and the information stored in a database block being recovered, the database signals an internal error when applying the redo.

ORA-00600: internal error code, arguments: [3020], [2885689059], [1], [419819], [26750], [808], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 419819)
ORA-10564: tablespace USER1
ORA-01110: data file '/oracle/datafiles/user1.dbf'

Cause

The ORA-600 [3020] stuck recovery error can occur on the Standby database for several reasons, including a lost write on the Primary, a lost write on the Standby, missing redo, or logical corruption on the Primary resulting in an incomplete redo chain.

Note: With DB_LOST_WRITE_PROTECT enabled on the Primary and Standby, the Standby Redo Apply terminates with the ORA-752 error when a Primary lost write is detected.

ORA-752: recovery detected a lost write of a data block
This ORA-752 error indicates a lost write occurred on the Primary database. Oracle strongly recommends enabling DB_LOST_WRITE_PROTECT (and DB_BLOCK_CHECKSUM=FULL) for greater detection and protection from lost writes. Studies have shown the impact on the primary database is negligible.

Solution

In the majority of cases, Standby stuck recovery errors indicate a corruption of the Primary database. No errors may have been reported on the Primary.

WARNING: Do not repair the Standby by restoring a backup taken on the Primary, as that will ensure that the Standby is also corrupt! The only exception is when the Standby is known to have a lost write, but this determination should be made by Oracle Support.

An ORA-752 error definitively identifies a lost write on the Primary. Consider failing over to the Standby immediately if data integrity is critical and some data loss is acceptable. Oracle Support should also be engaged immediately when an ORA-600 [3020] error occurs by opening a Service Request via My Oracle Support.

When media recovery encounters a problem, the alert
Cause:
======
1) A lost write may have happened (this can be a non-Oracle issue)
2) It could be a bug

What is a lost write
====================
Considering a single block:

Step 1) Block 1 had an SCN of 1395.
Step 2) Block 1 was updated and its SCN was incremented to 20000 in the buffer cache. The change vector in the redo therefore recorded the previous SCN as 1395 and the new SCN as 20000.
Step 3) Block 1 was reported as flushed to disk, but due to an I/O issue the write never reached the disk. The SCN of the block on disk therefore remains 1395.
Step 4) The same block is updated again. Because the copy on disk still has SCN 1395, its SCN is incremented from 1395 to 50000, and the change vector in the redo records the previous SCN as 1395 and the new SCN as 50000.
Step 5) The redo log gets shipped to the standby.
Step 6) Recovery applies the first change vector and changes the block SCN from 1395 to 20000.
Step 7) Recovery tries to apply the second change vector. It finds the block SCN to be 20000, whereas the change vector records a previous SCN of 1395. Recovery stops with ORA-00600 [3020] because of the lost write that happened in step 3.
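The seven steps above can be replayed as a toy model. This is a minimal Python sketch for illustration only: the names `ChangeVector`, `StuckRecovery`, `simulate_primary`, and `apply_redo` are hypothetical, and the check compares nothing but the SCN, which is a simplification of what real redo apply verifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeVector:
    """One redo change: the SCN the block had before, and the SCN after."""
    prev_scn: int
    new_scn: int


class StuckRecovery(Exception):
    """Stands in for ORA-00600 [3020]: redo does not match the block."""


def simulate_primary(lose_write: bool) -> List[ChangeVector]:
    """Generate the redo the primary would produce in steps 1 to 4."""
    disk_scn = 1395                                  # step 1: block on disk
    redo = []
    cache_scn = 20000                                # step 2: update in cache
    redo.append(ChangeVector(disk_scn, cache_scn))
    if not lose_write:                               # step 3: flush to disk
        disk_scn = cache_scn                         # (skipped on a lost write)
    cache_scn = 50000                                # step 4: block re-read from
    redo.append(ChangeVector(disk_scn, cache_scn))   # disk and updated again
    return redo


def apply_redo(block_scn: int, redo: List[ChangeVector]) -> int:
    """Steps 6 and 7: replay change vectors, checking each against the block."""
    for cv in redo:
        if block_scn != cv.prev_scn:
            raise StuckRecovery(
                f"redo expects SCN {cv.prev_scn}, block is at SCN {block_scn}"
            )
        block_scn = cv.new_scn
    return block_scn


if __name__ == "__main__":
    print(apply_redo(1395, simulate_primary(lose_write=False)))
    try:
        apply_redo(1395, simulate_primary(lose_write=True))
    except StuckRecovery as exc:
        print("stuck recovery:", exc)
```

With `lose_write=False` the standby block ends at SCN 50000. With `lose_write=True` the second change vector expects 1395 while the standby block is already at 20000, so the replay stops, mirroring step 7.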
When this issue can occur:
==========================
1) It is reported during recovery:
-> normal hot backup and recovery
-> RMAN backup and recovery
-> standby recovery

Solution:
=========
If it is backup and recovery (including RMAN):
-> cancel the recovery and open the database as of that point
OR
-> if you cannot stop at that point and need to recover further, you will have to allow corruption into your database and perform recovery as below:

SQL> recover database allow 1 corruption;

Note: recovering with corruption allowed may create many issues. You may get errors such as:
ORA-600 [4194]
ORA-600 [4193]
ORA-600 [2662]
ORA-600 [4663]
ORA-08103
ORA-08102

If you face this issue with a standby, you may have to take a backup of the affected datafiles from the primary and restore them on the standby.

"Note: this decision has to be taken by you along with Oracle Support"

SQL> recover standby database test;
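Before discussing affected datafiles with Oracle Support, it can help to pull the file and block numbers out of the ORA-10567 line of the error stack. The following is a minimal Python sketch, assuming the message has the shape quoted in this document; `parse_ora_10567` is a hypothetical helper, not part of any Oracle tool. When the line also reports a file offset, the sketch divides it by the block number to show the block size it implies.

```python
import re

# Matches the ORA-10567 line in the two forms quoted in this document:
# with and without the trailing "file offset is N bytes" part.
ORA_10567 = re.compile(
    r"ORA-10567:.*?\(file# (?P<file>\d+), block# (?P<block>\d+)"
    r"(?:, file offset is (?P<offset>\d+) bytes)?\)"
)


def parse_ora_10567(text):
    """Return file#, block# and, if present, offset details; None if no match."""
    m = ORA_10567.search(text)
    if m is None:
        return None
    info = {"file": int(m.group("file")), "block": int(m.group("block"))}
    if m.group("offset") is not None:
        offset = int(m.group("offset"))
        info["offset"] = offset
        # If the offset is an exact multiple of the block number, the
        # quotient is the block size the message implies.
        if info["block"] and offset % info["block"] == 0:
            info["implied_block_size"] = offset // info["block"]
    return info


sample = (
    "ORA-10567: Redo is inconsistent with data block "
    "(file# 9, block# 490111, file offset is 4014989312 bytes)"
)
print(parse_ora_10567(sample))
```

For the Windows example earlier in this document, 4014989312 / 490111 = 8192, which would be consistent with an 8 KB block size, although the source does not state the block size.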
[1], [3778], [1], [5], [2213], [16], [] ----- INCOMPLETE RECOVERY WHEN DELETE "SYSTEM01.DBF"

Note: This error is mostly caused by an OS or hardware failure, or by an Oracle bug. Typically the datafile is restored without any error, and the error is then thrown at recovery time for datafiles such as "SYSTEM01.DBF", "SYSAUX01.DBF", and so on. So first establish whether it is an Oracle bug or a datafile block corruption.

rm -rf system01.dbf    (in case of an accidental delete, restore from a recent, consistent backup afterwards)
Copy system01.dbf from the backup to the destination location where the online datafile lives.

SQL> shutdown abort;
SQL> startup mount
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [1], [3778], [1], [5], [2213], [16], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 3778)
ORA-10564: tablespace SYSTEM
ORA-01110: data file 1: '/oradb/app/oracle/oradata/orcl/system01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 483

--------------------------------------------------------------------------------
Solution of the Problem
-------
Solution 01: The fix is to do a manual recovery with allow 1 corruption, that is, "recover database allow 1 corruption;", which tolerates one corrupt block and lets recovery continue past it. Repeat this command until the recovery completes.

rman target sys
RMAN> recover database;
RMAN> blockrecover datafile 1 block 3778;    # try this first; if it does not resolve the error, use the next command
RMAN> recover database allow 1 corruption;
RMAN> recover database allow 1 corruption;
Starting recover at 11-APR-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=320 devtype=DISK
starting media recovery
media recovery failed
RMAN-00571: ====