2016年4月27日 星期三

[ORACLE][RMAN] 普通資料庫(非系統用的資料庫)災難復原

資料庫環境確認:
Instance啟動時:
1. 若一般資料庫datafile遺失,datafile headers毀損,startup只能啟動到mount

Instance執行中時:
1. 若一般資料庫datafile headers毀損時發生checkpoint,則該毀損的datafile會被標示為offline,但Instance不會有問題,而且重啟Instance時也不會有問題,但有問題的datafile依然是offline。
2. 若一般資料庫壞的不是datafile headers而是其他地方或是整個data毀損或被刪除,該datafile並不會被標示為offline,但重啟Instance時只能啟動到mount,不過還是可以透過alter database datafile <壞掉的檔案> offline;,再將Instance Open。

P.S.:此篇復原前提為參數檔(init),控制檔(controlfile),archivelog,redo log,關鍵資料庫(SYSTEM及UNDO)皆正常的情形下。

復原流程及完整指令:(以下範例為假設datafile 6毀損)
> 非datafile headers毀損,且資料庫為關閉的狀態
1. 將資料庫啟動到Mount
    startup mount;
2. 將有問題的datafile offline
    alter database datafile 6 offline;
3. 開啟資料庫
    alter database open;
4. 使用RMAN執行restore或是switch命令還原有問題的datafile
    restore datafile 6;
5. 使用RMAN執行recover,透過archivelog及redolog復原datafile
    recover datafile 6;
6. 將有問題的datafile online
    alter database datafile 6 online;

指令
1. 在RMAN
RUN{
startup mount;
sql 'alter database datafile 6 offline';
alter database open;
restore datafile 6;
recover datafile 6;
sql 'alter database datafile 6 online';
}

> 非datafile headers毀損,且資料庫為開啟的狀態
1. 將有問題的datafile offline
    alter database datafile 6 offline;
2. 使用RMAN執行restore或是switch命令還原有問題的datafile
    restore datafile 6;
3. 使用RMAN執行recover database,透過archivelog及redolog復原datafile
    recover datafile 6;
4. 將有問題的datafile online
    alter database datafile 6 online;


指令:
1. 在RMAN
RUN{
sql 'alter database datafile 6 offline';
restore datafile 6;
recover datafile 6;
sql 'alter database datafile 6 online';
}


>datafile headers毀損,且資料庫為開啟的狀態
1. 透過系統指令將毀損的datafile從作業系統刪除
    rm /oradata/test.dbf
2. 將有問題的datafile offline
    alter database datafile 6 offline;
3. 使用RMAN執行restore或是switch命令還原有問題的datafile
    restore datafile 6;
4. 使用RMAN執行recover database,透過archivelog及redolog復原datafile
    recover datafile 6;
5. 將有問題的datafile online
    alter database datafile 6 online;


指令:
1. 在作業系統
    rm /oradata/test.dbf
2. 在RMAN
RUN{
sql 'alter database datafile 6 offline';
restore datafile 6;
recover datafile 6;
sql 'alter database datafile 6 online';
}

>datafile headers毀損,且資料庫為關閉的狀態
1. 透過系統指令將毀損的datafile從作業系統刪除
    rm /oradata/test.dbf
2. 將資料庫啟動到Mount
    startup mount;
3. 將有問題的datafile offline
    alter database datafile 6 offline;
4. 開啟資料庫
    alter database open;
5. 使用RMAN執行restore或是switch命令還原有問題的datafile
    restore datafile 6;
6. 使用RMAN執行recover database,透過archivelog及redolog復原datafile
    recover datafile 6;
7. 將有問題的datafile online
    alter database datafile 6 online;

指令:
1. 在作業系統
    rm /oradata/test.dbf
2. 在RMAN
RUN{
startup mount;
sql 'alter database datafile 6 offline';
alter database open;
restore datafile 6;
recover datafile 6;
sql 'alter database datafile 6 online';
}

2016年4月25日 星期一

[ORACLE][RMAN]關鍵資料庫(SYSTEM及UNDO)災難復原

資料庫環境確認
1. Instance已經不正常結束
    sqlplus / as sysdba
    startup mount;

2. Instance尚在Open的狀況下
    2.1 還可以進入資料庫的情況下
          2.1.1 登入資料庫不會出現錯誤訊息
                   sqlplus / as sysdba
                   shutdown abort;
          2.1.2 登入資料庫但卻連到idle instance
                   直接KILL process,例如:kill SMON

3. 已無法進入資料庫
    直接KILL process,例如:kill SMON

P.S.:此篇復原前提為參數檔(init),控制檔(controlfile),archivelog,redo log皆正常的情形下。

復原流程:
1. 使用shutdown abort或是KILL process關閉資料庫
2. 執行startup mount;將資料庫啟動到Mount
3. 使用RMAN執行restore或是switch命令還原有問題的SYSTEM file或UNDO file
4. 使用RMAN執行recover database,透過archivelog及redolog復原資料庫
5. 執行alter database open開啟資料庫

復原詳細過程
1. 關閉資料庫
SQL> shutdown abort;
ORACLE instance shut down.

2. 啟動資料庫,發現有datafile毀損
SQL> startup;
ORACLE instance started.
Total System Global Area 3340451840 bytes
Fixed Size      2257840 bytes
Variable Size   1879051344 bytes
Database Buffers  1442840576 bytes
Redo Buffers     16302080 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/u01/app/oracle/oradata/orclnofra/system01.dbf'

3. 檢查TRACE LOG
log路徑:/u01/app/oracle/diag/rdbms/orclnofra/orclnofra/trace/alert_orclnofra.log
log擷取部分錯誤如下

4. 透過restore database還原data file 1
RMAN> restore datafile 1;  
Starting restore at 25-APR-2016 17:28:00
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=129 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/orclnofra/system01.dbf
channel ORA_DISK_1: reading from backup piece /backup/ORCLNOFR_20160425_6_1.bak
channel ORA_DISK_1: piece handle=/backup/ORCLNOFR_20160425_6_1.bak tag=TAG20160425T171813
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
Finished restore at 25-APR-2016 17:28:15

5. 執行recover database
RMAN> recover database;
Starting recover at 25-APR-2016 17:28:55
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 25-APR-2016 17:28:56

6. 將資料庫OPEN
RMAN> alter database open;
database opened

7. 查看datafile是否OK
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/orclnofra/system01.dbf
/u01/app/oracle/oradata/orclnofra/sysaux01.dbf
/u01/app/oracle/oradata/orclnofra/undotbs01.dbf
/u01/app/oracle/oradata/orclnofra/users01.dbf
/backup/test01.dbf
/backup/test02.dbf
6 rows selected.


2016年4月13日 星期三

[ORACLE][RMAN] ControlFile毀損情境三之利用沒問題Control file置換有問題的Control file

情境:
Control File至少還有一個是好的,其他都有問題

環境準備:
1. 確認Control file位置
SQL> select value from v$parameter where name like 'control_files%';

VALUE
--------------------------------------------------------------------------------
control_files
/u01/app/oracle/oradata/orclnofra/control01.ctl, /u01/app/oracle/oradata/orclnof
ra/control02.ctl

2. 刪除一個Control file
[oracle@nofra orclnofra]$ rm control01.ctl

完整執行過程:
1. 啟動資料庫
SQL> startup;
ORACLE instance started.

Total System Global Area 3340451840 bytes
Fixed Size      2257840 bytes
Variable Size   1879051344 bytes
Database Buffers  1442840576 bytes
Redo Buffers     16302080 bytes
ORA-00205: error in identifying control file, check alert log for more info

2. 查看ALERT LOG確認哪個Control File有問題
LOG路徑:/u01/app/oracle/diag/rdbms/orclnofra/orclnofra/trace
部分錯誤訊息如下
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL
=TCP))'...
starting up 1 shared server(s) ...
ORACLE_BASE from environment = /u01/app/oracle
Wed Apr 13 10:21:50 2016
ALTER DATABASE   MOUNT
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/app/oracle/oradata/orclnofra/control01.ctl'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-205 signalled during: ALTER DATABASE   MOUNT...

3. 把有問題的Control file用好的置換掉
[oracle@nofra orclnofra]$ cp control02.ctl control01.ctl

4. 將資料庫開在mount
SQL> alter database mount;

Database altered.

5.  將資料庫開在OPEN
SQL> alter database open;

Database altered.

2016年4月12日 星期二

[ORACLE][RMAN] ControlFile毀損情境二之還原方式(手動修復不一致)

情境:
1. 所有online Control file毀損
2. ControlFile Autobackup OFF
3. 備份完Control file後,DBA手動刪除了一個tablespace包含Data File,所以上次的Control file備份就沒有剛刪除的Data File

環境準備:
1. 新增一個Tablespace包含一個Datafile
SQL> create tablespace "myTest01" nologging
  2  datafile '/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/myTest01_DT01.dbf'
  3  size 10m;
Tablespace created.

2. Switch LOG
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
System altered.

3. 手動備份controlfile
RMAN> backup current controlfile;
Starting backup at 12-APR-2016 14:18:03
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=197 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 12-APR-2016 14:18:05
channel ORA_DISK_1: finished piece 1 at 12-APR-2016 14:18:06
piece handle=/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2rr8s_1_1 tag=TAG20160412T141803 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 12-APR-2016 14:18:06

4. 刪除剛剛建立的Tablespace
SQL> drop tablespace "myTest01" including contents and datafiles;
Tablespace dropped.

5. 手動刪除controlfile,模擬controlfile毀損
[oracle@nofra orclnofra]$ rm control01.ctl
[oracle@nofra orclnofra]$ rm control02.ctl


6. 重新啟動DB會出現如下錯誤
SQL> startup;
ORACLE instance started.
Total System Global Area 3340451840 bytes
Fixed Size      2257840 bytes
Variable Size   1879051344 bytes
Database Buffers  1442840576 bytes
Redo Buffers     16302080 bytes
ORA-00205: error in identifying control file, check alert log for more info

還原語法如下:
1. 在RMAN
restore controlfile from '/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2rr8s_1_1';
2. 在RMAN
mount database;

3. 復原資料庫,解法有兩種
解法一
3-1-1. 在SQLPLUS(在此範例中是編號5的datafile有問題)
alter database datafile 5 offline;
3-1-2. 在SQLPLUS
recover database using backup controlfile;
輸入:AUTO
3-1-2 在SQLPLUS
recover database using backup controlfile;
輸入redo log檔案路徑
3-1-3. 重複執行recover database using backup controlfile,並輸入redo log檔案路徑直到出現Media recovery complete

解法二
3-2-1 在RMAN
recover database skip tablespace "myTest01";

4. 在RMAN
alter database open resetlogs;


完整執行過程:
1. 還原Control File
RMAN> restore controlfile from '/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2rr8s_1_1';
Starting restore at 12-APR-2016 14:21:56
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=129 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/u01/app/oracle/oradata/orclnofra/control01.ctl
output file name=/u01/app/oracle/oradata/orclnofra/control02.ctl
Finished restore at 12-APR-2016 14:21:57

2. 將DB啟動在mount模式
RMAN> mount database;
database mounted
released channel: ORA_DISK_1

3. 恢復資料庫,會出現錯誤原因是在Control file中紀錄有剛剛建立的Tablespace但是實際上已經是不需要且被刪除了,但recover還是會一直要復原該檔案。
RMAN> recover database;
Starting recover at 12-APR-2016 14:22:48
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=129 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/12/2016 14:22:48
RMAN-06094: datafile 5 must be restored

解法一
4-1-1. 將步驟三錯誤訊息中的datafile離線
SQL> alter database datafile 5 offline;
Database altered.

4-1-2. 透過using backup controlfile來復原資料庫,在出現[Specify log]提示時輸入[AUTO],在此例中又出現ORA-00308及ORA-27037的錯誤,但實際上並沒有損壞或是遭刪除,那就表示該資料尚未產生archive log,實際資料還在redo log中
SQL> recover database using backup controlfile;
ORA-00279: change 978477 generated at 04/12/2016 14:17:26 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/arch1_8_908881411.dbf
ORA-00280: change 978477 for thread 1 is in sequence #8

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
AUTO
ORA-00308: cannot open archived log
'/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/arch1_8_908881411.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-00308: cannot open archived log
'/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/arch1_8_908881411.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

4-1-3. 再次透過using backup controlfile來復原資料庫,在出現[Specify log]提示時輸入redo01.log的位置,結果發現並不存在於redo01.log
SQL> recover database using backup controlfile;
ORA-00279: change 978477 generated at 04/12/2016 14:17:26 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/arch1_8_908881411.dbf
ORA-00280: change 978477 for thread 1 is in sequence #8

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/app/oracle/oradata/orclnofra/redo01.log
ORA-00328: archived log ends at change 978453, need later change 978477
ORA-00334: archived log: '/u01/app/oracle/oradata/orclnofra/redo01.log'

4-1-4. 再次透過using backup controlfile來復原資料庫,在出現[Specify log]提示時輸入redo02.log的位置,若成功便會出現[Media recovery complete.]的訊息
SQL> recover database using backup controlfile;
ORA-00279: change 978477 generated at 04/12/2016 14:17:26 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/arch1_8_908881411.dbf
ORA-00280: change 978477 for thread 1 is in sequence #8

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/app/oracle/oradata/orclnofra/redo02.log
Log applied.
Media recovery complete.

解法二
4-2-1. 跳過有問題的Tablespace直接復原
RMAN> recover database skip tablespace "myTest01";
Starting recover at 11-APR-2016 16:45:10
using channel ORA_DISK_1
Executing: alter database datafile 6 offline
starting media recovery
archived log for thread 1 with sequence 1 is already on disk as file /u01/app/oracle/oradata/orclnofra/redo01.log
archived log file name=/u01/app/oracle/oradata/orclnofra/redo01.log thread=1 sequence=1
media recovery complete, elapsed time: 00:00:00
Finished recover at 11-APR-2016 16:45:10

5. 將資料庫開啟
RMAN> alter database open resetlogs;
database opened

2016年4月11日 星期一

[ORACLE][RMAN] ControlFile毀損情境一之還原方式(自動修復不一致)

情境:
1. 所有online Control file毀損
2. ControlFile Autobackup OFF
3. 備份完Control file後,DBA手動加了一個tablespace包含Data File,所以上次的Control file備份就沒有剛新增的Data File

環境準備:
1. 手動備份controlfile
RMAN> backup current controlfile;
Starting backup at 11-APR-2016 14:15:54
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=9 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 11-APR-2016 14:15:56
channel ORA_DISK_1: finished piece 1 at 11-APR-2016 14:15:57
piece handle=/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2p6or_1_1 tag=TAG20160411T141555 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 11-APR-2016 14:15:57

2. 新增一個tablespace包含一個Data File
SQL> create tablespace "myTest02" NOLOGGING
  2  datafile '/u01/app/oracle/oradata/orclnofra/myTest_Data02.dbf' SIZE 200M;
Tablespace created.

3. 手動刪除controlfile
[oracle@nofra orclnofra]$ rm control01.ctl
[oracle@nofra orclnofra]$ rm control02.ctl

4. 重新啟動DB會出現如下錯誤
RMAN> startup;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of startup command at 04/11/2016 14:48:49
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/app/oracle/oradata/orclnofra/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

還原語法如下:
1. 在RMAN
restore controlfile from '/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2p6or_1_1';
2. 在RMAN
recover database;
3. 在RMAN
alter database open resetlogs;

完整執行過程:
1.還原Control File
[oracle@nofra ~]$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Mon Apr 11 14:53:36 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database (not started)
RMAN> startup;
Oracle instance started
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of startup command at 04/11/2016 14:53:44
ORA-00205: error in identifying control file, check alert log for more info
RMAN> restore controlfile from '/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/01r2p6or_1_1';
Starting restore at 11-APR-2016 14:55:12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=63 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/u01/app/oracle/oradata/orclnofra/control01.ctl
output file name=/u01/app/oracle/oradata/orclnofra/control02.ctl
Finished restore at 11-APR-2016 14:55:13

RMAN> recover database;
Starting recover at 11-APR-2016 14:57:46
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=129 device type=DISK
starting media recovery
archived log for thread 1 with sequence 6 is already on disk as file /u01/app/oracle/oradata/orclnofra/redo03.log
archived log file name=/u01/app/oracle/oradata/orclnofra/redo03.log thread=1 sequence=6
creating datafile file number=5 name=/u01/app/oracle/oradata/orclnofra/myTest_Data01.dbf
archived log file name=/u01/app/oracle/oradata/orclnofra/redo03.log thread=1 sequence=6
creating datafile file number=6 name=/u01/app/oracle/oradata/orclnofra/myTest_Data02.dbf
archived log file name=/u01/app/oracle/oradata/orclnofra/redo03.log thread=1 sequence=6
media recovery complete, elapsed time: 00:00:01
Finished recover at 11-APR-2016 14:57:50

RMAN> alter database open resetlogs;
database opened