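Disk I/O Failure Leading to Redo Log Corruption: A Case
Link: https://www.eygle.com/archives/2006/11/io_fault_redo_corruption.html
A few days ago a disk on one of our database servers developed a problem. After it was reformatted it appeared to work normally again, but today the same disk failed once more.
This time the damage hit the redo log, and the database alert log reports redo-related errors: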
Mon Nov 13 11:42:54 2006
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:
ORA-00333: redo log read error block 186498 count 6144
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186497
Mon Nov 13 11:42:58 2006
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:
ORA-00333: redo log read error block 184450 count 8192
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186498
Mon Nov 13 11:43:03 2006
Errors in file /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc:
ORA-00333: redo log read error block 184450 count 8192
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186498
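The repeated ORA-00333 entries can be tallied to see which redo blocks the failing reads start at. The following is a minimal sketch that is not part of the original post: the function name `summarize_redo_read_errors` is hypothetical, and the regular expression is written only against the message format shown in the alert log above.

```python
import re
from collections import Counter

# Matches the ORA-00333 message format seen in the alert log above.
ORA_00333 = re.compile(r"ORA-00333: redo log read error block (\d+) count (\d+)")


def summarize_redo_read_errors(log_text):
    """Count how often each (block, count) pair appears in ORA-00333 lines."""
    tally = Counter()
    for match in ORA_00333.finditer(log_text):
        block, count = int(match.group(1)), int(match.group(2))
        tally[(block, count)] += 1
    return tally


# Sample lines copied from the alert log excerpt above.
sample = """\
ORA-00333: redo log read error block 186498 count 6144
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-00333: redo log read error block 184450 count 8192
ORA-00333: redo log read error block 184450 count 8192
"""

for (block, count), hits in sorted(summarize_redo_read_errors(sample).items()):
    print(f"block={block} count={count} occurrences={hits}")
```

Grouping by starting block makes it easy to see that the failures keep hitting the same region of `redo02.log` rather than scattered locations.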
The associated trace file records similar errors:
[oracle@gdmstest bdump]$ cat /opt/oracle/admin/mydb/udump/mydb_ora_16682.trc
/opt/oracle/admin/mydb/udump/mydb_ora_16682.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning option
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0
System name: Linux
Node name: gdmstest.hurray.com.cn
Release: 2.4.21-15.EL
Version: #1 Thu Apr 22 00:27:41 EDT 2004
Machine: i686
Instance name: mydb
Redo thread mounted by this instance: 1
Oracle process number: 11
Unix process pid: 16682, image: oracle@gdmstest.hurray.com.cn (TNS V1-V3)
*** SESSION ID:(9.3) 2006-11-13 11:41:23.555
Thread checkpoint rba:0x00001d.00000002.0010 scn:0x0000.000f94cd
On-disk rba:0x00001d.0002dc60.0000 scn:0x0000.000f9b4e
Use incremental checkpoint cache-low RBA
Thread 1 recovery from rba:0x00001d.00029082.0000 scn:0x0000.00000000
*** 2006-11-13 11:42:54.830
ORA-00333: redo log read error block 186498 count 6144
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186497
ORA-00333: redo log read error block 184450 count 8192
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186498
ORA-00333: redo log read error block 184450 count 8192
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186498
ORA-00333: redo log read error block 184450 count 8192
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/mydb/redo02.log'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 2: No such file or directory
Additional information: 186498
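The RBA (redo byte address) values in the trace header are written as three hexadecimal fields: log sequence, block number, and byte offset within the block. As a rough sketch (the `RBA` type and `parse_rba` helper are hypothetical names introduced here, not anything from Oracle), they can be decoded and compared with the block numbers in the ORA-00333 messages:

```python
from typing import NamedTuple


class RBA(NamedTuple):
    sequence: int  # log sequence number
    block: int     # block number within that log
    offset: int    # byte offset within the block


def parse_rba(text):
    """Parse an RBA such as '0x00001d.0002dc60.0000' (all fields are hex)."""
    seq_hex, block_hex, offset_hex = text.split(".")
    # int(..., 16) accepts the leading '0x' on the first field.
    return RBA(int(seq_hex, 16), int(block_hex, 16), int(offset_hex, 16))


# Values copied from the trace file above.
checkpoint = parse_rba("0x00001d.00000002.0010")
on_disk = parse_rba("0x00001d.0002dc60.0000")
recovery_start = parse_rba("0x00001d.00029082.0000")

print(checkpoint, on_disk, recovery_start, sep="\n")

# Starting blocks reported by the ORA-00333 errors.
for failing_block in (184450, 186498):
    in_range = recovery_start.block <= failing_block <= on_disk.block
    print(failing_block, "within recovery range:", in_range)
```

Under this reading, all three RBAs refer to log sequence 29 (0x1d), and both failing start blocks lie between the recovery start block and the on-disk block, which is consistent with the read errors occurring while that range of redo was being read.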
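Checking the system messages shows that the failing sector is the same one as last time (sector=14266880). This really does look like physical damage, so the only option is to replace the disk: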
[oracle@gdmstest bdump]$ dmesg
or=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
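To confirm that every error points at the same sector, the `dmesg` output can be reduced to its distinct failing sectors. This is a minimal sketch with hypothetical helper names (`failing_sectors`, `lba_from_high_low`); the patterns are written only against the kernel message format shown above and may not match other kernel versions.

```python
import re

# Patterns written against the 2.4-era IDE driver messages shown above.
END_REQUEST = re.compile(r"end_request: I/O error, dev \S+ \((\w+)\), sector (\d+)")
LBA_FIELDS = re.compile(r"LBAsect=(\d+), high=(\d+), low=(\d+)")


def failing_sectors(dmesg_text):
    """Return the set of distinct (device, sector) pairs with I/O errors."""
    return {(m.group(1), int(m.group(2))) for m in END_REQUEST.finditer(dmesg_text)}


def lba_from_high_low(high, low):
    """Combine the high and low fields, treating low as the lower 24 bits."""
    return (high << 24) | low


# Sample lines copied from the dmesg output above.
sample = """\
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
"""

print(failing_sectors(sample))

for m in LBA_FIELDS.finditer(sample):
    lbasect, high, low = (int(g) for g in m.groups())
    print(lbasect, lba_from_high_low(high, low) == lbasect)
```

A single element in the resulting set means all the logged failures hit one location on `hda`, which supports the conclusion that the medium itself is damaged. For the values shown, `LBAsect` equals `high` shifted left by 24 bits plus `low`.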
By eygle on 2006-11-13 14:51 | Comments (7) | Case | 965 |
Did the database go down on its own?
Yes, it crashed:
Linux Error: 4: Interrupted system call
Additional information: 187487
LGWR: terminating instance due to error 340
Instance terminated by LGWR, pid = 15045
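It looks like mirroring the redo logs really is useful.
Giving up a little performance pays off.
That said, with a disk array this kind of problem is generally less likely, and a database that runs without an array tends to be less important anyway.
Right! After all, safety comes first!
I have not set up redo mirroring on many databases, but on one of them it later truly saved my life.
Yes, sometimes it is necessary, especially for the more important databases.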