About an SDS (Solstice DiskSuite / SVM) system-disk mirror
At boot the machine prompts for fsck and will only come up in single-user mode. The log is as follows:

bash-3.00# metastat
d37: Mirror
    Submirror 0: d17
      State: Needs maintenance
    Submirror 1: d27
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 28675968 blocks (13 GB)

d17: Submirror of d37
    State: Needs maintenance
    Invoke: metasync d37
    Size: 28675968 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s7          0     No            Okay   Yes

d27: Submirror of d37
    State: Needs maintenance
    Invoke: metasync d37
    Size: 28675968 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s7          0     No            Okay   Yes

d34: Mirror
    Submirror 0: d14
      State: Needs maintenance
    Submirror 1: d24
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2055552 blocks (1003 MB)

d14: Submirror of d34
    State: Needs maintenance
    Invoke: metasync d34
    Size: 2055552 blocks (1003 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s4          0     No            Okay   Yes

d24: Submirror of d34
    State: Needs maintenance
    Invoke: metasync d34
    Size: 2055552 blocks (1003 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s4          0     No            Okay   Yes

d33: Mirror
    Submirror 0: d13
      State: Needs maintenance
    Submirror 1: d23
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 183168 blocks (89 MB)

d13: Submirror of d33
    State: Needs maintenance
    Invoke: metasync d33
    Size: 183168 blocks (89 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s3      30528     Yes           Okay   Yes

d23: Submirror of d33
    State: Needs maintenance
    Invoke: metasync d33
    Size: 183168 blocks (89 MB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s3      30528     Yes           Okay   Yes

d31: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 32776896 blocks (15 GB)

d11: Submirror of d31
    State: Needs maintenance
    Invoke: metasync d31
    Size: 32776896 blocks (15 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No            Okay   Yes

d21: Submirror of d31
    State: Needs maintenance
    Invoke: metasync d31
    Size: 32776896 blocks (15 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No            Okay   Yes

d30: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 77826048 blocks (37 GB)

d10: Submirror of d30
    State: Needs maintenance
    Invoke: metareplace d30 c1t0d0s0 <new device>
    Size: 77826048 blocks (37 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No     Maintenance   Yes

d20: Submirror of d30
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
                metareplace d30 c1t1d0s0 <new device>
    Size: 77826048 blocks (37 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No      Last Erred   Yes
Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,sd@SSEAGATE_ST373207LSUN72G_45333MPH____________3KT33MPH
c1t0d0   Yes    id1,sd@SSEAGATE_ST373207LSUN72G_45333MZH____________3KT33MZH
bash-3.00# iostat -En
c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A  Serial No: 0545333MZH
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c1t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A  Serial No: 0545333MPH
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 1
Vendor: TOSHIBA  Product: ODD-DVD SD-C2732  Revision: 1055  Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c1t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A  Serial No: 0545331Y5P
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c2t40d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SUN Product: StorEdge 3510 Revision: 421F Serial No:
Size: 1465.42GB <1465416417280 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
bash-3.00# dmesg
Wed Oct 26 20:40:17 CST 2011
Oct 21 16:06:01 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_B change to <LogicalHostname offline.>
Oct 21 16:06:01 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_B change to R_OFFLINE
Oct 21 16:06:01 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:06:01 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_B change to <Stopping>
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_B change to R_OFFLINE
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_B change to R_FM_OFFLINE
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_B change to <>
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_B change to RG_OFFLINE
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_A change to RG_PENDING_ONLINE
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: launching method <hafoip_prenet_start> for resource <dbip>, resource group <rg-db>, node <SOC_DB_A>, timeout <300> seconds
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status on node SOC_DB_A change to R_FM_UNKNOWN
Oct 21 16:06:02 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_A change to <Starting>
Oct 21 16:06:16 SOC_DB_A su: 'su root' failed for oracle on /dev/pts/7
Oct 21 16:06:17 SOC_DB_A Cluster.RGM.global.rgmd: method <hafoip_prenet_start> completed successfully for resource <dbip>, resource group <rg-db>, node <SOC_DB_A>, time used: 5% of timeout <300 seconds>
Oct 21 16:06:17 SOC_DB_A Cluster.RGM.global.rgmd: launching method <hastorageplus_prenet_start> for resource <disk-db>, resource group <rg-db>, node <SOC_DB_A>, timeout <1800> seconds
Oct 21 16:06:17 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_A change to R_FM_UNKNOWN
Oct 21 16:06:17 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_A change to <Starting>
Oct 21 16:06:33 SOC_DB_A SC[,SUNW.HAStoragePlus:9,rg-db,disk-db,hastorageplus_prenet_start]: Failed to analyze the device file /dev/md/dbset/dsk/d1111: I/O error.
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: Method <hastorageplus_prenet_start> failed on resource <disk-db> in resource group <rg-db>
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_A change to R_START_FAILED
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_A change to RG_PENDING_OFF_START_FAILED
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_A change to R_FM_FAULTED
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_A change to <>
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_A change to R_STOPPING
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: launching method <hafoip_stop> for resource <dbip>, resource group <rg-db>, node <SOC_DB_A>, timeout <300> seconds
Oct 21 16:06:33 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_A change to <Stopping>
Oct 21 16:06:48 SOC_DB_A ip: TCP_IOC_ABORT_CONN: local = 172.016.020.008:0, remote = 000.000.000.000:0, start = -2, end = 6
Oct 21 16:06:48 SOC_DB_A ip: TCP_IOC_ABORT_CONN: aborted 0 connection
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status on node SOC_DB_A change to R_FM_OFFLINE
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_A change to <LogicalHostname offline.>
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: method <hafoip_stop> completed successfully for resource <dbip>, resource group <rg-db>, node <SOC_DB_A>, time used: 5% of timeout <300 seconds>
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_A change to R_OFFLINE
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: launching method <hastorageplus_postnet_stop> for resource <disk-db>, resource group <rg-db>, node <SOC_DB_A>, timeout <1800> seconds
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_A change to R_FM_UNKNOWN
Oct 21 16:06:48 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_A change to <Stopping>
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: method <hastorageplus_postnet_stop> completed successfully for resource <disk-db>, resource group <rg-db>, node <SOC_DB_A>, time used: 0% of timeout <1800 seconds>
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_A change to R_OFFLINE
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_A change to R_FM_OFFLINE
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_A change to <>
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_A change to RG_OFFLINE_START_FAILED
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_A change to RG_OFFLINE
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_B change to RG_PENDING_ONLINE
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_B change to <Starting>
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:03 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_B change to <Starting>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db status msg on node SOC_DB_B change to <>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_B change to R_STARTING
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_B change to R_ONLINE
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip status msg on node SOC_DB_B change to <LogicalHostname online.>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status msg on node SOC_DB_B change to <Starting>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata state on node SOC_DB_B change to R_STARTING
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status msg on node SOC_DB_B change to <Starting>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle state on node SOC_DB_B change to R_STARTING
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr status msg on node SOC_DB_B change to <Starting>
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr state on node SOC_DB_B change to R_STARTING
Oct 21 16:07:05 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_B change to R_ONLINE
Oct 21 16:07:06 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr status msg on node SOC_DB_B change to <>
Oct 21 16:07:07 SOC_DB_A su: 'su root' failed for oracle on /dev/pts/7
Oct 21 16:07:07 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status msg on node SOC_DB_B change to <>
Oct 21 16:07:07 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status msg on node SOC_DB_B change to <>
Oct 21 16:07:10 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:10 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:07:12 SOC_DB_A Cluster.RGM.global.rgmd: resource lsnr state on node SOC_DB_B change to R_ONLINE
Oct 21 16:07:36 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:36 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:38 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:07:38 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle state on node SOC_DB_B change to R_ONLINE
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata state on node SOC_DB_B change to R_ONLINE
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_B change to RG_ONLINE
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:07:40 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status on node SOC_DB_B change to R_FM_ONLINE
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource group rg-db state on node SOC_DB_B change to RG_PENDING_OFFLINE
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource dbip state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource disk-db state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle status msg on node SOC_DB_B change to <Stopping>
Oct 21 16:12:19 SOC_DB_A Cluster.RGM.global.rgmd: resource oracle state on node SOC_DB_B change to R_STOPPING
Oct 21 16:12:20 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata state on node SOC_DB_B change to R_ONLINE_UNMON
Oct 21 16:12:20 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status on node SOC_DB_B change to R_FM_UNKNOWN
Oct 21 16:12:20 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata status msg on node SOC_DB_B change to <Stopping>
Oct 21 16:12:20 SOC_DB_A Cluster.RGM.global.rgmd: resource socdata state on node SOC_DB_B change to R_STOPPING
Any advice? The disks should have no hardware errors. Assuming the disks are good, what are the steps to fix this?

metareplace -e d30 c1t0d0s0
metareplace -e d30 c1t1d0s0
fsck -y
reboot

Reply to #2 财版: metareplace -e errors out and cannot be executed.

Reply to #3 rf00147: It reports a segmentation fault.
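For reference, the re-enable sequence suggested in #2 can be wrapped as a small helper with error checking. This is only a sketch: in this thread metareplace -e itself failed, and the raw-device path /dev/md/rdsk/d30 for the fsck step is an assumption about how the mirror is named.

```shell
# Hypothetical helper for the fix from #2: re-enable the errored
# component on each side of the d30 root mirror, then fsck the mirror.
# Device names (d30, c1t0d0s0, c1t1d0s0) come from the metastat output above.
reenable_d30() {
  # metareplace -e re-enables a failed component in place and
  # triggers a resync from the surviving side
  metareplace -e d30 c1t0d0s0 || return 1
  metareplace -e d30 c1t1d0s0 || return 1
  # check the filesystem on the raw metadevice before rebooting
  fsck -y /dev/md/rdsk/d30
}
```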
Reply to #4 rf00147: First run a full hardware self-test on the machine and check for other errors. If there are none, then in single-user mode mount c1t0d0s0 on a temporary directory and check whether your data is still there, then mount c1t1d0s0 on a temporary directory and do the same. If the data is intact, try metareplace -e again; if it still errors out, you will have to handle it a different way.

If the data on c1t0d0s0 or c1t1d0s0 is intact, back it up first.
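The mount-and-verify-then-backup step above might look like the following sketch. The scratch directory under /var/tmp and the use of ufsdump for the backup are assumptions; any backup method the site trusts would do.

```shell
# Hypothetical check-and-backup helper: mount a boot slice read-only on a
# scratch directory, list it to confirm the data is there, and dump it
# before attempting any repair.
check_slice() {
  slice=$1                       # e.g. c1t0d0s0
  dir=/var/tmp/chk_$slice
  mkdir -p "$dir"
  # read-only mount so a half-broken filesystem is not modified further
  mount -o ro "/dev/dsk/$slice" "$dir" || return 1
  ls "$dir"                      # eyeball: is the data still there?
  # back the slice up before any metareplace/metaclear attempt
  ufsdump 0f "/var/tmp/$slice.dump" "/dev/rdsk/$slice"
  umount "$dir"
}
```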
1. First repair the state of the other d devices; the method is the metasync shown in the output.
2. Both submirrors of d30 are in the Maintenance state, so it may be impossible to confirm whether the data is intact. You can try the following:
   a. Repair with metasync.
   b. metadetach d20, then fsck d30, and see whether d30's state can be repaired; if it can, metattach d20 to resync the data.
   c. metaclear the d30 device and its submirrors, restore the / mount point in /etc/vfstab, run fsck on c1t0d0s0 in single-user mode, and reboot; if it boots normally, re-create the mirror for /.
Try the methods above in order.
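Option (b) from the list above can be sketched as follows. Assumptions: / lives on d30 with submirrors d10 (c1t0d0s0) and d20 (c1t1d0s0) per the metastat output, and the -f flag is needed because d20 is errored.

```shell
# Sketch of option (b): detach the second submirror, fsck the mirror,
# and re-attach so the detached side resyncs from the repaired one.
repair_via_detach() {
  metadetach -f d30 d20 || return 1   # -f forces detach of an errored submirror
  fsck -y /dev/md/rdsk/d30 || return 1
  metattach d30 d20                   # resync d20 from the repaired side
}
```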
Reply to #6 dinky: Thanks for the help; still trying.

Reply to #7 rf00147: SOC.... That looks really familiar.
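If (a) and (b) both fail, option (c) from the earlier list, abandoning the metadevice and booting from the bare slice, might be sketched like this. The vfstab change is shown as a comment because it must be edited by hand, and the whole sequence assumes the backup step above was done first.

```shell
# Sketch of option (c): clear the d30 mirror, fall back to the raw
# slice for /, and rebuild the mirror after a clean boot.
unwind_d30() {
  metaclear -r d30 || return 1        # -r also clears submirrors d10/d20
  # By hand, edit /etc/vfstab: change the / entry from /dev/md/dsk/d30
  # back to /dev/dsk/c1t0d0s0 (and /dev/md/rdsk/d30 to /dev/rdsk/c1t0d0s0)
  fsck -y /dev/rdsk/c1t0d0s0
  # after a normal reboot, re-create the / mirror with metainit/metattach
}
```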