SDS Recovery

wangyl1977 posted on 2007-06-03 22:48

Basic reference
Replacing a failed system disk under DiskSuite

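If DiskSuite is version 4.2.1 or later and the following parameter has been added to /etc/system:
echo "set md:mirrored_root_flag=1" >> /etc/system
then recovery is simple: when one disk fails, the system can boot from the other disk.
If not, it is somewhat more involved.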
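A minimal sketch of adding that line without duplicating it on repeated runs. The helper name ensure_line is made up for illustration, and it assumes a POSIX grep that accepts -F, -x and -q (on older Solaris that may mean /usr/xpg4/bin/grep). The demo writes to a scratch file, not the real /etc/system.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper; nothing in it is DiskSuite-specific.
ensure_line() {
    file=$1
    line=$2
    # -F: fixed string, -x: whole-line match, -q: quiet
    grep -Fqx -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file rather than the real /etc/system.
scratch=$(mktemp)
ensure_line "$scratch" "set md:mirrored_root_flag=1"
ensure_line "$scratch" "set md:mirrored_root_flag=1"   # second call adds nothing
cat "$scratch"
rm -f "$scratch"
```

On a real system you would pass /etc/system as the first argument, after keeping a copy of the file.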
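Taking my earlier example, c0t0d0 and c0t1d0 form a RAID 1 mirror. If disk0 (c0t0d0) has failed and disk1 (c0t1d0) is good, how do we recover?
First look up the aliases of the two disk devices, then boot from the good one (disk1 in my case):
ok devalias
ok boot disk1
The number of usable state database replicas must be more than 50% of the total. Here 2 of the 4 replicas are gone, which leaves exactly half, so the system cannot boot normally: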
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
Type control-d to proceed with normal startup,
(or give root password for system maintenance): ******
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Because the requirement that more than 50% of the state replicas be available is not met, the system cannot enter its normal running state.
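The majority rule is simple arithmetic, sketched below. The function name replica_quorum and its output strings are invented for illustration; it models only the "more than 50%" rule stated above.

```shell
#!/bin/sh
# replica_quorum TOTAL GOOD
# Prints "boot-ok" when strictly more than half of the replicas are usable,
# otherwise "maintenance". Hypothetical helper, not a DiskSuite command.
replica_quorum() {
    total=$1
    good=$2
    if [ $((good * 2)) -gt "$total" ]; then
        echo "boot-ok"
    else
        echo "maintenance"
    fi
}

replica_quorum 4 2   # the case in this post: 2 of 4 replicas left
replica_quorum 2 2   # after the 2 failed replicas have been deleted
```

This is why deleting the dead replicas helps: the total drops from 4 to 2, and the 2 good ones become a majority.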
# metadb -i
        flags           first blk       block count
     M     p            unknown         unknown         /dev/dsk/c0t0d0s7
     M     p            unknown         unknown         /dev/dsk/c0t0d0s7
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s7
     a    p  l          16              1034            /dev/dsk/c0t1d0s7
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
---- The replicas on c0t0d0s7 carry the M flag, which means DiskSuite cannot access them properly
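On a box with many replicas, picking out the bad slices by eye gets tedious. Here is a rough sketch that reads `metadb -i` output on stdin and lists the slices carrying any of the upper-case error flags from the legend. The function names and the sample listing are illustrative, and the column layout is assumed to match the output shown in this post.

```shell
#!/bin/sh
# Read `metadb -i` output on stdin and print each slice that has a replica
# with one of the upper-case error flags from the legend (W M D F S R).
failed_replica_slices() {
    awk '$NF ~ /^\/dev\/dsk\// {
        flags = ""
        # everything before the last three columns (first blk, block count, device) is flags
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /[WMDFSR]/) print $NF
    }' | sort -u
}

# Sample input modelled on the listing in this post (illustrative only).
sample_metadb() {
    cat <<'EOF'
        flags           first blk       block count
     M     p            unknown         unknown         /dev/dsk/c0t0d0s7
     M     p            unknown         unknown         /dev/dsk/c0t0d0s7
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s7
     a    p  l          16              1034            /dev/dsk/c0t1d0s7
 M - replica had problem with master blocks
EOF
}

sample_metadb | failed_replica_slices
```

On the live system the input would come from `metadb -i | failed_replica_slices`; the sketch only prints slice names and deletes nothing.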
# metadb -d c0t0d0s7      ----- delete the failed replicas -----
# metadb -i               ------- check -------
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s5
     a    p  l          16              1034            /dev/dsk/c0t1d0s6
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/lvm/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
------- Reboot the system
# reboot -- disk1
2 Check the system state
Once the system is up, identify the metadevices on the failed disk and note down the device names that need to be replaced:
# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 13423200 blocks
d10: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c0t0d0s0 <new device>
    Size: 13423200 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t0d0s0           0     No     Maintenance
d20: Submirror of d0
    State: Okay
    Size: 13423200 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t1d0s0           0     No     Okay
d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2100000 blocks
d11: Submirror of d1
    State: Needs maintenance
    Invoke: metareplace d1 c0t0d0s1 <new device>
    Size: 2100000 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t0d0s1           0     No     Maintenance
d21: Submirror of d1
    State: Okay
    Size: 2100000 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t1d0s1           0     No     Okay
d4: Mirror
    Submirror 0: d14
      State: Needs maintenance
    Submirror 1: d24
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2100000 blocks
d14: Submirror of d4
    State: Needs maintenance
    Invoke: metareplace d4 c0t0d0s4 <new device>
    Size: 2100000 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t0d0s4           0     No     Maintenance
d24: Submirror of d4
    State: Okay
    Size: 2100000 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t1d0s4           0     No     Okay
You need to note down this information:
d10 -- c0t0d0s0
d11 -- c0t0d0s1
d14 -- c0t0d0s4
These 3 devices are the ones that need to be replaced.
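The same list can be pulled out of the metastat output mechanically. The sketch below is only an illustration: the function names are invented, the sample is a trimmed copy of the listing in this post, and it assumes the State column is the fourth field of each component line.

```shell
#!/bin/sh
# Read `metastat` output on stdin and print "submirror mirror component"
# for every component whose State column is Maintenance.
failed_components() {
    awk '
        # e.g. "d10: Submirror of d0" -> remember submirror d10 and its mirror d0
        $2 == "Submirror" && $3 == "of" { sub(/:$/, "", $1); sm = $1; mirror = $4 }
        # e.g. "c0t0d0s0  0  No  Maintenance"
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 == "Maintenance" { print sm, mirror, $1 }
    '
}

# Trimmed sample based on the listing in this post (illustrative only).
sample_metastat() {
    cat <<'EOF'
d10: Submirror of d0
    State: Needs maintenance
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t0d0s0           0     No     Maintenance
d20: Submirror of d0
    State: Okay
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t1d0s0           0     No     Okay
EOF
}

sample_metastat | failed_components
```

Each output line gives the submirror, the mirror it belongs to, and the failed slice, which are the arguments you need later for metareplace.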
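3 Replace the failed disk and restore the DiskSuite configuration
Pull the failed disk, insert the new one, and partition the new disk using the partition table of the mirror disk (c0t1d0):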
# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
Do not forget this step: install the boot block on the new disk:
# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
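Next, create 2 state database replicas on the new disk (-c 2 places two replicas on the slice; the failed replicas in the example above were on s7, so use whichever slice your layout reserves for replicas):
# metadb -f -a -c 2 /dev/dsk/c0t0d0s6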
# metadb -i
Final step: restore the original DiskSuite mirror configuration:
# metareplace -e d0 c0t0d0s0
d0: device c0t0d0s0 is enabled
# metareplace -e d1 c0t0d0s1
d1: device c0t0d0s1 is enabled
# metareplace -e d4 c0t0d0s4
d4: device c0t0d0s4 is enabled
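After this step you can watch with metastat as the new mirror starts to resynchronize. The whole procedure is easy to rehearse: if you have DiskSuite installed with a mirrored system disk, and you have a good backup, you can test it by following the steps above.
-----------------------------------------------------------------
Summary of the steps:
1) At the ok prompt, boot from the mirror disk (first use devalias to find the mirror disk's alias):
ok devalias
ok boot disk1
2) During boot the system reports that there are not enough state database replicas; enter the root password to go into maintenance mode. Check the replica status:
# metadb -i
A row flagged M means the slice holding that replica has failed, so the replica should be deleted:
# metadb -d /dev/dsk/c0t0d0s7
Check the replica status again and make sure no M-flagged entries remain:
# metadb -i
3) The system can now boot from the mirror disk:
# reboot -- disk1
4) After rebooting into the system, look for the affected slices:
# metastat
Find the c0t0d0 slices marked "State: Needs maintenance" and note them down. Every slice that was mirrored when the system was configured needs to be restored (replaced).
5) Replace the failed disk and restore the DiskSuite configuration.
Pull the failed disk, insert the new one, and partition the new disk using the partition table of the mirror disk (c0t1d0):
# prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
6) Install the boot block on the new disk:
# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
7) Create 2 state database replicas on the new disk:
# metadb -f -a -c 2 /dev/dsk/c0t0d0s6
# metadb -i
8) Final step: restore the original DiskSuite mirror configuration:
# metareplace -e d0 c0t0d0s0
d0: device c0t0d0s0 is enabled
# metareplace -e d1 c0t0d0s1
d1: device c0t0d0s1 is enabled
# metareplace -e d4 c0t0d0s4
d4: device c0t0d0s4 is enabled
After this step you can use metastat to watch the new mirror start to resynchronize.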
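For watching the resync, here is a rough sketch that extracts the progress figure per mirror. It assumes metastat reports progress on a line of the form "Resync in progress: 15 % done" under the mirror's entry. That line format and the sample text are assumptions, since no resync output appears in this post, and the function names are invented.

```shell
#!/bin/sh
# Read `metastat` output on stdin and print "mirror percent" for each mirror
# that reports a resync. The "Resync in progress: N % done" format is assumed.
resync_progress() {
    awk '
        # e.g. "d0: Mirror" -> remember the mirror name
        $2 == "Mirror" { sub(/:$/, "", $1); name = $1 }
        # e.g. "Resync in progress: 15 % done" -> the fourth field is the percentage
        /Resync in progress:/ { print name, $4 }
    '
}

# Made-up sample for the demo; not captured from a real system.
sample_resync() {
    cat <<'EOF'
d0: Mirror
    Submirror 0: d10
      State: Resyncing
    Submirror 1: d20
      State: Okay
    Resync in progress: 15 % done
    Pass: 1
EOF
}

sample_resync | resync_progress
```

Running `metastat | resync_progress` every few minutes would show each mirror's percentage climb until the line disappears.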

This article comes from the ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u/2005/showart_313721.html
