Troubleshooting RAID 1 in Solstice DiskSuite Software
Yazid Mohamed, June 2006
This Tech Tip offers suggestions for resolving database replica errors and metadevice errors when using RAID 1 in Solstice DiskSuite software.
Database Replica Errors
- Problem: State database is corrupted or unavailable
- Cause: Disk failure or disk I/O error
- Symptom: Error message at boot time if insufficient database replicas are available
Suggested steps to follow:
1. At the ok prompt, issue the boot command. The system will enter single-user mode because of the broken database replicas.
ok boot
...
Hostname: yazid
metainit: yazid: stale databases
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice
database.
After reboot, repair any broken database replicas which were
deleted.
Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance):
Entering System Maintenance Mode.
2. Use the metadb command to look at the metadevice state database. You can see which state database replicas are not available: they are marked "unknown" and carry the M flag.
# metadb -i
     flags          first blk    block count
  a m  p  lu        16           1034          /dev/dsk/c0t0d0s7
  a    p  l         1050         1034          /dev/dsk/c0t0d0s7
     M     p        unknown      unknown       /dev/dsk/c0t1d0s7
     M     p        unknown      unknown
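The "which replicas are broken" check in step 2 can be scripted: filter the listing for lines whose flags contain M. In the sketch below, a here-document stands in for live `metadb -i` output so the filter can be tried on any system; on a real Solaris host you would pipe `metadb -i` straight into the awk command.

```shell
# A minimal sketch: list slices whose metadb flags contain "M"
# (replica had a problem). The here-string reproduces the sample
# listing above; on a live system, use `metadb -i` instead.
metadb_output='     flags          first blk    block count
  a m  p  lu        16           1034          /dev/dsk/c0t0d0s7
  a    p  l         1050         1034          /dev/dsk/c0t0d0s7
     M     p        unknown      unknown       /dev/dsk/c0t1d0s7
     M     p        unknown      unknown'

# Keep only M-flagged lines that still name a device path.
echo "$metadb_output" | awk '$1 == "M" && $NF ~ /^\/dev\// { print $NF }' | sort -u
# prints: /dev/dsk/c0t1d0s7
```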
3. Delete the state database replicas on the bad disk using the -d option. At this point, the root (/) file system is read-only. You can ignore the mddb.cf error messages:
# metadb -d -f c0t1d0s7
metadb: demo: /etc/opt/SUNWmd/mddb.cf.new: Read-only file system.
Verify deletion:
# metadb -i
     flags          first blk    block count
  a m  p  lu        16           1034          /dev/dsk/c0t0d0s7
  a    p  l         1050         1034          /dev/dsk/c0t0d0s7
4. Reboot the system.
5. Use the metadb command to add back the state database replicas and verify that these replicas are correct.
# metadb -a -c 2 c0t1d0s7
# metadb -i
     flags          first blk    block count
  a m  p  luo       16           1034          /dev/dsk/c0t0d0s7
  a    p  luo       1050         1034          /dev/dsk/c0t0d0s7
  a         u       16           1034          /dev/dsk/c0t1d0s7
  a         u       1050         1034          /dev/dsk/c0t1d0s7
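As a quick sanity check on the repaired configuration, you can count the active replicas: DiskSuite needs a majority of replicas available to operate, and at least three replicas are generally recommended. The sketch below counts lines whose first flag is "a" (active); the here-document mirrors the sample listing above, and on a live system you would pipe `metadb -i` into the awk command instead.

```shell
# Count active ("a"-flagged) replicas in metadb -i style output.
# The sample text stands in for live `metadb -i` output.
metadb_output='     flags          first blk    block count
  a m  p  luo       16           1034          /dev/dsk/c0t0d0s7
  a    p  luo       1050         1034          /dev/dsk/c0t0d0s7
  a         u       16           1034          /dev/dsk/c0t1d0s7
  a         u       1050         1034          /dev/dsk/c0t1d0s7'

echo "$metadb_output" | awk '$1 == "a" { n++ } END { print "active replicas:", n+0 }'
# prints: active replicas: 4
```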
Metadevice Errors
- Problem: Submirrors are out of sync and in the "Needs maintenance" state
- Cause: Disk problem or failure, improper shutdown, or communication problems between the two mirrored disks
- Symptom: "Needs maintenance" errors in metastat output
Suggested steps to follow:
1. Replace the faulty disk.
2. Create a partition layout on the replacement disk that matches the original disk's. If you also need to recover the state database, follow the steps above.
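On Solaris, copying the surviving disk's partition table to the replacement is commonly done with prtvtoc piped into fmthard. The sketch below is a dry run that only prints the command so it can be reviewed before touching a live system; the disk names c0t2d0 (healthy submirror) and c0t3d0 (replacement) are assumptions taken from the metastat example in the next step.

```shell
# Dry-run sketch: print the command that would clone the partition
# table from the healthy mirror half onto the replacement disk.
# Drop the echo to actually run it on a Solaris host (as root).
GOOD=c0t2d0   # surviving submirror disk (assumed from the example)
NEW=c0t3d0    # freshly replaced disk    (assumed from the example)

echo "prtvtoc /dev/rdsk/${GOOD}s2 | fmthard -s - /dev/rdsk/${NEW}s2"
# prints: prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2
```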
3. Log in to the Solaris OS and issue the metastat command. You will see the results as shown below:
# metastat
d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
...
d10: Submirror of d0
State: Needs maintenance
Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
Size: 47628 blocks
Stripe 0:
Device              Start Block   Dbase   State         Hot Spare
/dev/dsk/c0t3d0s0   0             No      Maintenance
d20: Submirror of d0
State: Okay
Size: 47628 blocks
Stripe 0:
Device              Start Block   Dbase   State         Hot Spare
/dev/dsk/c0t2d0s0   0             No      Okay
4. The output shows that slice c0t3d0s0 was on the faulty disk you replaced. Use the metareplace command with the -e option to enable the device in place:
# metareplace -e d0 c0t3d0s0
Device /dev/dsk/c0t3d0s0 is enabled
Or, if you want to move the faulty device to a new disk with a different target, use this command, supplying the replacement slice in place of the <new device> placeholder:
# metareplace d0 c0t3d0s0 <new device>
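The same metastat output that reports "Needs maintenance" also prints its suggested repair command on an Invoke: line, which can be extracted mechanically. In the sketch below, a here-document stands in for live metastat output so the filter can be tried anywhere; on a Solaris host you would pipe `metastat` directly into the awk command.

```shell
# Sketch: print the Invoke: hint for every submirror that metastat
# reports as "Needs maintenance". The sample text stands in for
# live `metastat` output.
metastat_output='d10: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 /dev/dsk/c0t3d0s0 <new device>
d20: Submirror of d0
State: Okay'

echo "$metastat_output" | awk '
    /Needs maintenance/ { broken = 1 }
    /^Invoke:/ && broken { sub(/^Invoke: */, ""); print; broken = 0 }'
# prints: metareplace d0 /dev/dsk/c0t3d0s0 <new device>
```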
Reference
This article comes from the ChinaUnix blog. The original is at: http://blog.chinaunix.net/u/26090/showart_322090.html