2 x 3320 (JBOD) x 9 disks
RAID 0+1 built with SDS, as follows:
# metastat -s dgsmp
dgsmp/d100: Mirror
Submirror 0: dgsmp/d101
State: Needs maintenance
Submirror 1: dgsmp/d102
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 573356544 blocks
dgsmp/d101: Submirror of dgsmp/d100
State: Needs maintenance
Invoke: metareplace dgsmp/d100 d23s0 <new device>
Size: 573356544 blocks
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Hot Spare
d20s0 0 No Maintenance
d21s0 0 No Maintenance
d22s0 0 No Maintenance
d23s0 0 No Maintenance
dgsmp/d102: Submirror of dgsmp/d100
State: Okay
Size: 573356544 blocks
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Hot Spare
d14s0 0 No Okay
d15s0 0 No Okay
d16s0 0 No Okay
d17s0 0 No Okay
dgsmp/d120: Mirror
Submirror 0: dgsmp/d121
State: Needs maintenance
Submirror 1: dgsmp/d122
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 716695680 blocks
dgsmp/d121: Submirror of dgsmp/d120
State: Needs maintenance
Invoke: metareplace dgsmp/d120 d19s0 <new device>
Size: 716695680 blocks
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Hot Spare
d6s0 0 No Maintenance
d7s0 0 No Maintenance
d8s0 0 No Maintenance
d18s0 0 No Maintenance
d19s0 0 No Maintenance
dgsmp/d122: Submirror of dgsmp/d120
State: Needs maintenance
Invoke: after replacing "Maintenance" components:
metareplace dgsmp/d120 d13s0 <new device>
Size: 716695680 blocks
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Hot Spare
d9s0 0 No Okay
d10s0 0 No Okay
d11s0 0 No Okay
d12s0 0 No Okay
d13s0 0 No Last Erred
dgsmp/d100 -m dgsmp/d101 dgsmp/d102 1
dgsmp/d101 1 4 d20s0 d21s0 d22s0 d23s0 -i 32b
dgsmp/d102 1 4 d14s0 d15s0 d16s0 d17s0 -i 32b
dgsmp/d120 -m dgsmp/d121 dgsmp/d122 1
dgsmp/d121 1 5 d6s0 d7s0 d8s0 d18s0 d19s0 -i 32b
dgsmp/d122 1 5 d9s0 d10s0 d11s0 d12s0 d13s0 -i 32b
messages:
May 7 02:05:46 sdp03 scsi: [ID 107833 kern.warning] WARNING: /pci@1c,600000/scsi@1/sd@9,0 (sd99):
May 7 02:05:46 sdp03 Error for Command: read(10) Error Level: Retryable
May 7 02:05:46 sdp03 scsi: [ID 107833 kern.notice] Requested Block: 12466784 Error Block: 12466803
May 7 02:05:46 sdp03 scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 055033E9RM
May 7 02:05:46 sdp03 scsi: [ID 107833 kern.notice] Sense Key: Media Error
May 7 02:05:46 sdp03 scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf
iostat -E:
sd99 Soft Errors: 0 Hard Errors: 730 Transport Errors: 91
Vendor: SEAGATE Product: ST373207LSUN72G Revision: 045A Serial No: 5033E9RM
Size: 73.40GB <73400057856 bytes>
Media Error: 636 Device Not Ready: 0 No Device: 94 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
diskinfo:
c7t9d0 SEAGATE ST373207LSUN72G 045A 055033E9RM
scdidadm -L:
13 sdp03:/dev/rdsk/c7t9d0 /dev/did/rdsk/d13
13 sdp04:/dev/rdsk/c7t9d0 /dev/did/rdsk/d13
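The chain from sd99 to c7t9d0 to DID d13 is tied together by the disk serial number, which iostat -E prints in a shortened form (5033E9RM versus 055033E9RM). The sketch below shows one way to do that matching mechanically. It assumes the line layouts shown above, and the helper names (`same_serial`, `parse_diskinfo`, `parse_did`, `did_for_serial`) are hypothetical.

```python
def same_serial(a, b):
    """Treat two serials as the same disk if one is a suffix of the other.

    Assumption: iostat -E drops leading characters of the serial, as in the
    output above (5033E9RM vs 055033E9RM).
    """
    a, b = a.strip().upper(), b.strip().upper()
    return bool(a) and bool(b) and (a.endswith(b) or b.endswith(a))


def parse_diskinfo(line):
    # Assumed layout: device vendor product revision serial
    device, _vendor, _product, _revision, serial = line.split()
    return {"device": device, "serial": serial}


def parse_did(line):
    # Layout from scdidadm -L: instance node:/dev/rdsk/cXtYdZ /dev/did/rdsk/dN
    _instance, local, did_path = line.split()
    node, path = local.split(":", 1)
    return {
        "node": node,
        "device": path.rsplit("/", 1)[1],
        "did": did_path.rsplit("/", 1)[1],
    }


def did_for_serial(serial, diskinfo_lines, did_lines, node):
    """Map a serial (possibly shortened) to the DID name as seen from one node."""
    for disk in map(parse_diskinfo, diskinfo_lines):
        if not same_serial(serial, disk["serial"]):
            continue
        for entry in map(parse_did, did_lines):
            if entry["node"] == node and entry["device"] == disk["device"]:
                return entry["did"]
    return None


if __name__ == "__main__":
    diskinfo = ["c7t9d0 SEAGATE ST373207LSUN72G 045A 055033E9RM"]
    dids = [
        "13 sdp03:/dev/rdsk/c7t9d0 /dev/did/rdsk/d13",
        "13 sdp04:/dev/rdsk/c7t9d0 /dev/did/rdsk/d13",
    ]
    print(did_for_serial("5033E9RM", diskinfo, dids, "sdp03"))
```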
Analysis:
One SE3320: all of its disks are in Needs maintenance
The other SE3320: one failed disk, c7t9 (/dev/did/rdsk/d13), in state Last Erred
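Plan:
Resync each submirror on the SE3320 in Needs maintenance against its Okay counterpart on the other SE3320, to bring them back to Okay;
Treat the Last Erred disk as healthy and try to sync it with d19s0; I expect the sync to hit I/O read/write errors and abort on its own;
If the sync fails, replace that disk, sync it with d19s0 again until it is Okay, then copy the data back.
I expect the data on d13s0 to be lost, in which case I will rebuild the RAID and restore the data from the backup taken beforehand.
From "/node@1/pci@1c,600000/scsi@1/sd@9,0" 99 "sd", together with the physical path of the SCSI card port and the SCSI cabling, it is possible to tell which 3320 holds the bad disk.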
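The physical path already carries the SCSI target and LUN of the failing disk, which narrows things down to one target ID on one controller port. Below is a small sketch for pulling those out of a path like the one above. The function name `scsi_target_lun` is made up, and treating the unit address as hexadecimal is an assumption about how the sd driver writes it.

```python
import re

# Last path component, e.g. "/sd@9,0" -> driver "sd", target 9, LUN 0.
UNIT = re.compile(r"/(\w+)@([0-9a-fA-F]+),([0-9a-fA-F]+)$")


def scsi_target_lun(path):
    """Split a device path into (controller path, driver, target, lun).

    Assumption: target and LUN in the unit address are hexadecimal.
    """
    match = UNIT.search(path)
    if match is None:
        raise ValueError(f"no target,lun unit address in {path!r}")
    driver, target, lun = match.groups()
    controller = path[: match.start()]
    return controller, driver, int(target, 16), int(lun, 16)


if __name__ == "__main__":
    print(scsi_target_lun("/node@1/pci@1c,600000/scsi@1/sd@9,0"))
```

For the path above this gives controller /node@1/pci@1c,600000/scsi@1, target 9, LUN 0, which agrees with c7t9d0. Which physical slot holds target 9 still depends on how the enclosure assigns IDs, so that has to be checked against the 3320 documentation.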
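Questions:
1 Is this plan workable? Since each submirror is RAID 0, replacing one disk loses the data on it; can the mirror on the other 3320 guarantee that no data is lost?
2 With the front panel of the SE3320 (JBOD) open, the disk serial numbers are not visible. How can I tell which disk is the failed one? All the disk status LEDs are a normal green.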