I was recently on a business trip and did a job. Maybe I got lucky, or maybe my skills are just limited; I'm posting the whole process here for the experts to look over, and any pointers are welcome.

A lucky escape: recovering a 3310

Hardware: 2 x Sun 880, each with 6 x 72 GB disks;
          1 x Sun StorEdge 3310, single controller, 6 x 36 GB disks
Connection: single channel, dual hosts
RAID layout: disks 8, 9, 10, 11, 12 form a RAID 5; disk 13 is the hot spare
Software: Solaris 9

Problem description: the 3310 array was sounding its audible alarm, a repeating ***- pattern; the LEDs were normal, with no amber light on;
          host 2 could not access the array and reported I/O ERROR;
          host 1's access to the array was intermittent;
          the database could not be read or written.

Procedure:
1. Install the sccli software
1) # cp 2.0.0_sw_solaris-sparc.zip /tmp
2) # unzip 2.0.0_sw_solaris-sparc.zip
3) # cd solaris/sparc
4) # pkgadd -d . SUNWsscs

2. Check the array status with sccli
# sccli
selected device /dev/rdsk/c2t0d0s2 [SUN StorEdge 3310 SN#080CD3]
sccli>
sccli> show logical-drives
LD    LD-ID      Size      Assigned  Type   Disks  Spare  Failed  Status
--------------------------------------------------------------------------
ld0   0D839135   134.67GB  Primary   RAID5  3      0      1       Dead
sccli>
sccli> show disks
Ch  Id  Size     Speed  LD    Status  IDs                       Rev
----------------------------------------------------------------------
 0   8  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
                                      S/N 3JA72NYV00007429
 0   9  33.92GB  160MB  NONE  USED    SEAGATE ST336607LSUN36G   0507
                                      S/N 3JAY46Q500007423
 0  10  N/A      N/A    NONE  BAD     SEAGATE ST336607LSUN36G   0507
                                      S/N 3JA1ESPC00007340
 0  11  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
                                      S/N 3JA1EF8J00007340
 0  12  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
                                      S/N 3JAY483500007325
 0  13  33.92GB  160MB  NONE  USED    HITACHI DK32EJ36NSUN36G   PQ0B
                                      S/N 49S1HVFA0040A3BR
sccli>
sccli> show events

Thu Jan 27 17:41:19 2005
[0181] #1: StorEdge Array SN#27556 Controller NOTICE: controller initialization completed

Tue Nov 29 13:48:36 2005
[1113] #2: StorEdge Array SN#27556 CH0 ID10: SCSI Drive ALERT: bad block encountered (02h, 03h,0C/00)
SCSI Status:0x02 Sense Key:0x03 Sense Code:0x0c Sense Code Qualifier:0x00

Tue Nov 29 13:48:42 2005
[1117] #3: StorEdge Array SN#27556 CH0 ID10: SCSI Drive ALERT: block successfully reassigned
SCSI Status:0x00 Sense Key:0x00 Sense Code:0x00 Sense Code Qualifier:0x00

Tue Nov 29 14:41:37 2005
[1113] #4: StorEdge Array SN#27556 CH0 ID10: SCSI Drive ALERT: bad block encountered (02h, 03h,0C/00)
SCSI Status:0x02 Sense Key:0x03 Sense Code:0x0c Sense Code Qualifier:0x00

Tue Nov 29 14:41:40 2005
[1116] #5: StorEdge Array SN#27556 CH0 ID10: SCSI Drive ALERT: block reassignment failed
SCSI Status:0x02 Sense Key:0x03 Sense Code:0x0c Sense Code Qualifier:0x00

Tue Nov 29 14:41:40 2005
[2101] #6: LD-ID 0D839135 on StorEdge Array SN#27556: ALERT: SCSI drive failure (CH0 ID10)

Thu Dec 1 12:03:18 2005
[110F] #7: StorEdge Array SN#27556 CH0: SCSI Drive Channel ALERT: SCSI bus reset issued
SCSI Status:0x00 Sense Key:0x00 Sense Code:0x00 Sense Code Qualifier:0x00

==========================

From the output above, my analysis was:
1) CH0 ID10 has failed and needs to be replaced.

2) CH0 ID9 and ID13 show LD "NONE" and status "USED", and the RAID 5 logical drive is "Dead".

I figured disks 9 and 13 might still be recoverable. If they could be brought back, the RAID 5 still had a chance and the data might be saved. As the saying goes: try, and you have a slim chance of surviving; don't try, and you have none.

3. Repair
sccli> show ip-address
192.1.1.68
Log in over the network: telnet 192.1.1.68
Choose VT100 mode and press Ctrl+L to refresh the screen
- go to "view and edit SCSI Drives"
- select ID9 (NONE/USED)
- select "add Global spare"
- designate ID9 as global spare
- go to "view and edit configuration parameters"
--> select Disk Array parameters
--> select rebuild priority and set it to "Normal" or "improved"
(when selecting "improved", host I/O may be impacted)
This failed: ID9 went BAD.

Just as I was losing hope, I suddenly noticed that ID10's status had somehow changed to ONLINE.

A good opportunity, so I tried again with ID13:
- go to "view and edit SCSI Drives"
- select ID13 (NONE/USED)
- select "add Global spare"
- designate ID13 as global spare
- go to "view and edit configuration parameters"
--> select Disk Array parameters
--> select rebuild priority and set it to "Normal" or "improved"
(when selecting "improved", host I/O may be impacted)

OK, the logical drive status changed to GOOD. Heh.

I then ordered spare parts. To be on the safe side I ordered three disks and replaced 9, 10, and 13.
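
For anyone who wants to check this kind of `show disks` output mechanically, here is a rough Python sketch. It is not part of sccli or any Sun tool. The function names (`parse_disks`, `not_online`, `raid5_survives`) are made up for illustration, and the parser assumes the column layout shown in this post (Ch, Id, Size, Speed, LD, Status, IDs, Rev), which may differ in other sccli versions. The only RAID fact it relies on is that RAID 5 tolerates the loss of exactly one member.

```python
import re

# Sample rows copied from the `show disks` output in this post.
SAMPLE = """\
 0   8  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
        S/N 3JA72NYV00007429
 0   9  33.92GB  160MB  NONE  USED    SEAGATE ST336607LSUN36G   0507
 0  10  N/A      N/A    NONE  BAD     SEAGATE ST336607LSUN36G   0507
 0  11  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
 0  12  33.92GB  160MB  ld0   ONLINE  SEAGATE ST336607LSUN36G   0507
 0  13  33.92GB  160MB  NONE  USED    HITACHI DK32EJ36NSUN36G   PQ0B
"""

# Assumed row layout: Ch Id Size Speed LD Status <vendor/model> Rev.
# Header, separator, and "S/N ..." lines do not start with a number,
# so they simply fail to match and are skipped.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*\S)\s+(\S+)\s*$"
)


def parse_disks(text):
    """Return one dict per disk row found in `show disks` style text."""
    disks = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        ch, disk_id, size, speed, ld, status, model, rev = m.groups()
        disks.append({
            "ch": int(ch), "id": int(disk_id), "size": size, "speed": speed,
            "ld": ld, "status": status, "model": model, "rev": rev,
        })
    return disks


def not_online(disks):
    """Disks whose status column is anything other than ONLINE."""
    return [d for d in disks if d["status"] != "ONLINE"]


def raid5_survives(total_members, online_members):
    """RAID 5 keeps working only if at most one member is missing."""
    return online_members >= total_members - 1


if __name__ == "__main__":
    disks = parse_disks(SAMPLE)
    for d in not_online(disks):
        print("CH%d ID%d: LD=%s status=%s" % (d["ch"], d["id"], d["ld"], d["status"]))
    online_in_ld0 = sum(1 for d in disks if d["ld"] == "ld0" and d["status"] == "ONLINE")
    # The RAID 5 in this post was built from 5 members (IDs 8-12).
    print("ld0 survivable:", raid5_survives(5, online_in_ld0))
```

With only three of the five members still ONLINE in ld0, the check agrees with the "Dead" status the controller reported, which is why getting even one more member back mattered so much here.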