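I. Fault symptoms
After logging in to the system through Storage Manager 10, the "Recover from failures" icon (the small stethoscope) is flashing. Opening it shows battery errors reported for controllers A and B: the batteries have reached their expiration date, which is typically 3 years.
Preparation
Connect the laptop and the Ethernet ports on the back of both controllers to the same hub. The default addresses of the Ethernet ports are 192.168.128.101 and 192.168.128.102. Log in to manage the array with the SM9 software. Both Ethernet ports must be connected and added to SM at the same time, so that the management link is not interrupted when ownership switches between controllers.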
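Before opening SM it can help to confirm from the laptop that both controller ports actually answer. The following is only a minimal sketch of such a check. The two addresses are the defaults quoted above; the TCP port number and all function names are assumptions made for illustration, so adjust them to your own setup.

```python
import socket
from typing import Callable, Dict, Iterable

# Default controller Ethernet addresses quoted in the post.
CONTROLLER_ADDRESSES = ("192.168.128.101", "192.168.128.102")

# Assumption: TCP port of the out-of-band management connection.
# Change this if your subsystem listens elsewhere.
ASSUMED_MGMT_PORT = 2463


def tcp_probe(address: str, port: int = ASSUMED_MGMT_PORT,
              timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_controllers(addresses: Iterable[str] = CONTROLLER_ADDRESSES,
                      probe: Callable[[str], bool] = tcp_probe) -> Dict[str, bool]:
    """Probe every controller address and record which ones answered."""
    return {address: probe(address) for address in addresses}


def both_reachable(results: Dict[str, bool]) -> bool:
    """The procedure needs every controller port reachable, not just one."""
    return bool(results) and all(results.values())
```

Calling `both_reachable(check_controllers())` from the laptop gives a quick yes/no. The `probe` argument is injectable so that a ping-based check can be swapped in if the assumed port does not apply.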
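Check the system profile to find the current drive firmware level. Log in to SM, open the array management window, and select STORAGE SUBSYSTEM--VIEW PROFILE. The output looks similar to the following: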
PROFILE FOR STORAGE SUBSYSTEM: FAStT 600 Configured (5/23/08 9:53:30 AM)
DRIVES------------------------------
SUMMARY
Number of drives: 28
Supported drive types: Fibre (2
BASIC:
TRAY, SLOT  STATUS   CAPACITY  CURRENT DATA RATE  PRODUCT ID  FIRMWARE VERSION
0, 1        Optimal  36.72 GB  2 Gbps             B337        F454
0, 2        Optimal  36.72 GB  2 Gbps             B337        F454
0, 3        Optimal  36.72 GB  2 Gbps             B337        F454
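Check the drive firmware (the FIRMWARE VERSION column). The firmware level matters here: if the drives run an affected firmware level, drive failures can occur after the battery is replaced.
In short, upgrading the drive firmware is recommended. To upgrade firmware in SM: ADVANCED--MAINTENANCE--DOWNLOAD--DRIVE FIRMWARE/MODE PAGES, then select the firmware file to download.
Make absolutely sure there is no I/O on the drives while this is running!
IBM's advisory on the issue reads as follows.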
Abstract: MULTIPLE HDDS MAY FAIL AFTER CONTROLLER REMOVAL/REPLACEMENT
SYMPTOM:
After replacing a DS4000 System Storage controller, due to a
battery replacement or controller failure, multiple hard disk
drive failures may occur.
PROBLEM ISOLATION AIDS:
- The system may be any of the following IBM servers:
DS4300 (FAStT600) Single Controller Storage Server, Type
1722, any model
DS4000 EXP710 Storage Enclosure, Type 1740, any model
DS4000 EXP810 Storage Enclosure, Type 1812, any model
DS4000 EXP810 Storage Enclosure, Type 1812 (DC power
supplies), any model
DS4300 (FAStT600) Dual Controller and Turbo Storage
Server, Type 1722, any model
DS4500 (FAStT900) Storage Server, Type 1742, any model
DS4700 Storage Server, Type 1814, any model
DS4700 Storage Server, Type 1814 (DC power supplies), any
model
DS4800 Storage Server, Type 1815, any model
- This tip is not software specific.
- The system is configured with one or more of the following
IBM Options:
146.8 GB, 10,000 rpm, hot-swappable, 2 Gbps Fibre Channel
disk drive module, Option 32P0765
300 GB, 10,000 rpm, hot-swappable, 2 Gbps Fibre Channel
disk drive module, Option 73P8005
73.4 GB, 10,000 rpm, hot-swappable, 2 Gbps Fibre Channel
disk drive module, Option 06P5762
- The JFQ3 or JFQ4 firmware for the hard disk drives is
affected.
- The system has the symptom described above.
FIX:
The issue is resolved in the JFQ8 hard disk drive firmware
upgrade. The update package can be found at the following
URL:
http://www.ibm.com/support/docview.wss?uid=psg1MIGR-5071008
DETAILS:
JFQ3 and JFQ4 hard disk drive firmware have sensitivity issues
to controller removal that was resolved in the JFQ8 hard disk
drive firmware. IBM highly recommends to upgrade to JFQ8
firmware level, whether or not drive fallout has occurred.
TRADEMARKS:
System Storage is a trademark of International Business
Machines Corporation in the United States, other countries, or
both.
Other company, product, or service names may be trademarks or
service marks of others.
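The firmware check described in this section (read the drive table from the profile, compare each firmware level against the advisory) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout is taken from the three sample rows shown earlier, product IDs are assumed to contain no spaces, and the names `parse_drives` and `drives_needing_upgrade` are made up for this example. Only the JFQ3/JFQ4/JFQ8 levels come from the IBM text.

```python
import re
from typing import List, NamedTuple

# Firmware levels the IBM tip names as affected, and the level that fixes them.
AFFECTED_FIRMWARE = {"JFQ3", "JFQ4"}
FIXED_FIRMWARE = "JFQ8"

# Assumed row layout, based on the sample profile rows in the post:
# tray, slot, status, capacity, data rate, product ID, firmware version.
DRIVE_ROW = re.compile(
    r"^\s*(?P<tray>\d+),\s*(?P<slot>\d+)\s+(?P<status>\S+)\s+"
    r"(?P<capacity>[\d.]+\s+[KMGT]B)\s+(?P<rate>\d+\s+Gbps)\s+"
    r"(?P<product>\S+)\s+(?P<firmware>\S+)\s*$"
)


class Drive(NamedTuple):
    tray: int
    slot: int
    status: str
    product_id: str
    firmware: str


def parse_drives(profile_text: str) -> List[Drive]:
    """Extract drive rows from saved profile text; other lines are skipped."""
    drives = []
    for line in profile_text.splitlines():
        match = DRIVE_ROW.match(line)
        if match:
            drives.append(Drive(int(match.group("tray")),
                                int(match.group("slot")),
                                match.group("status"),
                                match.group("product"),
                                match.group("firmware")))
    return drives


def drives_needing_upgrade(drives: List[Drive]) -> List[Drive]:
    """Return the drives whose firmware level is listed as affected."""
    return [d for d in drives if d.firmware.upper() in AFFECTED_FIRMWARE]
```

Fed the three sample rows from the profile above, `drives_needing_upgrade` returns an empty list, since F454 is not one of the levels named in the advisory. Whether other firmware families have their own requirements is outside what this sketch checks.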
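II. Replacement procedure
Back up the data first. Assume the array holds two logical drives (LUNs), one owned by controller A and one by controller B. The batteries are dead, but AC power is still feeding the controllers, so a backup can be taken now.
The main reason for replacing the batteries is that if power is lost in this state, the RAID control information held in the controller is lost, the system can no longer recognize the LUNs, and a failure results.
(Doing the work with the system powered off is recommended. In theory it can also be done with power on; the steps below describe the power-on procedure.)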
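Start with controller B (hot-swapping the controller with power on):
1. In SM, right-click Arr2 >> 2 (logical drive 2) >> Change >> Ownership/Preferred Path >> Controller in Slot A. This moves the LUN to controller A.
2. A dialog asks whether the conditions for the switch are met, mainly: (1) there is no I/O on the LUN, and (2) the host has redundant paths to both controllers. Otherwise the switch can cause a failure. Select Yes.
3. If there is no redundant path, the LUN has to be rediscovered and mounted manually on the host. Also confirm in the array view that LUN 2 in Arr2 has been taken over by controller A.
4. Pull out controller B, replace the battery, and reinsert controller B in its original slot.
5. Right after installation the new battery may stay in the charging state for a long time, anywhere from 5 minutes to 6 hours. This does not hold up the next step.
6. Using the same method as in steps 1-3, move LUN 2 of Arr2 back to controller B and confirm in SM that the logical drive is now on controller B.
7. The battery replacement on controller B is complete.
Then repeat the same procedure for controller A (again hot-swapping the controller with power on), this time moving its LUN to controller B first.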
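Since the same seven steps run twice with the controller roles swapped, it is easy to mix up which LUN should be where. The sketch below prints the sequence as a checklist for both passes. It performs no action on the array; it only generates text. The function names and the LUN label "Arr1/1" for controller A are hypothetical, as the post only names Arr2/2 on controller B.

```python
from typing import Dict, List, Sequence


def other(controller: str) -> str:
    """Return the peer of controller 'A' or 'B'."""
    if controller not in ("A", "B"):
        raise ValueError(f"unknown controller: {controller!r}")
    return "B" if controller == "A" else "A"


def battery_swap_steps(controller: str, lun: str) -> List[str]:
    """List the hot-swap steps for one controller, in the order given above."""
    peer = other(controller)
    return [
        f"Move {lun} from controller {controller} to controller {peer}",
        f"Confirm no I/O on {lun} and redundant host paths, then answer Yes",
        f"Verify controller {peer} now owns {lun}",
        f"Remove controller {controller}, replace its battery, reinsert it",
        "Expect the new battery to charge for 5 minutes to 6 hours",
        f"Move {lun} back to controller {controller}",
        f"Verify controller {controller} owns {lun} again",
    ]


def full_plan(luns_by_controller: Dict[str, str],
              order: Sequence[str] = ("B", "A")) -> List[str]:
    """Concatenate the per-controller steps, controller B first as in the post."""
    plan: List[str] = []
    for controller in order:
        plan.extend(battery_swap_steps(controller, luns_by_controller[controller]))
    return plan


if __name__ == "__main__":
    # "Arr1/1" is a hypothetical label for the LUN owned by controller A.
    for number, step in enumerate(full_plan({"A": "Arr1/1", "B": "Arr2/2"}), 1):
        print(f"{number:2d}. {step}")
```

Ticking the lines off one by one makes it harder to pull a controller that still owns a LUN.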
Videos showing the operation in step 4 above:
http://www.ibm.com/servers/storage/support/videos/RAID_new.swf
http://www.ibm.com/servers/storage/support/videos/battery_replacement.swf
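III. Wrap-up
After the steps above are done, go to the thermometer icon on the right side of the SM window, select the battery, and choose Reset to restart the battery age counter.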
Once everything is restored, check the status of the PVs (physical volumes) on the host.