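Problem Determination and Recovery
If a problem occurs that involves a disk array and its associated pdisks, use the following to identify the problem:
I.Information presented by Error Log Analysis (ELA)
II.Hardware error logs, viewed using the Display Hardware Error Report diagnostic task
III.Disk array hdisk and pdisk status, viewed using the PCI-X SCSI Disk Array Manager
Error Log Analysis (ELA) analyzes errors presented by the adapter and recommends actions to correct them. To determine more precisely which action will resolve a problem, it is sometimes recommended that you perform a Maintenance Analysis Procedure (MAP). This chapter provides many of these MAPs.
The MAPs in this chapter address only problems directly related to disk arrays and the SCSI bus. MAPs for problems related to other devices and adapters are found, where applicable, in other system documentation.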
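*****Considerations*****
Note the following points before using these problem determination and recovery procedures:
1)If a disk array is being used as a boot device and the system fails to boot because of a suspected disk array problem, boot using the Standalone Diagnostic media. Error Log Analysis, AIX error logs, the PCI-X SCSI Disk Array Manager, and the other tools available from Standalone Diagnostics can help determine and resolve the disk array problem.
2)When invoking diagnostic routines for a PCI-X SCSI RAID Controller, it is recommended that you use Problem Determination (PD) mode rather than System Verification (SV) mode, unless there is a specific reason to use SV mode (for example, a MAP directs you to use SV mode).
3)After diagnostic routines for a PCI-X SCSI RAID Controller have been run in SV mode, it is recommended that you run diagnostics in PD mode to ensure that new errors are analyzed. This is especially important when using Standalone Diagnostic media.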
*****Location Codes*****
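Much of the location information contained in the hardware error logs is presented in the format of AIX location codes.
For SCSI devices, the AIX location code format is AB-CD-EF-G,H.
I) AB-CD identifies the PCI-X SCSI RAID Controller.
II) EF identifies the SCSI bus of the controller to which the device is attached. For a disk array, this value is ff, which indicates a logical bus for RAID devices.
III)G,H identifies the SCSI ID and Logical Unit Number (LUN) of the device.
Often only a portion of the AIX location code is shown, for example:
I) AB-CD identifies a PCI-X SCSI RAID Controller
II) AB-CD-EF identifies a SCSI bus of a PCI-X SCSI RAID Controller
III)EF-G,H identifies the SCSI bus, SCSI ID, and LUN of an attached disk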
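As a rough illustration of the AB-CD-EF-G,H layout described above, the following Python sketch splits a full location code into its fields. The function name `parse_location`, the `ScsiLocation` type, and the sample codes are hypothetical and are not part of any AIX tool. The assumption that each two-character field is alphanumeric and that G and H are decimal is mine, not the source's.

```python
import re
from typing import NamedTuple


class ScsiLocation(NamedTuple):
    controller: str      # AB-CD: the PCI-X SCSI RAID Controller
    bus: str             # EF: SCSI bus of that controller
    scsi_id: int         # G: SCSI ID of the device
    lun: int             # H: Logical Unit Number of the device
    is_disk_array: bool  # True when EF is "ff" (logical bus for RAID devices)


# Assumption: two-character fields are alphanumeric; G and H are decimal.
_LOCATION = re.compile(
    r"^([0-9A-Za-z]{2}-[0-9A-Za-z]{2})-([0-9A-Za-z]{2})-(\d+),(\d+)$"
)


def parse_location(code: str) -> ScsiLocation:
    """Split a full AB-CD-EF-G,H location code into its fields."""
    match = _LOCATION.match(code.strip())
    if match is None:
        raise ValueError(f"not a full AB-CD-EF-G,H location code: {code!r}")
    controller, bus, scsi_id, lun = match.groups()
    return ScsiLocation(
        controller, bus, int(scsi_id), int(lun), bus.lower() == "ff"
    )


# Hypothetical codes, for illustration only.
print(parse_location("10-60-00-4,0"))
print(parse_location("10-60-ff-0,1").is_disk_array)
```

Partial codes such as AB-CD or EF-G,H are deliberately rejected here; handling them would need extra patterns.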
*****Identifying the Disk Array Problem*****
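A disk array problem is uniquely identified by a Service Request Number (SRN). An SRN has the format nnnn-rrrr, where the first four digits, before the dash, are known as the Failing Function Code (FFC, for example 2523) and the last four digits, after the dash, are known as the reason code. The reason code identifies the specific problem that has occurred and must be obtained to determine which MAP to use.
An SRN is provided by Error Log Analysis, which directs you to the appropriate MAPs in this chapter. To obtain the reason code (the last four digits of the SRN) from an AIX error log, see "Finding an SRN Given an AIX Error Log".
The SRN describes the problem that has been detected and should be considered the primary means of identifying a problem. However, in addition to the description given by Error Log Analysis, the List PCI-X SCSI Disk Array Configuration option within the PCI-X SCSI Disk Array Manager is also useful for identifying or confirming a problem. For more information about the PCI-X SCSI Disk Array Manager, see "Using the PCI-X SCSI Disk Array Manager".
After obtaining the SRN, proceed to the next section for a more detailed description of the problem and to determine which MAP to use.
Service Request Number (SRN) Table
Using the SRN obtained from Error Log Analysis or from the AIX error log, use the following table to determine which MAP to use. For the MAPs, see "Maintenance Analysis Procedures (MAPs)".
Note: The following table includes only SRNs that are associated with Maintenance Analysis Procedures contained in this document. A complete list of SRNs can be found in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.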
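The SRN table itself is not reproduced in this post, but each MAP heading in this post names the reason codes it handles. The Python sketch below collects only those pairs into a lookup. The names `split_srn`, `map_for_srn`, and `REASON_TO_MAP` are hypothetical, and the mapping is incomplete by construction: it covers only the MAPs quoted here, not the full table in the IBM documentation.

```python
from typing import Optional, Tuple

# Reason code -> MAP, taken only from the MAP headings quoted in this post.
REASON_TO_MAP = {
    "8008": "3000", "8009": "3000",
    "9025": "3010", "9030": "3010", "9031": "3010", "9032": "3010",
    "9020": "3011", "9021": "3011", "9022": "3011",
    "9060": "3011", "9061": "3011", "9062": "3011",
    "9023": "3012",
    "9027": "3013",
    "9010": "3020",
    "9054": "3021",
    "9008": "3030",
    "9050": "3031",
}


def split_srn(srn: str) -> Tuple[str, str]:
    """Split 'nnnn-rrrr' into (Failing Function Code, reason code)."""
    ffc, dash, reason = srn.replace(" ", "").partition("-")
    if not dash or len(ffc) != 4 or len(reason) != 4:
        raise ValueError(f"expected an SRN of the form nnnn-rrrr: {srn!r}")
    return ffc, reason.upper()


def map_for_srn(srn: str) -> Optional[str]:
    """Return the MAP number for an SRN, or None if this post does not list it."""
    _, reason = split_srn(srn)
    return REASON_TO_MAP.get(reason)


# 2523 is the example FFC mentioned in the text.
print(split_srn("2523-9030"))
print(map_for_srn("2523-9030"))
```

Only the reason code is used for the lookup, which matches the way the MAP headings write SRNs as nnnn - rrrr.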
*****Maintenance Analysis Procedures (MAPs) *****
Use the following procedures to resolve adapter, cache, or disk array problems associated with a PCI-X SCSI RAID Controller. See “Service Request Number (SRN) Table” to identify which MAP to use.
MAP 3000
Use this MAP to resolve the following problems:
1) Permanent Cache Battery Pack failure (SRN nnnn - 8008)
2) Impending Cache Battery Pack failure (SRN nnnn - 8009)
Step 3000-1
Prior to replacing the Cache Battery Pack, it must be forced into an error state. This will ensure that write caching is stopped prior to replacing the battery pack thus preventing possible data loss.
1. Follow the steps described in “Forcing a Rechargeable Battery Error”
2. Go to Step 3000-2
Step 3000-2
Follow the steps described in “Replacing the Rechargeable Cache Battery Pack”
When the problem is resolved, then go to MAP 0410: Repair Checkout, in RS/6000 pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3010
Use this MAP to resolve the following problems:
1)Incompatible disk installed at the degraded disk location in Disk Array (SRN nnnn - 9025)
2)Disk Array is degraded due to a missing or failed disk (SRN nnnn - 9030)
3)Automatic reconstruction initiated for a Disk Array (SRN nnnn - 9031)
4)Disk Array is degraded due to a missing or failed disk (SRN nnnn - 9032)
Step 3010-1
Identify the disk array by examining the hardware error log. The hardware error log may be viewed as follows:
1.Start Diagnostics and select Task Selection on the Function Selection screen.
2.Select Display Hardware Error Report.
3.Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4.Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5.On the Error Summary screen, look for an entry with a SRN corresponding to the problem that sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored, however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6.Select the hardware error log to view. This error log displays the following disk array information under the Array Information heading: Location, S/N(serial number), and RAID Level.
7.Go to Step 3010-2.
Step 3010-2
View the current disk array configuration as follows:
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select List PCI-X SCSI Disk Array Configuration.
3.Select the PCI-X SCSI RAID Controller identified in the hardware error log.
4.Go to Step 3010-3.
Step 3010-3
Does a disk array have a state of Degraded?
NO Go to Step 3010-4.
YES Go to Step 3010-5.
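The branch in Step 3010-3 can be sketched in a few lines of Python. The function name `next_step_3010` is hypothetical, and the state strings are assumed to be the ones shown on the List PCI-X SCSI Disk Array Configuration screen (Degraded, Rebuilding, Optimal), as named in this MAP.

```python
from typing import Iterable


def next_step_3010(array_states: Iterable[str]) -> str:
    """Return the step that Step 3010-3 sends you to.

    array_states holds the state of each disk array on the controller,
    for example ["Optimal", "Degraded"].
    """
    if any(state.strip().lower() == "degraded" for state in array_states):
        return "3010-5"  # YES: a disk array is Degraded
    return "3010-4"      # NO: a hot spare has already taken over


print(next_step_3010(["Optimal", "Degraded"]))
print(next_step_3010(["Rebuilding"]))
```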
Step 3010-4
The affected disk array should have a state of either Rebuilding or Optimal due to the use of a hot spare disk. Identify the Failed disk, which is no longer a part of the disk array, by finding the pdisk listed at the bottom of the screen that has a state of either Failed or RWProtected. Using appropriate service procedures, such as use of the SCSI and SCSI RAID Hot Plug Manager, remove the failed disk and replace it with a new disk to use as a hot spare:
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Diagnostics and Recovery Options.
3.Select SCSI and SCSI RAID Hot Plug Manager.
4.Select Identify a Device Attached to an SCSI Hot Swap Enclosure Device.
5.Choose the location for the device you want to remove or install.
Note:The visual indicator on the Device will blink at the Identify rate.
6.If you are removing a device:
a.Select Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device.
Note:The visual indicator on the device will blink at the Remove rate.
b.Remove the device.
else, if installing a device:
a.Select Attach a Device to an SCSI Hot Swap Enclosure Device.
Note:The visual indicator on the device will blink at the Remove rate.
b.Insert the device.
If a new disk is not listed as a pdisk, it may first need to be prepared for use in a disk array. Do the following:
1.Start the PCI-X SCSI Disk Array Manager
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Create an Array Candidate pdisk and Format to 522 Byte Sectors.
3.Select the PCI-X SCSI RAID Controller.
4.Select the disk(s) from the list that you want to prepare for use in the disk arrays.
In order to make the new disk usable as a hot spare, do the following:
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Change/Show PCI-X SCSI pdisk Status.
3.Select Create a Hot Spare.
4.Select the PCI-X SCSI RAID Controller.
5.Select the pdisk that you want to designate as a hot spare.
Note: Hot spare disks are useful only if their capacity is greater than or equal to that of the smallest capacity disk in a disk array that becomes Degraded.
When the problem is resolved, go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
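The capacity rule for hot spares stated in the note above can be written as a one-line check. This is a minimal sketch: the function name `usable_as_hot_spare` is hypothetical, the capacities are made-up values, and all capacities are assumed to be in the same unit.

```python
from typing import Sequence


def usable_as_hot_spare(spare_capacity: float,
                        member_capacities: Sequence[float]) -> bool:
    """True if the spare is at least as large as the smallest array member.

    All capacities must use the same unit (an assumption of this sketch).
    """
    if not member_capacities:
        raise ValueError("the disk array must have at least one member")
    return spare_capacity >= min(member_capacities)


# Hypothetical capacities, for illustration only.
print(usable_as_hot_spare(73.4, [36.4, 73.4, 73.4]))
print(usable_as_hot_spare(18.2, [36.4, 73.4, 73.4]))
```

The same comparison applies to the replacement disk used in Step 3010-5, measured against the smallest capacity disk in the degraded disk array.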
Step 3010-5
Identify the failed disk by finding the pdisk listed for the degraded disk array that has a state of Failed. Using appropriate service procedures, such as use of the SCSI and SCSI RAID Hot Plug Manager, remove the failed disk and replace it with a new disk to use in the disk array. The SCSI and SCSI RAID Hot Plug Manager can be invoked as follows:
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Diagnostics and Recovery Options.
3.Select SCSI and SCSI RAID Hot Plug Manager.
4.Select Identify a Device Attached to an SCSI Hot Swap Enclosure Device.
5.Choose the location for the device you wish to remove/install.
Note:The visual indicator on the device will blink at the Identify rate.
6.If removing a device:
a.Select Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device.
Note: The visual indicator on the device will blink at the Remove rate.
b.Remove the device.
else, if installing a device:
a.Select Attach a Device to an SCSI Hot Swap Enclosure Device.
Note:The visual indicator on the device will blink at the Remove rate.
b.Insert the device.
To bring the disk array back to a state of Optimal, do the following:
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Reconstruct a PCI-X SCSI Disk Array.
3.Select the failed pdisk to reconstruct.
Note: The replacement disk should have a capacity that is greater than or equal to that of the smallest capacity disk in the degraded disk array.
When the problem is resolved, go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3011
Use this MAP to resolve the following problems:
1)Two or more disks are missing from a RAID-5 or RAID-6 Disk Array (SRN nnnn - 9020 / nnnn - 9021 / nnnn - 9022)
2)One or more disk pairs are missing from a RAID-10 Disk Array (SRN nnnn - 9060)
3)One or more disks are missing from a RAID-0 Disk Array (SRN nnnn - 9061 / nnnn - 9062)
Step 3011-1
Identify the disks missing from the disk array by examining the hardware error log. The hardware error log may be viewed as follows:
1.Start Diagnostics and select Task Selection on the Function Selection screen.
2.Select Display Hardware Error Report.
3.Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4.Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5.On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored, however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6.Select the hardware error log to view. Viewing the hardware error log, the missing disks are those listed under Array Member Information with an Actual Location of *unkwn*.
7.Go to Step 3011-2.
Step 3011-2
There are three possible ways to correct the problem. Perform only one of the following three options, listed in the order of preference:
1.Locate the identified disks and install them in the correct physical locations (that is the Expected
Locations) in the system. Perform only one of the following two options:
– IPL the system or logical partition
– Unconfigure and reconfigure the adapter by performing the following:
1.Unconfigure the adapter.
a.Start the PCI-X SCSI Disk Array Manager.
1)Start Diagnostics and select Task Selection on the Function Selection screen.
2)Select RAID Array Manager.
3)Select PCI-X SCSI Disk Array Manager.
b.Select Diagnostics and Recovery Options.
c.Select Unconfigure an Available PCI-X SCSI RAID Controller.
2.Configure the adapter.
a.Start the PCI-X SCSI Disk Array Manager
1)Start Diagnostics and select Task Selection on the Function Selection screen.
2)Select RAID Array Manager.
3)Select PCI-X SCSI Disk Array Manager.
b.Select Diagnostics and Recovery Options.
c.Select Configure a Defined PCI-X SCSI RAID Controller.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
2.Delete the disk array, as follows:
Attention: All data on the disk array will be lost.
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Delete a PCI-X SCSI Disk Array.
3.Select the PCI-X SCSI RAID Controller.
4.Select the disk array to delete.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
3.Format the remaining members of the disk array, as follows:
Attention: All data on the disk array will be lost.
1.Start the PCI-X SCSI Disk Array Manager.
a.Start Diagnostics and select Task Selection on the Function Selection screen.
b.Select RAID Array Manager.
c.Select PCI-X SCSI Disk Array Manager.
2.Select Diagnostics and Recovery Options.
3.Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3012
Use this MAP to resolve the following problem:
One or more Disk Array Members are not at required physical locations (SRN nnnn - 9023)
Step 3012-1
Identify the disks which are not at their required physical locations by examining the hardware error log.
The hardware error log may be viewed as follows:
1.Start Diagnostics and select Task Selection on the Function Selection screen.
2.Select Display Hardware Error Report.
3.Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4.Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5.On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has
occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be
ignored, however, this MAP may need to be used multiple times if the same problem has
occurred on multiple entities.
6.Select the hardware error log to view.
Viewing the hardware error log, the disks which are not at their required locations are those listed under Array Member Information with an Expected Location and Actual Location which do not match.
Note: An Actual Location of *unkwn* is acceptable, and no action is needed to correct it. This *unkwn*
location should only occur for the disk array member that corresponds to the Degraded Disk S/N.
7.Go to Step 3012-2.
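The comparison of Expected Location and Actual Location described in Step 3012-1 can be sketched as follows. The names `classify_member` and `members_to_move`, the serial numbers, and the location codes are hypothetical. The only values taken from the text are the `*unkwn*` marker and the rule that a mismatch means the disk is not at its required location. The same marker identifies the missing disks in Step 3011-1.

```python
from typing import Iterable, List, Tuple

UNKNOWN = "*unkwn*"  # Actual Location shown for a disk that cannot be found


def classify_member(expected: str, actual: str) -> str:
    """Classify one Array Member Information entry."""
    if actual == UNKNOWN:
        return "missing"    # acceptable in MAP 3012; a missing disk in MAP 3011
    if actual != expected:
        return "misplaced"  # not at its required physical location
    return "in place"


def members_to_move(members: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return serial numbers of members whose locations do not match.

    members: (serial number, expected location, actual location) tuples.
    """
    return [
        serial
        for serial, expected, actual in members
        if classify_member(expected, actual) == "misplaced"
    ]


# Hypothetical log entries, for illustration only.
entries = [
    ("SN0001", "10-60-00-4,0", "10-60-00-4,0"),
    ("SN0002", "10-60-00-5,0", "10-60-01-5,0"),
    ("SN0003", "10-60-00-6,0", UNKNOWN),
]
print(members_to_move(entries))
```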
Step 3012-2
There are three possible ways to correct the problem. Perform only one of the following three options, listed in the order of preference:
* Locate the identified disks and install them in the correct physical locations (that is the Expected
Locations) in the system. Perform only one of the following two options:
– IPL the system or logical partition
– Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Delete the disk array, as follows:
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Delete a PCI-X SCSI Disk Array.
3. Select the PCI-X SCSI RAID Controller.
4. Select the disk array to delete.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Format the remaining members of the disk array, as follows
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3013
Use this MAP to resolve the following problem:
Disk array is or would become degraded and parity data is out of synchronization (SRN nnnn - 9027)
Step 3013-1
Identify the adapter and disks by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored, however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. Viewing the hardware error log, if the disk array member which corresponds to the Degraded Disk S/N has an Actual Location of *unkwn* and is not physically
present, it may be helpful to find this disk.
7. Go to Step 3013-2.
Step 3013-2
Have the adapter card or disks been physically moved recently?
NO Contact your service support organization.
YES Go to Step 3013-3.
Step 3013-3
There are three possible ways to correct the problem. Perform only one of the following three options,
listed in the order of preference:
* Restore the adapter and disks back to their original configuration. Perform only one of the following two options:
–IPL the system or logical partition
–Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Delete the disk array, as follows:
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Delete a PCI-X SCSI Disk Array.
3. Select the PCI-X SCSI RAID Controller.
4. Select the disk array to delete.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Format the remaining members of the disk array, as follows
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3020
Use this MAP to resolve the following problem:
Cache data associated with attached disks cannot be found (SRN nnnn - 9010)
Step 3020-1
Has the server been powered off for several days?
NO Go to Step 3020-2.
YES Go to Step 3020-5.
Step 3020-2
Note: Label all parts (original and new) before moving them around.
Using the appropriate service procedures, remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts installed on it:
* The cache directory card from the original storage I/O adapter. (Refer to “Replacing the Cache Directory Card”.)
* The removable cache card from the original storage I/O adapter. (This only applies to some 2780 I/O adapters. Refer to “Separating a Removable Cache Card From the Base Card”.)
Step 3020-3
Has a new SRN of nnnn-9010 or nnnn-9050 occurred?
NO Go to Step 3020-6.
YES Go to Step 3020-4.
Step 3020-4
Was the new SRN nnnn-9050?
NO The new SRN was nnnn-9010.
Reclaim the controller cache storage as follows:
Attention: Data will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Reclaim Controller Cache Storage.
4. Select the PCI-X SCSI RAID Controller.
5. Confirm that you wish to proceed.
Note: On the Reclaim Controller Cache Storage results screen, the number of lost sectors is displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost and the system operator may want to restore data after this procedure is completed.
6. Go to Step 3020-6.
YES Contact your Service Support organization
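Steps 3020-3 and 3020-4 together form a small decision on the reason code of any new SRN that appears after the adapter swap. The Python sketch below restates that decision. The function name `after_adapter_swap` is hypothetical, and the returned strings are summaries of the steps above, not output from any tool.

```python
from typing import Optional


def after_adapter_swap(new_reason_code: Optional[str]) -> str:
    """Summarize what Steps 3020-3 and 3020-4 direct you to do.

    new_reason_code is the reason code (last four digits) of a new SRN,
    or None if no new SRN was logged.
    """
    if new_reason_code == "9050":
        # Step 3020-4, YES branch
        return "Contact your service support organization"
    if new_reason_code == "9010":
        # Step 3020-4, NO branch: data will be lost by the reclaim
        return "Reclaim Controller Cache Storage, then go to Step 3020-6"
    # Step 3020-3, NO branch: neither 9010 nor 9050 occurred
    return "Go to Step 3020-6"


print(after_adapter_swap("9010"))
print(after_adapter_swap("9050"))
print(after_adapter_swap(None))
```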
Step 3020-5
If the server has been powered off for several days after an abnormal power-down, the cache battery pack may be depleted. Do not replace the adapter or the cache battery pack. Reclaim the controller cache
storage as follows:
Attention: Data will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Reclaim Controller Cache Storage.
4. Select the PCI-X SCSI RAID Controller.
5. Confirm that you wish to proceed.
Note: On the Reclaim Controller Cache Storage results screen, the number of lost sectors is displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost and the system operator may want to restore data after this procedure is completed.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems
Step 3020-6
1. Using the appropriate service procedures, remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts installed on it:
* The cache directory card from the new storage I/O adapter. (Refer to “Replacing the Cache Directory Card”.)
* The removable cache card from the new storage I/O adapter. (This only applies to some 2780 I/O adapters. Refer to “Separating a Removable Cache Card From the Base Card”.)
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems
MAP 3021
Use this MAP to resolve the following problem:
RAID controller resources not available due to previous problems (SRN nnnn - 9054)
Step 3021-1
Perform the following:
1. Remove any new or replacement disks which have been attached to the adapter.
2. Take action on the other errors which have occurred at the same time as this error.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries
Diagnostic Information for Multiple Bus Systems.
MAP 3030
Use this MAP to resolve the following problem:
Controller does not support function expected by one or more disks (SRN nnnn - 9008)
Step 3030-1
Identify the affected disks by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored, however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. Viewing the hardware error log, the Device Errors Detected field indicates the total number of disks which are affected. The Device Errors Logged field indicates the number of disks for which detailed information is provided. Under the Device heading, the Location, Vendor/Product ID, and S/N are provided for up to three disks. Additionally, the Controller Type and S/N for each of these disks indicates the adapter to which the disk was last attached when it was operational.
7. Go to Step 3030-2.
Step 3030-2
Have the adapter card or disks been physically moved recently?
NO Contact your service support organization.
YES Go to Step 3030-3.
Step 3030-3
There are two possible ways to correct the problem. Perform only one of the following two options, listed in the order of preference:
* Restore the adapter and disks back to their original configuration. Perform only one of the following two options:
– IPL the system or logical partition
– Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Format the disks, as follows:
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3031
Use this MAP to resolve the following problem:
Required cache data cannot be located for one or more disks (SRN nnnn - 9050)
Step 3031-1
Did you just exchange the adapter as the result of a failure?
NO Go to Step 3031-3.
YES Go to Step 3031-2.
Step 3031-2
Note: The failed adapter that you have just exchanged contains cache data that is required by the disks that were attached to that adapter. If the adapter that you just exchanged is failing intermittently, reinstalling it and IPLing the system may allow the data to be successfully written to the disks. After the cache data is written to the disks and the system is powered off normally, the adapter can be replaced without data being lost. Otherwise, continue with this procedure.
Note: Label all parts (old and new) before moving them around.
Using the appropriate service procedures, remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts installed on it:
* The cache directory card from the original storage I/O adapter. (Refer to “Replacing the Cache Directory Card”.)
* The removable cache card from the original storage I/O adapter. (This only applies to some 2780 I/O adapters. Refer to “Separating a Removable Cache Card From the Base Card”.)
Go to Step 3031-8.
Step 3031-3
Identify the affected disks by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapters resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions or a problem has occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored, however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. Viewing the hardware error log, the Device Errors Detected field indicates the total number of disks which are affected. The Device Errors Logged field indicates the number of disks for which detailed information is provided. Under the Device heading, the Location, Vendor/Product ID, and S/N are provided for up to three disks. Additionally, the Controller Type and S/N for each of these disks indicates the adapter to which the disk was last attached when it was operational.
7. Go to Step 3031-4.
Step 3031-4
Have the adapter card or disks been physically moved recently?
NO Contact your service support organization.
YES Go to Step 3031-5.
Step 3031-5
Is the data on the disks needed for this or any other system?
NO Go to Step 3031-7.
YES Go to Step 3031-6.
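The yes/no questions in Steps 3031-1, 3031-4, and 3031-5 can be combined into one routing function. This is only a sketch of the branching as quoted above. The function name `route_map_3031` and its parameter names are hypothetical.

```python
def route_map_3031(adapter_just_exchanged: bool,
                   recently_moved: bool,
                   data_needed: bool) -> str:
    """Return where MAP 3031 sends you after its yes/no questions."""
    # Step 3031-1: was the adapter just exchanged because of a failure?
    if adapter_just_exchanged:
        return "Step 3031-2"
    # Step 3031-3 (identify the affected disks) always leads to Step 3031-4.
    # Step 3031-4: have the adapter card or disks been moved recently?
    if not recently_moved:
        return "Contact your service support organization"
    # Step 3031-5: is the data on the disks needed?
    return "Step 3031-6" if data_needed else "Step 3031-7"


print(route_map_3031(True, False, False))
print(route_map_3031(False, True, True))
print(route_map_3031(False, True, False))
```

The answers to the later questions are ignored when an earlier branch already decides the outcome, which mirrors the order in which the MAP asks them.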
Step 3031-6
The adapter and disks, identified above, must be reunited so that the cache data can be written to the disks. Restore the adapter and disks back to their original configuration. Once the cache data is written to the disks and the system is powered off normally, the adapter and/or disks may be moved to another location.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3031-7
There are three possible ways to correct the problem. Perform only one of the following three options, listed in the order of preference:
* Reclaim Controller Cache Storage by performing the following:
Attention: All data on the disk array will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Reclaim Controller Cache Storage.
4. Select the PCI-X SCSI RAID Controller.
5. Confirm that you will Allow Unknown Data Loss.
6. Confirm that you wish to proceed.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* If the disks are members of a disk array, delete the disk array.
Attention: All data on the disk array will be lost.
This may be performed as follows:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Delete a PCI-X SCSI Disk Array.
3. Select the PCI-X SCSI RAID Controller.
4. Select the disk array to delete.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Format the disks, as follows:
Attention: All data on the disks will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3031-8
Has a new SRN nnnn-9010 or nnnn-9050 occurred?
NO Go to Step 3031-10.
YES Go to Step 3031-9.
Step 3031-9
Was the new SRN nnnn-9050?
NO The new SRN was nnnn-9010.
Reclaim the controller cache storage as follows:
Attention: Data will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Reclaim Controller Cache Storage.
4. Select the PCI-X SCSI RAID Controller.
5. Confirm that you wish to proceed.
Note: On the Reclaim controller cache storage results screen, the number of lost sectors is displayed. If the number is 0, there is no data loss. If the number is not 0, data has been lost and the system operator may want to restore data after this procedure is completed.
6. Go to Step 3031-10.
YES Contact your service support organization.
Step 3031-10
Using the appropriate service procedures, remove the I/O adapter. Install the new replacement storage I/O adapter with the following parts installed on it:
* The cache directory card from the new storage I/O adapter. (Refer to “Replacing the Cache Directory Card”.)
* The removable cache card from the NEW storage I/O adapter. (This only applies to some 2780 I/O adapters. Refer to “Separating a Removable Cache Card From the Base Card”.)
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3032
Use this MAP to resolve the following problem:
Cache data exists for one or more missing or failed disks (SRN nnnn-9051)
The possible causes are:
1) One or more disks have failed on the adapter.
2) One or more disks were either moved concurrently or were removed after an abnormal power off.
3) The adapter was moved from a different system or a different location on this system after an abnormal power off.
4) The cache of the adapter was not cleared before it was shipped to the customer.
Step 3032-1
Identify the affected disks by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, the Device Errors Detected field indicates the total number of disks that are affected. The Device Errors Logged field indicates the number of disks for which detailed information is provided. Under the Device heading, the Location, Vendor/Product ID, and S/N are provided for up to three disks. Additionally, the Controller Type and S/N for each of these disks indicate the adapter to which the disk was last attached when it was operational.
7. Go to Step 3032-2.
Step 3032-2
Are there other disk or adapter errors which have occurred at about the same time as this error?
NO Go to Step 3032-3.
YES Go to Step 3032-6.
Step 3032-3
Is the data on the disks (and thus the cache data for the disks) needed for this or any other system?
NO Go to Step 3032-7.
YES Go to Step 3032-4.
Step 3032-4
Have the adapter card or disks been physically moved recently?
NO Contact your service support organization.
YES Go to Step 3032-5.
Step 3032-5
The adapter and disks must be reunited so that the cache data can be written to the disks. Restore the adapter and disks back to their original configuration. After the cache data is written to the disks and the system is powered off normally, the adapter and/or disks may be moved to another location.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3032-6
Take action on the other errors that have occurred at the same time as this error. When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3032-7
Attention: Data will be lost.
Reclaim the Controller Cache Storage by performing the following:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Reclaim Controller Cache Storage.
4. Select the PCI-X SCSI RAID Controller.
5. Confirm that you will Allow Unknown Data Loss.
6. Confirm that you wish to proceed.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3033
Use this MAP to resolve the following problems:
1) Disk has been modified after last known status (SRN nnnn - 9090)
2) Incorrect disk configuration change has been detected (SRN nnnn - 9091)
Step 3033-1
* Perform only one of the following two options:
- IPL the system or logical partition
- Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
Take action on the other errors which have occurred at the same time as this error. When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3034
Use this MAP to resolve the following problem:
Disk requires Format before use (SRN nnnn - 9092)
The possible causes are:
1) Disk is a previously failed disk from a disk array and was automatically replaced by a hot spare disk.
2) Disk is a previously failed disk from a disk array and was removed and later reinstalled on a different adapter or different location on this adapter.
3) Appropriate service procedures were not followed when replacing disks or reconfiguring the adapter, such as not using the SCSI and SCSI RAID Hot Plug Manager when concurrently removing/installing disks or not performing a normal power down of the system prior to reconfiguring disks and adapters.
4) Disk is member of a disk array, but was detected subsequent to the adapter being configured.
5) Disk has multiple or complex configuration problems.
Step 3034-1
Identify the affected disks by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, the Device Errors Detected field indicates the total number of disks that are affected. The Device Errors Logged field indicates the number of disks for which detailed information is provided. Under the Device heading, the Location, Vendor/Product ID, and S/N are provided for up to three disks. Additionally, the Controller Type and S/N for each of these disks indicate the adapter to which the disk was last attached when it was operational.
7. Go to Step 3034-2.
Step 3034-2
Are there other disk or adapter errors which have occurred at about the same time as this error?
NO Go to Step 3034-3.
YES Go to Step 3034-5.
Step 3034-3
Have the adapter card or disks been physically moved recently?
NO Go to Step 3034-4.
YES Go to Step 3034-6.
Step 3034-4
Is the data on the disks not needed for this or any other system and you wish to continue to use them with this adapter?
NO Go to Step 3034-6.
YES Go to Step 3034-7.
Step 3034-5
Take action on the other errors which have occurred at the same time as this error.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3034-6
Perform only one of the following actions that is most applicable to your situation:
* Perform only one of the following two options:
– IPL the system or logical partition
– Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
Take action on the other errors which have occurred at the same time as this error.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Restore the adapter and disks to their original configuration. Once this has been done, perform only one of the following two options:
– IPL the system or logical partition
– Unconfigure and reconfigure the adapter by performing the following:
1. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
2. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* Remove the disks from this adapter.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
Step 3034-7
There are two possible ways to correct the problem. Perform only one of these options.
* Format the disks.
Attention: All data on the disks will be lost.
This may be performed as follows:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
* If the disks are members of a disk array, delete the disk array by doing the following.
Attention: All data on the disk array will be lost.
Note: In some rare scenarios, deleting the disk array will have no effect on a disk and the disk must be formatted instead.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Delete a PCI-X SCSI Disk Array.
3. Select the PCI-X SCSI RAID Controller.
4. Select the disk array to delete.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
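The questions in Steps 3034-2 through 3034-4 can be read as a small decision function. The following is a minimal Python sketch, not part of the original procedure. The function and parameter names are hypothetical, and the returned strings only name the step that the MAP directs you to.

```python
def map_3034_next_step(other_errors: bool,
                       recently_moved: bool,
                       data_unneeded_and_keep_disks: bool) -> str:
    """Return the step MAP 3034 directs you to after Step 3034-1.

    All names are illustrative; only the step numbers come from the MAP.
    """
    # Step 3034-2: did other disk or adapter errors occur at about the same time?
    if other_errors:
        return "Step 3034-5"
    # Step 3034-3: have the adapter card or disks been physically moved recently?
    if recently_moved:
        return "Step 3034-6"
    # Step 3034-4: is the data NOT needed, and will the disks stay on this adapter?
    if data_unneeded_and_keep_disks:
        return "Step 3034-7"
    return "Step 3034-6"
```

Only one path reaches Step 3034-7, where data is destroyed: no other errors, no recent move, and the data explicitly not needed.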
MAP 3035
Use this MAP to resolve the following problem:
Disk media format bad (SRN nnnn - FFF3)
The possible causes are:
* Disk was being formatted and was powered off during this process.
* Disk was being formatted and was reset during this process.
Step 3035-1
Identify the affected disk by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, under the Disk Information heading, the Location, Vendor/Product ID, and S/N are provided for the disk.
7. Go to Step 3035-2.
Step 3035-2
Format the disk by performing the following.
Attention: All data on the disks will be lost.
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select Format Physical Disk Media (pdisk).
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3036
Use this MAP to resolve the following problem:
Identify disk to be replaced (SRN nnnn - 9200)
You are sent here when a pdisk (that is, a physical disk in 522 bytes/sector format) is to be replaced but the location of this disk was not provided.
Step 3036-1
Identify the failing disk by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, under the Disk Information heading, the Location, Vendor/Product ID, and S/N are provided for the disk.
7. Go to Step 3036-2.
Step 3036-2
Using appropriate service procedures, such as use of the SCSI and SCSI RAID Hot Plug Manager, remove the failed disk and replace it with a new disk. The SCSI and SCSI RAID Hot Plug Manager can be invoked as follows:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select Diagnostics and Recovery Options.
3. Select SCSI and SCSI RAID Hot Plug Manager.
4. Select Identify a Device Attached to an SCSI Hot Swap Enclosure Device.
5. Choose the location for the device you wish to remove or install.
Note: The visual indicator on the Device will blink at the Identify rate.
6. If removing a device:
a. Select Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device.
Note: The visual indicator on the Device will blink at the Remove rate.
b. Remove the device.
Else, if installing a device:
a. Select Attach a Device to an SCSI Hot Swap Enclosure Device.
Note: The visual indicator on the Device will blink at the Remove rate.
b. Insert the device.
7. Go to Step 3036-3.
Step 3036-3
1. Run diagnostics in system verification mode on the adapter.
2. Take action on any other errors which may have surfaced due to removing the disk, if any, such as for degraded disk arrays.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3040
Use this MAP to resolve the following problem:
Multiple controllers connected in an invalid configuration (SRN nnnn - 9073)
Step 3040-1
There are two primary reasons for receiving this error (SRN nnnn - 9073).
1) An adapter in a multi-initiator and high-availability environment sees more than one other adapter connected in the configuration. Only two adapters are supported connected together in the multi-initiator and high-availability configuration. Work with the customer to identify and correct the invalid configuration.
2) Incompatible adapters are connected in a multi-initiator and high-availability environment. One or both of the adapters do not support attachment in a multi-initiator configuration. Verify that the adapters logging this error are listed in Table 1 on page 1 with multi-initiator support marked as YES.
Determine which of these is the cause of your specific error and take the appropriate actions listed. If this does not correct the error, contact your next level of support.
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 eServer pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3041
Use this MAP to resolve the following problem:
Multiple controllers not capable of similar functions or controlling the same set of devices (SRN nnnn - 9074).
Step 3041-1
To obtain the reason/description for this failure, you must find the formatted error information in the AIX error log. This should also contain information about the other connected adapter (Remote Adapter Fields).
Display the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. When viewing the hardware error log, the Detail Data section contains the “Reason for Failure” and “Remote Adapter” information.
Step 3041-2
Find the Reason for Failure and information for the other attached adapter (Remote Adapter) shown in the error log, and perform the action listed for the Reason in the following table:
When the problem is resolved then go to MAP 0410: Repair Checkout, in RS/6000 eServer pSeries Diagnostic Information for Multiple Bus Systems.
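Each MAP above that names an SRN is selected by the reason code, the four characters after the dash. As a quick reference, the mapping stated in this section can be sketched as a lookup table. This is an illustrative Python sketch only: the function name is hypothetical, the table covers only the reason codes named in the MAP headings above, and 2523 is used as the Failing Function Code because the text gives it as an example.

```python
from typing import Optional

# Reason code -> MAP, copied from the "Use this MAP to resolve..." headings above.
REASON_CODE_TO_MAP = {
    "9051": "MAP 3032",
    "9090": "MAP 3033",
    "9091": "MAP 3033",
    "9092": "MAP 3034",
    "FFF3": "MAP 3035",
    "9200": "MAP 3036",
    "9073": "MAP 3040",
    "9074": "MAP 3041",
}


def map_for_srn(srn: str) -> Optional[str]:
    """Return the MAP for an SRN written as nnnn-rrrr, or None if it is not listed.

    Hypothetical helper: it tolerates spaces around the dash and lowercase hex.
    """
    ffc, dash, reason = srn.partition("-")
    ffc, reason = ffc.strip(), reason.strip().upper()
    if not dash or len(ffc) != 4:
        return None  # not in nnnn-rrrr form
    return REASON_CODE_TO_MAP.get(reason)


if __name__ == "__main__":
    print(map_for_srn("2523-9092"))
```

A result of None means only that the reason code is not in this excerpt, not that no MAP exists for it.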
MAP 3050
Use the following to perform SCSI bus problem isolation.
Considerations:
1) Remove power from the system before connecting and disconnecting cables or devices, as appropriate, to prevent hardware damage or erroneous diagnostic results.
2) Some systems have SCSI and PCI-X bus interface logic integrated onto the system boards and use a pluggable RAID Enablement Card (a non-PCI form factor card) for these SCSI/PCI-X buses. An example of such a RAID Enablement Card is FC 5709. For these configurations, replacement of the RAID Enablement Card is unlikely to solve a SCSI bus-related problem because the SCSI bus interface logic is on the system board.
3) Some adapters provide two connectors, one internal and one external, for each SCSI bus. For this type of adapter, it is not acceptable to use both connectors for the same SCSI bus at the same time. SCSI bus problems are likely to occur if this is done. However, it is acceptable to use an internal connector for one SCSI bus and an external connector for another SCSI bus. The internal and external connectors are labeled to indicate which SCSI bus they correspond to.
4) When two adapters are connected in a Multi-Initiator and High-Availability configuration, as described in Chapter 4, “Multi-Initiator and High-Availability,” on page 23, each adapter’s SCSI ID must be set to a different value when connected to shared disk enclosures. If the SCSI IDs are not set properly, many SCSI bus problems can occur.
Attention: Replacing RAID adapters is not recommended without assistance from your service support organization when SCSI bus problems exist. Because the adapter may contain non-volatile write cache data and configuration data for the attached disk arrays, additional problems can be created by replacing an adapter when SCSI bus problems exist.
Attention: Removing functioning disks in a disk array is not recommended without assistance from your service support organization. A disk array may become degraded or failed if functioning disks are removed and additional problems may be created.
Common Device Removal and Installation Procedure
When this MAP calls for a device (that is disk, tape, CD-ROM, or DVD-ROM) to be removed or installed, use this common device removal and installation procedure:
If the boot device is not on this adapter and the device to be removed or installed is a hot-swap device, follow this procedure. If these conditions do not apply to your situation, see the else section located at the end of this procedure.
1. Remove or install the device by invoking the SCSI and SCSI RAID Hot Plug Manager.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select SCSI and SCSI RAID Hot Plug Manager.
d. Select Identify a Device Attached to an SCSI Hot Swap Enclosure Device.
e. Choose the location for the device you wish to remove or install.
Note: The visual indicator on the device will blink at the Identify rate.
f. If removing a device:
1) Select Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device.
Note: The visual indicator on the device will blink at the Remove rate.
2) Remove the device.
3) Label the device with the slot it was removed from to ensure it can be reinstalled in the same location.
Else, if installing a device:
1) Select Attach a Device to an SCSI Hot Swap Enclosure Device.
Note: The visual indicator on the device will blink at the Remove rate.
2) Insert the device.
2. Unconfigure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Unconfigure an Available PCI-X SCSI RAID Controller.
3. Configure the adapter.
a. Start the PCI-X SCSI Disk Array Manager.
1) Start Diagnostics and select Task Selection on the Function Selection screen.
2) Select RAID Array Manager.
3) Select PCI-X SCSI Disk Array Manager.
b. Select Diagnostics and Recovery Options.
c. Select Configure a Defined PCI-X SCSI RAID Controller.
else:
1. Power off the system/logical partition
2. Remove/install the device(s)
3. Power on the system/logical partition
Step 3050-1
Identify the SCSI bus on which the problem is occurring by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with an SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries may be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored; however, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, under the Disk Information heading, the Location field can be used to identify which SCSI bus the error is associated with (that is, the SCSI bus is the value EF given a location of EF-G,H).
7. Go to Step 3050-2.
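Reading the SCSI bus out of a Location value can be sketched as follows. This is a minimal Python illustration that assumes the AB-CD-EF-G,H format described in the Location Codes section, in either its full form or the partial EF-G,H form. The function name and the sample location strings are hypothetical.

```python
def parse_location(location: str) -> dict:
    """Split an AIX location code into adapter, SCSI bus, SCSI ID, and LUN.

    Accepts the full form AB-CD-EF-G,H or the partial form EF-G,H.
    Illustrative helper only; it is not part of any AIX tool.
    """
    head, comma, lun = location.strip().rpartition(",")
    if not comma:
        raise ValueError(f"no ',' separating SCSI ID and LUN in {location!r}")
    parts = head.split("-")
    if len(parts) == 2:        # EF-G
        adapter = None
        bus, scsi_id = parts
    elif len(parts) == 4:      # AB-CD-EF-G
        adapter = "-".join(parts[:2])
        bus, scsi_id = parts[2], parts[3]
    else:
        raise ValueError(f"unexpected location format: {location!r}")
    return {"adapter": adapter, "bus": bus, "scsi_id": scsi_id, "lun": lun}


if __name__ == "__main__":
    # Sample value is made up for illustration.
    print(parse_location("00-8,0")["bus"])
```

As noted in the Location Codes section, a bus value of ff indicates the logical bus of a disk array rather than a physical SCSI bus.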
Step 3050-2
Have recent changes been made to the SCSI configuration?
NO Go to Step 3050-5.
YES Go to Step 3050-3.
Step 3050-3
Check for the following problems:
1) Address conflicts between devices
2) Cabling problems, such as configurations that exceed the maximum cable lengths, missing termination, or excessive termination
3) Both internal and external connectors for this SCSI bus are being used at the same time (only one should have a cable attached)
4) Ensure the SCSI bus does not have multi-initiators (for example, set up for a high-availability configuration).
Note: Multi-initiator and High Availability (for example, HACMP) support is not provided at this time. For more details about supported SCSI cabling, refer to RS/6000 Eserver pSeries Adapters, Devices, and Cable Information for Multiple Bus Systems.
Did you find a problem?
NO Go to Step 3050-5.
YES Go to Step 3050-4.
Step 3050-4
1. Power off the system or logical partition.
2. Correct the problem.
3. Power on the system or logical partition, and run diagnostics in system verification mode on the adapter.
Did a SCSI bus-related failure occur?
NO Go to Step 3050-16.
YES Go to Step 3050-5.
Step 3050-5
Is the problem related to the thermal fuse (that is, SRN nnnn-719)?
NO Go to Step 3050-7.
YES Go to Step 3050-6.
Step 3050-6
The thermal fuse protects the SCSI bus from high currents due to shorts on the terminator, cable, or device. It is unlikely that the thermal fuse can be tripped by a defective adapter. A fault (short-circuit) causes an increase in resistance and temperature of the thermal fuse. The increase in temperature causes the thermal fuse to halt current flow. The thermal fuse returns to a low resistive and low temperature state when the fault is removed from the SCSI bus or when the system is turned off. Wait 10 seconds for the thermal fuse to reset itself and recover, then retest.
If the same error persists, replace the components of the failing SCSI bus in the following order. Wait 10 seconds for the thermal fuse to reset itself between steps.
1. Cable (if present)
2. DASD backplane (if present)
3. System board (if SCSI bus interface logic is on the system board)
To replace a component and verify that the problem was corrected, do the following:
1. Power off the system or logical partition.
2. Replace a component listed above.
3. Power on the system or logical partition, and run diagnostics in system verification mode on the adapter.
Did you correct the problem?
NO Go to Step 3050-7.
YES Go to Step 3050-16.
Step 3050-7
Determine if any of the disk arrays on the adapter are in a Degraded state as follows:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select List PCI-X SCSI Disk Array Configuration.
3. Select the PCI-X SCSI RAID Controller identified in the hardware error log.
Does any disk array have a State of Degraded?
NO Go to Step 3050-9.
YES Go to Step 3050-8.
Step 3050-8
1. Identify the failed disk(s) by first finding disk arrays with a state of Degraded and then a pdisk for that disk array which has a state of Failed.
2. Remove the failed disk from each degraded disk array by using the “Common Device Removal and Installation Procedure” on page 72.
3. Run diagnostics in system verification mode on the adapter.
Did a SCSI bus related failure occur?
NO Go to Step 3050-16.
YES Go to Step 3050-9.
Step 3050-9
Are there any non-essential removable media devices (for example tape, CD-ROM, or DVD-ROM) on the SCSI bus?
NO Go to Step 3050-12.
YES Go to Step 3050-10.
Step 3050-10
1. Remove one of the non-essential removable media devices by using the “Common Device Removal and Installation Procedure” on page 72.
2. Run diagnostics in system verification mode on the adapter.
Did a SCSI bus related failure occur?
NO Go to Step 3050-11.
YES Go to Step 3050-9.
Step 3050-11
The last removable media device removed from the SCSI bus may be the cause of the SCSI bus problems. Follow the repair procedures for that device.
Go to Step 3050-16.
Step 3050-12
Are there any non-essential disks which are not disk array members (for example, 512 byte/sector standalone disks, hot spare disks, or Array Candidates) on the SCSI bus?
NO Go to Step 3050-15.
YES Go to Step 3050-13.
Step 3050-13
1. Remove one of the non-essential disk devices by using the “Common Device Removal and Installation Procedure” on page 72.
2. Run diagnostics in system verification mode on the adapter.
Did a SCSI bus related failure occur?
NO Go to Step 3050-14.
YES Go to Step 3050-12.
Step 3050-14
The last disk removed from the SCSI bus may be the cause of the SCSI bus problems. Follow the repair procedures for that device.
Go to Step 3050-16.
Step 3050-15
Contact your service support organization.
Exit this procedure.
Step 3050-16
1. Reinstall any good devices that were removed by using the “Common Device Removal and Installation Procedure” on page 72.
2. Run diagnostics in system verification mode on the adapter.
3. Perform the appropriate problem determination steps to resolve any other non-SCSI bus related errors that appear, if any, such as for degraded disk arrays.
When the problem is resolved, go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
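The remove-and-retest loops in Steps 3050-9 through 3050-14 follow one pattern: remove a single non-essential device, rerun diagnostics, and stop as soon as the SCSI bus failure no longer occurs. The following is a minimal sketch of that pattern only, not part of the service procedure. The function name, the device names, and the `bus_fails_with` callback (a stand-in for running diagnostics in system verification mode by hand) are all hypothetical.

```python
from typing import Callable, List, Optional


def isolate_suspect_device(
    removable: List[str],
    bus_fails_with: Callable[[List[str]], bool],
) -> Optional[str]:
    """Remove devices one at a time until the SCSI bus failure stops.

    removable      -- non-essential devices currently on the SCSI bus
    bus_fails_with -- hypothetical stand-in for a diagnostics run; returns
                      True if a SCSI bus related failure still occurs with
                      the given devices attached
    Returns the last device removed when the failure disappears, or None
    if the failure persists after every candidate has been removed.
    """
    attached = list(removable)
    while attached:
        removed = attached.pop(0)         # remove one device
        if not bus_fails_with(attached):  # rerun diagnostics
            return removed                # last device removed is the suspect
    return None                           # nothing left to remove


if __name__ == "__main__":
    # Simulated bus where the hypothetical device "cd0" causes the failure.
    devices = ["rmt0", "cd0", "hdisk5"]
    print(isolate_suspect_device(devices, lambda left: "cd0" in left))
```

In the MAP this loop is applied first to removable media devices and then to disks that are not disk array members. A result of `None` for both classes corresponds to Step 3050-15, where the service support organization is contacted.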
MAP 3051
Use the following to determine what other FRUs, besides a disk, may need to be replaced in order to solve a problem.
You are sent here when a pdisk (that is, a physical disk in 522 bytes/sector format) was identified as the primary FRU to replace in order to solve a problem. However, if replacing the disk did not resolve the problem, other FRUs may need to be replaced.
Considerations:
1) Remove power from the system before connecting and disconnecting cables or devices, as appropriate, to prevent hardware damage or erroneous diagnostic results.
2) Keep in mind that some systems have SCSI and PCI-X bus interface logic integrated onto the system boards and use a pluggable RAID Enablement Card (a non-PCI form factor card) for these SCSI/PCI-X busses. An example of such a RAID Enablement Card is FC 5709. For these configurations, replacement of the RAID Enablement Card is unlikely to solve a SCSI bus related problem since the SCSI bus interface logic is on the system board.
3) Some adapters provide two connectors, one internal and one external, for each SCSI bus. For this type of adapter, it is not acceptable to use both connectors for the same SCSI bus at the same time. SCSI bus problems are likely to occur if this is done. However, it is acceptable to use an internal connector for one SCSI bus and an external connector for another SCSI bus. The internal and external connectors are labeled to indicate which SCSI bus they correspond to.
Attention: Replacing RAID adapters is not recommended without assistance from your service support organization when SCSI bus problems exist. Because the adapter may contain non-volatile write cache data and configuration data for the attached disk arrays, additional problems can be created by replacing an adapter when SCSI bus problems exist.
Attention: Removing functioning disks in a disk array is not recommended without assistance from your service support organization. A disk array may become degraded or failed if functioning disks are removed and additional problems may be created.
Step 3051-1
Identify the SCSI bus which the problem is occurring on by examining the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored. However, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view. In the hardware error log, under the Disk Information heading, the Location field identifies which SCSI bus the error is associated with (that is, the SCSI bus is the value EF given a location of EF-G,H).
7. Go to Step 3051-2.
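As an illustration of reading the Location field, the sketch below splits a location code of the form AB-CD-EF-G,H (or the shorter EF-G,H) into its parts so the SCSI bus value EF can be picked out. It assumes each two-character field is alphanumeric. The function name and the sample codes are hypothetical, not taken from a real error log.

```python
import re
from typing import Dict, Optional

# AB-CD (adapter) is optional, so both AB-CD-EF-G,H and EF-G,H are accepted.
_LOCATION = re.compile(
    r"^(?:(?P<adapter>[0-9A-Za-z]{2}-[0-9A-Za-z]{2})-)?"
    r"(?P<bus>[0-9A-Za-z]{2})-"
    r"(?P<scsi_id>[0-9A-Za-z]+),(?P<lun>[0-9A-Za-z]+)$"
)


def parse_location(code: str) -> Dict[str, Optional[str]]:
    """Split a location code into adapter, bus, SCSI ID, and LUN fields."""
    match = _LOCATION.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized location code: {code!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(parse_location("00-08-01-5,0")["bus"])
    print(parse_location("ff-0,0"))
```

A bus value of ff indicates the logical bus of a disk array rather than a physical SCSI bus.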
Step 3051-2
While the error persists, replace the components of the failing SCSI bus in the following order.
1. Cable (if present)
2. Adapter (if SCSI bus interface logic is on the adapter) or system board (if SCSI bus interface logic is on the system board)
3. DASD backplane (if present)
To replace a component and see if the problem was corrected, do the following:
1. Power off the system/logical partition
2. Replace a component listed above
3. Power on the system/logical partition, and run diagnostics in system verification mode on the adapter
When the problem is resolved, go to MAP 0410: Repair Checkout, in RS/6000 Eserver pSeries Diagnostic Information for Multiple Bus Systems.
MAP 3090
The problem that occurred is uncommon or complex to resolve. Information should be gathered and assistance obtained from your service support organization.
Step 3090-1
Record the hardware error log. The hardware error log may be viewed as follows:
1. Start Diagnostics and select Task Selection on the Function Selection screen.
2. Select Display Hardware Error Report.
3. Select Display Hardware Errors for PCI-X SCSI RAID Adapters.
4. Select the adapter resource, or select all adapter resources if the adapter resource is not known.
5. On the Error Summary screen, look for an entry with a SRN corresponding to the problem which sent you here and select it.
Note: If multiple entries exist for the SRN, some entries could be older versions, or the problem may have occurred on multiple entities (such as adapters, disk arrays, and devices). Older entries can be ignored. However, this MAP may need to be used multiple times if the same problem has occurred on multiple entities.
6. Select the hardware error log to view.
7. Go to Step 3090-2.
Step 3090-2
Collect any hardware errors logged at about the same time for the adapter.
Go to Step 3090-3.
Step 3090-3
Collect the current disk array configuration. The disk array configuration may be viewed as follows:
1. Start the PCI-X SCSI Disk Array Manager.
a. Start Diagnostics and select Task Selection on the Function Selection screen.
b. Select RAID Array Manager.
c. Select PCI-X SCSI Disk Array Manager.
2. Select List PCI-X SCSI Disk Array Configuration.
3. Select the PCI-X SCSI RAID Controller identified in the hardware error log.
4. Go to Step 3090-4.
Step 3090-4
Contact your service support organization.
Exit this procedure.
Finding an SRN Given an AIX Error Log
Normally, Error Log Analysis (ELA) examines the error logs and presents a Service Request Number (SRN) to the user as appropriate. If you need to determine an SRN given an AIX error log, perform the following steps:
1. Display the error log using the AIX errpt command (for example, errpt for a summary, followed by errpt -a -s timestamp or errpt -a -N resource_name).
2. Ensure that the Error ID is of the form SISIOA_xxxx (for example SISIOA_ARY_DEGRADED).
Note: Only Error IDs of the form SISIOA_xxxx are potentially related to disk arrays.
3. Locate the SENSE DATA in the Detail Data.
4. Identify the bytes 24-27 of the SENSE DATA from the 32 bytes shown.
Note: Use the following example AIX Error Log to help you identify bytes 24-27.
5. The first four digits of the SRN, known as the Failing Function Code (FFC), can be found in the following table:
6. The second four digits of the SRN, known as the Reason Code, are equal to bytes 26-27 of the SENSE DATA.
For the example error log:
1) Bytes 24-27 of the SENSE DATA are 5703 9030
2) The first four digits of the SRN, found by looking up 5703 in the table above, are 2523
3) The second four digits of the SRN are 9030
4) Thus the SRN is 2523-9030
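The lookup above can be sketched in a few lines. This is a minimal illustration, not a replacement for Error Log Analysis. It assumes the SENSE DATA bytes are numbered from 0 and that the value is supplied as a hexadecimal string. The function name is hypothetical, and the table holds only the single 5703 to 2523 pairing from the worked example, since the full FFC table is not reproduced here.

```python
from typing import Dict

# Only this pairing comes from the worked example in the text; the full
# table of Failing Function Codes must be supplied separately.
FFC_BY_TYPE: Dict[str, str] = {"5703": "2523"}


def srn_from_sense_data(sense_hex: str,
                        ffc_table: Dict[str, str] = FFC_BY_TYPE) -> str:
    """Build an SRN (nnnn-rrrr) from a SENSE DATA hex string.

    Assumes bytes are numbered from 0, so bytes 24-25 select the FFC via
    the table and bytes 26-27 are the reason code.
    """
    digits = "".join(sense_hex.split()).upper()
    bytes.fromhex(digits)            # raises ValueError if not valid hex
    if len(digits) < 56:             # bytes 0..27 need 56 hex digits
        raise ValueError("SENSE DATA is shorter than 28 bytes")
    type_key = digits[48:52]         # bytes 24-25
    reason_code = digits[52:56]      # bytes 26-27
    if type_key not in ffc_table:
        raise KeyError(f"no FFC known for {type_key}")
    return f"{ffc_table[type_key]}-{reason_code}"


if __name__ == "__main__":
    # Synthetic SENSE DATA: zero padding around the example bytes 5703 9030.
    sample = "00" * 24 + "5703 9030" + "00" * 4
    print(srn_from_sense_data(sample))
```

Passing the table as a parameter keeps the sketch usable once the remaining FFC entries are filled in from the controller documentation.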
This article is from the ChinaUnix blog. To view the original, go to: http://blog.chinaunix.net/u/28303/showart_2018510.html