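After rebooting a Fujitsu M5000, I found the machine raising an alarm. Looking at the rear of the chassis, none of the LEDs on the PCI cards in IOU#1 were lit, so I started looking for the cause.
1. First, the messages log reports: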
Oct 9 13:26:25 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16
Oct 9 13:26:25 bzdb px: [ID 548919 kern.info] fjulsa-1#0
Oct 9 13:26:25 bzdb px: [ID 100033 kern.info]
Oct 9 13:26:27 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16
Oct 9 13:26:27 bzdb px: [ID 548919 kern.info] fjulsa-1#0
Oct 9 13:26:27 bzdb px: [ID 100033 kern.info]
Oct 9 13:26:28 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16
Oct 9 13:26:28 bzdb px: [ID 548919 kern.info] fjulsa-1#0
Oct 9 13:26:28 bzdb px: [ID 100033 kern.info]
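To see whether the warnings all come from one interrupt source, the log lines can be tallied per px instance and ino. The following is only a minimal sketch: the function name `count_spurious` and the regular expression are assumptions based on the message format shown above, not part of any Solaris tool.

```python
import re
from collections import Counter

# Pattern derived from the warning text in the excerpt above (an assumption,
# not an official message specification).
SPURIOUS = re.compile(
    r"WARNING: (px\d+): spurious interrupt from ino (0x[0-9a-fA-F]+)"
)

def count_spurious(lines):
    """Count spurious-interrupt warnings per (px instance, ino) pair."""
    counts = Counter()
    for line in lines:
        match = SPURIOUS.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the messages excerpt in this post.
sample = [
    "Oct 9 13:26:25 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16",
    "Oct 9 13:26:25 bzdb px: [ID 548919 kern.info] fjulsa-1#0",
    "Oct 9 13:26:27 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16",
    "Oct 9 13:26:28 bzdb px: [ID 781074 kern.warning] WARNING: px0: spurious interrupt from ino 0x16",
]

for (px, ino), n in count_spurious(sample).items():
    print(f"{px} ino {ino}: {n} warning(s)")
```

In the excerpt, every warning points at the same source (px0, ino 0x16), and the following kern.info line names fjulsa-1#0 each time.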
2. On the XSCF, showlogs error reports:
Date: Oct 08 14:30:53 CST 2011 Code: 60002000-99020000-02001f0000000000
Status: Warning Occurred: Oct 08 14:30:52.732 CST 2011
FRU: /MBU_B
Msg: FMEM check sum error
Date: Oct 08 14:31:38 CST 2011 Code: 60004000-ffffffff-0109001500000000
Status: Warning Occurred: Oct 08 14:26:38.036 CST 2011
FRU: /UNSPECIFIED,/UNSPECIFIED
Msg: XSCF command: System status change (OS panic) (DID#00, path: 00)
fmdump then shows:
Oct 08 14:30:53.5631 94df0db4-efdd-49a9-a9be-a637853923f7 SCF-8003-HA
Oct 08 14:31:38.1415 d4879368-3d4c-4463-86c8-457bfb92c8d5 SCF-8005-PX
XSCF> fmdump -v -u 94df0db4-efdd-49a9-a9be-a637853923f7
TIME UUID MSG-ID
Oct 08 14:30:53.5631 94df0db4-efdd-49a9-a9be-a637853923f7 SCF-8003-HA
100% fault.chassis.SPARC-Enterprise.asic.mbc.fe
Problem in: hc:///chassis=0/cmu=1/mbc=1
Affects: hc:///chassis=0/cmu=1/xsb=0
FRU: hc://:product-id=SPARC Enterprise M5000:chassis-id=BCF0829008:server-id=bzdb:serial=BC082806A0:part=CF00541-0478 07 /541-0478-07:revision=0201/component=/MBU_B
Location: /MBU_B
XSCF> fmdump -v -u d4879368-3d4c-4463-86c8-457bfb92c8d5
TIME UUID MSG-ID
Oct 08 14:31:38.1415 d4879368-3d4c-4463-86c8-457bfb92c8d5 SCF-8005-PX
100% upset.chassis.domain.panic
Problem in: hc:///chassis=0/domain=0
Affects: -
FRU: hc://:product-id=SPARC Enterprise M5000:chassis-id=BCF0829008:server-id=bzdb/component=CHASSIS
Location: CHASSIS
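When several fault events are involved, it can help to reduce the fmdump -v output to one record per event. The sketch below is an assumption-laden helper (the name `parse_fmdump` and the field names are hypothetical); it relies only on the layout visible in the output pasted above.

```python
import re

# Event header as seen above: "Oct 08 14:30:53.5631 <uuid> <MSG-ID>".
HEADER = re.compile(r"^\w{3}\s+\d{1,2}\s+[\d:.]+\s+([0-9a-f-]{36})\s+(\S+)$")
# Fault line as seen above: "100% fault.chassis....".
FAULT = re.compile(r"^(\d+)%\s+(\S+)$")

def parse_fmdump(text):
    """Return one dict per event with uuid, msg_id, class, and location."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {"uuid": header.group(1), "msg_id": header.group(2)}
            events.append(current)
            continue
        if current is None:
            continue  # skip prompts and column titles before the first event
        fault = FAULT.match(line)
        if fault:
            current["certainty"] = int(fault.group(1))
            current["class"] = fault.group(2)
        elif line.startswith("Problem in:"):
            current["problem_in"] = line.split(":", 1)[1].strip()
        elif line.startswith("Location:"):
            current["location"] = line.split(":", 1)[1].strip()
    return events

# Sample text copied from the first event in this post (FRU line omitted).
sample = """\
TIME UUID MSG-ID
Oct 08 14:30:53.5631 94df0db4-efdd-49a9-a9be-a637853923f7 SCF-8003-HA
100% fault.chassis.SPARC-Enterprise.asic.mbc.fe
Problem in: hc:///chassis=0/cmu=1/mbc=1
Affects: hc:///chassis=0/cmu=1/xsb=0
Location: /MBU_B
"""

for event in parse_fmdump(sample):
    print(event["msg_id"], event["class"], event["location"])
```

For the first event, the extracted fields point at an MBC ASIC on the motherboard unit (/MBU_B), not at an IOU.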
My guess was that the PCI cage had failed, so I kept collecting information:
XSCF> showhardconf
SPARC Enterprise M5000;
+ Serial:BCF0829008; Operator_Panel_Switch:Locked;
+ Power_Supply_System:Single; SCF-ID:XSCF#0;
+ System_Power:On; System_Phase:Cabinet Power On;
Domain#0 Domain_Status:Running;
...
IOU#0 Status:Normal; Ver:0101h; Serial:BF0825EB0P ;
+ FRU-Part-Number:CF00541-2240 03 /541-2240-03 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
PCI#0 Name_Property:FJSV,ulsa; Card_Type:Other;
PCI#1 Name_Property:pci; Card_Type:Other;
PCI#2 Name_Property:fibre-channel; Card_Type:Other;
PCI#3 Name_Property:pci; Card_Type:Other;
PCI#4 Name_Property:fibre-channel; Card_Type:Other;
IOU#1 Status:Normal; Ver:0101h; Serial:BF0825EAX9 ;
+ FRU-Part-Number:CF00541-2240 03 /541-2240-03 ;
DDC_A#0 Status:Normal;
DDCR Status:Normal;
DDC_B#0 Status:Normal;
IOU#1 shows Status:Normal, which puzzles me: if this cage had failed, why would its status still be Normal? So I kept digging.
3. At the OK prompt, probe-scsi-all reports:
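One thing the showhardconf listing can be checked for is how many PCI cards each IOU actually reports, independent of the Status field. The following is a rough sketch under the assumption that the output follows the layout pasted above; `pci_cards_per_iou` is a hypothetical helper name, and the sample is trimmed from this post, so a missing PCI entry may also just reflect a truncated paste.

```python
import re

IOU_LINE = re.compile(r"^(IOU#\d+)\s+Status:(\w+);")
PCI_LINE = re.compile(r"^(PCI#\d+)\s+Name_Property:([^;]+);")

def pci_cards_per_iou(text):
    """Map each IOU to its reported status and the PCI cards listed under it."""
    units = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        iou = IOU_LINE.match(line)
        if iou:
            current = iou.group(1)
            units[current] = {"status": iou.group(2), "pci": []}
            continue
        pci = PCI_LINE.match(line)
        if pci and current is not None:
            units[current]["pci"].append((pci.group(1), pci.group(2)))
    return units

# Sample trimmed from the showhardconf output in this post.
sample = """\
IOU#0 Status:Normal; Ver:0101h; Serial:BF0825EB0P ;
PCI#0 Name_Property:FJSV,ulsa; Card_Type:Other;
PCI#1 Name_Property:pci; Card_Type:Other;
PCI#2 Name_Property:fibre-channel; Card_Type:Other;
PCI#3 Name_Property:pci; Card_Type:Other;
PCI#4 Name_Property:fibre-channel; Card_Type:Other;
IOU#1 Status:Normal; Ver:0101h; Serial:BF0825EAX9 ;
DDC_A#0 Status:Normal;
"""

for name, info in pci_cards_per_iou(sample).items():
    print(name, info["status"], len(info["pci"]), "PCI card(s) listed")
```

In the output as pasted, IOU#0 lists five PCI entries while IOU#1 lists none, even though both report Status:Normal.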
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
Base SAS World Wide ID is 0!
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000
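My question now is whether the I/O cage or the board has failed. The machine boots and runs normally, so I infer it is probably not a motherboard problem; but if it is the I/O cage, why does its status still show Normal? I'm a bit confused and would appreciate any help from the experts here. Thanks!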