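yzfs13 posted on 2011-10-09 13:29

Problem with a Fujitsu M5000, asking everyone for help

On a Fujitsu M5000, the machine showed an alarm after a reboot finished. On the back, none of the PCI card LEDs on IOU#1 were lit, so I started looking for the cause.

1. First I checked messages, which reports: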

Oct9 13:26:25 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:25 bzdb px: fjulsa-1#0
Oct9 13:26:25 bzdb px:
Oct9 13:26:27 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:27 bzdb px: fjulsa-1#0
Oct9 13:26:27 bzdb px:
Oct9 13:26:28 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:28 bzdb px: fjulsa-1#0
Oct9 13:26:28 bzdb px:
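
When the same warning repeats every second or two, it can help to tally it per device and interrupt number before digging further. The following is a minimal Python sketch that assumes the messages lines look exactly like the excerpt above; the function name `count_spurious` and the regular expression are illustrative, not part of any Solaris tool. It ignores the follow-up line that names `fjulsa-1#0`.

```python
import re
from collections import Counter

# Sample lines copied from the messages excerpt above.
SAMPLE = """\
Oct9 13:26:25 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:25 bzdb px: fjulsa-1#0
Oct9 13:26:25 bzdb px:
Oct9 13:26:27 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:27 bzdb px: fjulsa-1#0
Oct9 13:26:27 bzdb px:
Oct9 13:26:28 bzdb px: WARNING: px0: spurious interrupt from ino 0x16
Oct9 13:26:28 bzdb px: fjulsa-1#0
Oct9 13:26:28 bzdb px:
"""

# Assumed line shape: "WARNING: <instance>: spurious interrupt from ino <hex>"
PATTERN = re.compile(
    r"WARNING: (\w+): spurious interrupt from ino (0x[0-9a-fA-F]+)"
)


def count_spurious(text):
    """Count spurious-interrupt warnings per (instance, ino) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (instance, ino), n in count_spurious(SAMPLE).items():
        print(f"{instance} ino {ino}: {n} warning(s)")
```

In this excerpt every warning comes from the same instance (px0) and the same ino (0x16), which points at a single interrupt source rather than a scattered problem.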

2. Then I checked the XSCF; showlogs error reports:

Date: Oct 08 14:30:53 CST 2011   Code: 60002000-99020000-02001f0000000000
    Status: Warning                Occurred: Oct 08 14:30:52.732 CST 2011
    FRU: /MBU_B
    Msg: FMEM check sum error
Date: Oct 08 14:31:38 CST 2011   Code: 60004000-ffffffff-0109001500000000
    Status: Warning                Occurred: Oct 08 14:26:38.036 CST 2011
    FRU: /UNSPECIFIED,/UNSPECIFIED
    Msg: XSCF command: System status change (OS panic) (DID#00, path: 00)
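
Note that the log lists entries by their Date field, while the Occurred field shows the OS panic (14:26:38) happened before the FMEM check sum error (14:30:52). The sketch below, in Python, parses the two entries above into records and sorts them by Occurred time to make that order explicit. It assumes the four-line entry layout shown here; `parse_showlogs` and `occurred_time` are hypothetical helper names, not XSCF commands.

```python
import re
from datetime import datetime

# Sample text copied from the showlogs error output above.
SAMPLE = """\
Date: Oct 08 14:30:53 CST 2011   Code: 60002000-99020000-02001f0000000000
    Status: Warning                Occurred: Oct 08 14:30:52.732 CST 2011
    FRU: /MBU_B
    Msg: FMEM check sum error
Date: Oct 08 14:31:38 CST 2011   Code: 60004000-ffffffff-0109001500000000
    Status: Warning                Occurred: Oct 08 14:26:38.036 CST 2011
    FRU: /UNSPECIFIED,/UNSPECIFIED
    Msg: XSCF command: System status change (OS panic) (DID#00, path: 00)
"""


def parse_showlogs(text):
    """Split showlogs-style text into one dict per entry (assumed layout)."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Date:"):
            m = re.match(r"Date:\s*(.*?)\s+Code:\s*(\S+)", stripped)
            if m:
                entries.append({"date": m.group(1), "code": m.group(2)})
        elif entries and stripped.startswith("Status:"):
            m = re.match(r"Status:\s*(\S+)\s+Occurred:\s*(.*)", stripped)
            if m:
                entries[-1]["status"] = m.group(1)
                entries[-1]["occurred"] = m.group(2)
        elif entries and stripped.startswith("FRU:"):
            entries[-1]["fru"] = stripped[len("FRU:"):].strip()
        elif entries and stripped.startswith("Msg:"):
            entries[-1]["msg"] = stripped[len("Msg:"):].strip()
    return entries


def occurred_time(entry):
    """Parse the Occurred field, dropping the time-zone token (e.g. CST)."""
    parts = entry["occurred"].split()
    no_tz = " ".join(parts[:3] + parts[4:])
    return datetime.strptime(no_tz, "%b %d %H:%M:%S.%f %Y")


if __name__ == "__main__":
    for entry in sorted(parse_showlogs(SAMPLE), key=occurred_time):
        print(entry["occurred"], "|", entry["fru"], "|", entry["msg"])
```

Sorting by Occurred rather than Date only changes the reading order of events; it does not by itself say which event caused the other.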

Running fmdump then shows:
Oct 08 14:30:53.5631 94df0db4-efdd-49a9-a9be-a637853923f7 SCF-8003-HA
Oct 08 14:31:38.1415 d4879368-3d4c-4463-86c8-457bfb92c8d5 SCF-8005-PX

XSCF> fmdump -v -u 94df0db4-efdd-49a9-a9be-a637853923f7                     
TIME               UUID                                 MSG-ID
Oct 08 14:30:53.5631 94df0db4-efdd-49a9-a9be-a637853923f7 SCF-8003-HA
100%fault.chassis.SPARC-Enterprise.asic.mbc.fe

      Problem in: hc:///chassis=0/cmu=1/mbc=1
         Affects: hc:///chassis=0/cmu=1/xsb=0
               FRU: hc://:product-id=SPARC Enterprise M5000:chassis-id=BCF0829008:server-id=bzdb:serial=BC082806A0:part=CF00541-0478 07   /541-0478-07:revision=0201/component=/MBU_B
          Location: /MBU_B

XSCF> fmdump -v -u d4879368-3d4c-4463-86c8-457bfb92c8d5
TIME               UUID                                 MSG-ID
Oct 08 14:31:38.1415 d4879368-3d4c-4463-86c8-457bfb92c8d5 SCF-8005-PX
100%upset.chassis.domain.panic

      Problem in: hc:///chassis=0/domain=0
         Affects: -
               FRU: hc://:product-id=SPARC Enterprise M5000:chassis-id=BCF0829008:server-id=bzdb/component=CHASSIS
          Location: CHASSIS

My judgment was that the PCI cage had failed, so I continued collecting information.

XSCF> showhardconf
SPARC Enterprise M5000;
    + Serial:BCF0829008; Operator_Panel_Switch:Locked;
    + Power_Supply_System:Single; SCF-ID:XSCF#0;
    + System_Power:On; System_Phase:Cabinet Power On;
    Domain#0 Domain_Status:Running;

。。。。。

    IOU#0 Status:Normal; Ver:0101h; Serial:BF0825EB0P;
      + FRU-Part-Number:CF00541-2240 03   /541-2240-03          ;
      DDC_A#0 Status:Normal;
      DDCR Status:Normal;
            DDC_B#0 Status:Normal;
      PCI#0 Name_Property:FJSV,ulsa; Card_Type:Other;
      PCI#1 Name_Property:pci; Card_Type:Other;
      PCI#2 Name_Property:fibre-channel; Card_Type:Other;
      PCI#3 Name_Property:pci; Card_Type:Other;
      PCI#4 Name_Property:fibre-channel; Card_Type:Other;
    IOU#1 Status:Normal; Ver:0101h; Serial:BF0825EAX9;
      + FRU-Part-Number:CF00541-2240 03   /541-2240-03          ;
      DDC_A#0 Status:Normal;
      DDCR Status:Normal;
            DDC_B#0 Status:Normal;

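The status of IOU#1 turns out to be Normal, which is puzzling: if this cage were broken, how could its status still be Normal? So I kept checking.

3. At the OK prompt I ran probe-scsi-all, which reports: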
This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

Base SAS World Wide ID is 0!

This must be fixed immediately using set-sas-wwid
MPT Firmware Fault, code 2000

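My question now is whether the IO cage or the board has failed. The machine boots and runs normally, so I infer it is not a motherboard problem, but if it is the IO cage, why is its status still Normal? I'm a bit confused and hope the experts here can help. Thanks!!

yzfs13 posted on 2011-10-09 16:21

Bumping my own thread.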

dinky posted on 2011-10-24 14:10

What is the output of XSCF> showstatus?

What card is in PCI0?

财版 posted on 2011-10-24 14:49

The MX000 machines are hard to deal with.

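dinky posted on 2011-10-24 14:54

The M5000 has no so-called motherboard; it is built entirely from modular components.
The errors include FMEM check sum error.
I suggest checking whether the memory board has been degraded. On the Mx000 machines, the CPU and memory are logically tied to the IOU, so a CPU or memory fault makes the corresponding IOU unavailable.

The specific error is below; I suggest checking it: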
FRU: /MBU_B
Msg: FMEM check sum error
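
The logical link described here can be checked against the fmdump output in the first post, where the MBU_B fault lists "Affects: hc:///chassis=0/cmu=1/xsb=0" and the dark cards were on IOU#1. The Python sketch below derives the IOU number from such a path under the assumption that cmu=N shares a system board with IOU#N. That pairing is an assumption consistent with this thread, not something taken from Fujitsu documentation, and `affected_iou` is a hypothetical helper name.

```python
import re


def affected_iou(affects_path):
    """Return the IOU assumed to pair with the cmu in an fmdump 'Affects' path.

    ASSUMPTION: cmu=N and IOU#N belong to the same system board.
    Returns None when the path contains no cmu component.
    """
    match = re.search(r"/cmu=(\d+)(?:/|$)", affects_path)
    if match is None:
        return None
    return f"IOU#{int(match.group(1))}"


if __name__ == "__main__":
    # Paths copied from the fmdump -v output in the first post.
    print(affected_iou("hc:///chassis=0/cmu=1/xsb=0"))
    print(affected_iou("hc:///chassis=0/domain=0"))
```

Under that assumption the first path maps to IOU#1, which matches the cage whose PCI card LEDs were off, while the domain panic entry maps to no IOU at all.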

zhaopingzi posted on 2012-01-06 14:02

Original poster, how did this problem turn out?

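yzfs13 posted on 2012-01-10 13:27

Ah, this problem was finally resolved as follows:

1. First we replaced the IOU#1 cage. The problem remained, and showstatus now showed the motherboard as deconfigured, haha.

2. We replaced the motherboard and rebooted the machine, and the problem was solved.

Frustrating: all I did was reboot the system, and the board was done for........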