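kwtip posted on 2013-09-13 13:36

Machine rebooted while collecting an Explorer package.

The machine is a T5440 running an Oracle database. The customer reported that disk reads and writes were a bit slow, so we started collecting an Explorer package for analysis. The machine rebooted partway through the collection. We did not dare run the collection again, so the customer sent over only a messages file. Could everyone help me analyze it?

I'll paste some of the errors first; see the attachment for the full details.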

Sep 13 11:14:29 DtOMCR genunix: NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
Sep 13 11:14:29 DtOMCR unix:
Sep 13 11:14:29 DtOMCR ^Mpanic/thread=2a10372fca0:
Sep 13 11:14:29 DtOMCR unix: Fatal error has occured in: PCIe fabric.(0x1)(0x41)
Sep 13 11:14:29 DtOMCR unix:
Sep 13 11:14:29 DtOMCR genunix: 000002a103797d50 px:px_err_panic+1ac (1947400, 135a400, 41, 2a103797e00, 1, 0)
Sep 13 11:14:29 DtOMCR genunix:    %l0-3: 0000000000000000 0000000001947400 0000000000000000 0000000000000001
Sep 13 11:14:29 DtOMCR   %l4-7: 0000000000000000 0000000001875c00 0000000000000001 0000000000000000
Sep 13 11:14:29 DtOMCR genunix: 000002a103797e60 px:px_err_intr+158 (4, 4, 1, 40, 1, 1947c00)
Sep 13 11:14:29 DtOMCR genunix:    %l0-3: 0000000000000000 0000000000004000 0000000000000000 00000300055eae40
Sep 13 11:14:29 DtOMCR   %l4-7: 0000000000000001 0000000000000041 0000030003d18938 0000000000000002
Sep 13 11:14:29 DtOMCR genunix: 000002a103797f50 unix:current_thread+164 (16, 58, ffffffffffffffff, 1, 100, 12)
Sep 13 11:14:29 DtOMCR genunix:    %l0-3: 0000000001009904 000002a10372efe1 000000000000000e 0000000070010500
Sep 13 11:14:29 DtOMCR   %l4-7: 0000000000000002 0000000000000010 0000000000000000 000002a10372f890
Sep 13 11:14:29 DtOMCR genunix: 000002a10372f930 unix:cpu_halt+104 (3000664e000, 58, 187c370, 187c240, 3000664e000, 0)
Sep 13 11:14:29 DtOMCR genunix:    %l0-3: 00000600348ef31c 0000000000000001 0000000000000016 0000000000000001
Sep 13 11:14:29 DtOMCR   %l4-7: 0000000000000000 0000000000000002 000003000664e178 0000000000000001
Sep 13 11:14:29 DtOMCR genunix: 000002a10372f9e0 unix:idle+128 (182a800, 0, 3000664e000, ffffffffffffffff, 59, 1829400)
Sep 13 11:14:29 DtOMCR genunix:    %l0-3: 00000600348ef2f8 000000000000001b 0000000000000000 ffffffffffffffff
Sep 13 11:14:29 DtOMCR   %l4-7: 00000600348ef2f8 ffffffffffffffff 000000000187c240 0000000001040970
Sep 13 11:14:29 DtOMCR unix:
Sep 13 11:14:29 DtOMCR genunix: syncing file systems...
Sep 13 11:14:32 DtOMCR genunix: 21
Sep 13 11:14:35 DtOMCR genunix: 11
Sep 13 11:15:39 DtOMCR last message repeated 20 times
Sep 13 11:15:40 DtOMCR genunix: done (not all i/o completed)
Sep 13 11:15:41 DtOMCR genunix: dumping to /dev/dsk/c1t0d0s1, offset 9437249536, content: kernel
Sep 13 11:20:24 DtOMCR genunix: ^M100% done: 431269 pages dumped, compression ratio 2.97,
Sep 13 11:20:24 DtOMCR genunix: dump succeeded




Sep 13 11:26:15 DtOMCR fmd: SUNW-MSG-ID: SUNOS-8000-FU, TYPE: Defect, VER: 1, SEVERITY: Major
Sep 13 11:26:15 DtOMCR EVENT-TIME: Fri Sep 13 11:26:14 CST 2013
Sep 13 11:26:15 DtOMCR PLATFORM: SUNW,Netra-T5440, CSN: -, HOSTNAME: DtOMCR
Sep 13 11:26:15 DtOMCR SOURCE: eft, REV: 1.16
Sep 13 11:26:15 DtOMCR EVENT-ID: db956dbc-317f-c039-ff6c-dff94d7d23d7
Sep 13 11:26:15 DtOMCR DESC: The diagnosis engine encountered telemetry for which it was unable to perform a diagnosis.Refer to http://sun.com/msg/SUNOS-8000-FU for more information.
Sep 13 11:26:15 DtOMCR AUTO-RESPONSE: Error reports have been logged for examination by Sun.
Sep 13 11:26:15 DtOMCR IMPACT: Automated diagnosis and response for these events will not occur.
Sep 13 11:26:15 DtOMCR REC-ACTION: Ensure that the latest Solaris Kernel and Predictive Self-Healing (PSH) patches are installed.


Could this be caused by a missing patch?
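
For anyone triaging a similar messages file, here is a minimal Python sketch that pulls out the FMA message IDs, the fatal-error text, and the panic stack frames. The regular expressions are based only on the line shapes pasted above, so they are an assumption about the format rather than a complete syslog parser; `summarize_messages` and `SAMPLE` are names made up for this illustration.

```python
import re

# Patterns derived only from the lines pasted in this thread (an assumption,
# not a full description of the Solaris messages format).
MSG_ID = re.compile(
    r"SUNW-MSG-ID: ([A-Z0-9-]+), TYPE: (\w+), VER: \d+, SEVERITY: (\w+)"
)
# "occured" is the spelling used in the pasted kernel message itself.
FATAL = re.compile(r"Fatal error has occured in: (.+)$")
# A stack frame line: 16 hex digits, then module:function+offset.
FRAME = re.compile(r"genunix: [0-9a-f]{16} (\w+:\w+\+[0-9a-f]+) ")


def summarize_messages(lines):
    """Collect FMA message IDs, fatal-error text, and panic stack frames."""
    summary = {"msg_ids": [], "fatal": [], "frames": []}
    for line in lines:
        line = line.rstrip("\n")
        m = MSG_ID.search(line)
        if m:
            summary["msg_ids"].append(m.groups())
        m = FATAL.search(line)
        if m:
            summary["fatal"].append(m.group(1))
        m = FRAME.search(line)
        if m:
            summary["frames"].append(m.group(1))
    return summary


# A few lines copied from the log pasted above.
SAMPLE = """\
Sep 13 11:14:29 DtOMCR genunix: NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
Sep 13 11:14:29 DtOMCR unix: Fatal error has occured in: PCIe fabric.(0x1)(0x41)
Sep 13 11:14:29 DtOMCR genunix: 000002a103797d50 px:px_err_panic+1ac (1947400, 135a400, 41, 2a103797e00, 1, 0)
Sep 13 11:26:15 DtOMCR fmd: SUNW-MSG-ID: SUNOS-8000-FU, TYPE: Defect, VER: 1, SEVERITY: Major
"""

result = summarize_messages(SAMPLE.splitlines())
for key, values in result.items():
    print(key, values)
```

To run it against a real file, pass an open file handle instead of `SAMPLE.splitlines()`. The summary only restates what the log already says (a PCIe fabric panic raised through `px_err_panic`, followed by an undiagnosed FMA event); it does not identify the faulty component.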

System I/O, memory, and CPU usage


znnnz posted on 2013-09-13 13:38

SUNW-MSG-ID: SUNOS-8000-0G usually means that either a card plugged into a PCI slot has failed or the I/O board has failed.

ysmole posted on 2013-09-13 13:48

As I recall, on a certain release of Solaris 10, collecting Explorer data could bring the machine down.

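kwtip posted on 2013-09-13 13:49

Reply to 2# znnnz
Oh, really? You sound very sure. The official documentation says this error can be caused by hardware, or by software, patches, or firmware. After the reboot the machine came back up without problems; only the disk reads and writes are a bit slow.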

znnnz posted on 2013-09-13 15:12

Reply to 4# kwtip

If the system boots, check the disks with iostat -En, metastat, and so on. If it is Solaris 10, also take a look at fmadm faulty.
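
As a rough aid for the iostat -En check suggested above, the following Python sketch flags devices whose error counters are nonzero. It assumes the usual per-device header line of the form `c1t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0`; that output is not shown anywhere in this thread, so treat the pattern as an assumption. `disks_with_errors` is a hypothetical helper name.

```python
import re

# Assumed shape of the per-device header line in `iostat -En` output.
# The thread does not include this output, so adjust the pattern if the
# real output differs.
HEADER = re.compile(
    r"^(\S+)\s+Soft Errors: (\d+)\s+Hard Errors: (\d+)\s+Transport Errors: (\d+)"
)


def disks_with_errors(iostat_text):
    """Return {device: (soft, hard, transport)} for devices with a nonzero counter."""
    flagged = {}
    for line in iostat_text.splitlines():
        m = HEADER.match(line)
        if not m:
            continue  # vendor/size/detail lines are ignored
        counts = tuple(int(n) for n in m.groups()[1:])
        if any(counts):
            flagged[m.group(1)] = counts
    return flagged
```

On the host itself, the text could come from something like `subprocess.run(["iostat", "-En"], capture_output=True, text=True).stdout`. Nonzero hard or transport errors only point at a disk or path worth a closer look; they do not by themselves explain the PCIe fabric panic.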

kwtip posted on 2013-09-13 15:26

fmdump -v

Sep 13 11:26:14.9918 db956dbc-317f-c039-ff6c-dff94d7d23d7 SUNOS-8000-FU
  100%  defect.sunos.eft.undiag.fme

        Problem in: -
           Affects: -
               FRU: -
          Location: -


fmadm faulty -a

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Sep 13 11:26:14 db956dbc-317f-c039-ff6c-dff94d7d23d7  SUNOS-8000-FU  Major

Host      : DtOMCR
Platform    : SUNW,Netra-T5440        Chassis_id:

Fault class : defect.sunos.eft.undiag.fme

Description : The diagnosis engine encountered telemetry for which it was
            unable to perform a diagnosis.Refer to
            http://sun.com/msg/SUNOS-8000-FU for more information.

Response    : Error reports have been logged for examination by Sun.

Impact      : Automated diagnosis and response for these events will not occur.

Action      : Ensure that the latest Solaris Kernel and Predictive Self-Healing
            (PSH) patches are installed.


This is the same as what the log reports.