Chinaunix

[Help] [Solved] HP RX8640 unexpected reboot reporting INIT_INITIATED and IPMI Type-02 Event, please help take a look

Posted 2014-09-03 23:54
Last edited by net_diy on 2014-09-11 15:43

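Two HP RX8640 hosts run as an MC/SG (MC/ServiceGuard) two-node cluster with one package. One node rebooted and the package failed over to node 2. The system syslog contains no alerts, parstatus in the OS shows all hardware as normal, and the hardware status shown by ps under cm on the MP card is also normal. However, the SEL contains Fatal-level alerts. The log follows: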
[mp001b78d296c6] MP> sl

    EVENT LOG MENU:

        FPL: Forward Progress Log
        SEL: System Event Log
       LIVE: Live Events
       MPEL: MP Event Log

        CLR: Clear FPL and SEL
          Q: Quit

[mp001b78d296c6] MP:VW>  sel


             Welcome to the SEL (System Event Log) Viewer

   The following SEL navigation commands are available:
         D: Dump log starting at current block for capture and analysis
         F: Display first (oldest) block
         L: Display last (newest) block
         J: Jump to specified entry and display previous block
         +: Display next (forward in time) block
         -: Display previous (backward in time) block
      <cr>: Repeat previous +/- command
      <sp>: Repeat previous +/- command
         ?: Display help
        ^B: Exit viewer

   The following event format options are available:
         K: Keyword
         R: Raw hex
         T: Text

   The following event filter options are available:
         A: Alert level
         C: Cell
         U: Unfiltered
[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6] a

   Alert Level Filter:
     0: Minor Forward Progress
     1: Major Forward Progress
     2: Informational
     3: Warning
     5: Critical
     7: Fatal
     Q: Quit

For example, selecting an alert level threshold of 3
selects all events with alert levels of 3 or higher.

Please select alert level threshold:  3
Switching to alert level 3 filter.
[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
5376                215406a1bb020121 ff0f066f001f0300 IPMI Type-02 Event
5376                                                  09/03/2014 05:06:03
5373  HPUX 0,1,4 *3 7f80033914e0011b 00000000000aef00 HP-UX_DUMP_STATUS
5373                                                  09/03/2014 05:01:26
5372                215406a02502011a ff0f016f00200300 IPMI Type-02 Event
5372                                                  09/03/2014 04:59:17
5359  SFW  0,2,7 *7 f680007927e00101 0000000000000017 INIT_INITIATED
5359                                                  09/03/2014 04:59:15
5355  SFW  0,2,6 *7 f680007926e000fa 0000000000000016 INIT_INITIATED
5355                                                  09/03/2014 04:59:15
5346  SFW  0,2,5 *7 f680007925e000e9 0000000000000015 INIT_INITIATED
5346                                                  09/03/2014 04:59:14
5342  SFW  0,2,4 *7 f680007924e000e2 0000000000000014 INIT_INITIATED
5342                                                  09/03/2014 04:59:14
5337  SFW  0,2,3 *7 f680007923e000d9 0000000000000013 INIT_INITIATED
5337                                                  09/03/2014 04:59:14
5332  SFW  0,2,2 *7 f680007922e000d0 0000000000000012 INIT_INITIATED
5332                                                  09/03/2014 04:59:14
5327  SFW  0,2,1 *7 f680007921e000c7 0000000000000011 INIT_INITIATED
5327                                                  09/03/2014 04:59:14
5317  SFW  0,1,7 *7 f680007917e000b5 000000000000000f INIT_INITIATED
5317                                                  09/03/2014 04:59:14

[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
5315  SFW  0,2,0 *7 f680007920e000b1 0000000000000010 INIT_INITIATED
5315                                                  09/03/2014 04:59:14
5311  SFW  0,1,6 *7 f680007916e000aa 000000000000000e INIT_INITIATED
5311                                                  09/03/2014 04:59:13
5307  SFW  0,1,4 *7 f680007914e000a3 000000000000000c INIT_INITIATED
5307                                                  09/03/2014 04:59:13
5302  SFW  0,1,3 *7 f680007913e0009a 000000000000000b INIT_INITIATED
5302                                                  09/03/2014 04:59:13
5296  SFW  0,1,2 *7 f680007912e0008f 000000000000000a INIT_INITIATED
5296                                                  09/03/2014 04:59:13
5290  SFW  0,1,1 *7 f680007911e00084 0000000000000009 INIT_INITIATED
5290                                                  09/03/2014 04:59:13
5284  SFW  0,0,7 *7 f680007907e00079 0000000000000007 INIT_INITIATED
5284                                                  09/03/2014 04:59:13
5280  SFW  0,0,6 *7 f680007906e00072 0000000000000006 INIT_INITIATED
5280                                                  09/03/2014 04:59:13
5276  SFW  0,0,5 *7 f680007905e0006b 0000000000000005 INIT_INITIATED
5276                                                  09/03/2014 04:59:13
5271  SFW  0,0,4 *7 f680007904e00062 0000000000000004 INIT_INITIATED
5271                                                  09/03/2014 04:59:13
5265  SFW  0,1,0 *7 f680007910e00057 0000000000000008 INIT_INITIATED
5265                                                  09/03/2014 04:59:13

[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
5263  SFW  0,0,3 *7 f680007903e00054 0000000000000003 INIT_INITIATED
5263                                                  09/03/2014 04:59:13
5258  SFW  0,0,2 *7 f680007902e0004b 0000000000000002 INIT_INITIATED
5258                                                  09/03/2014 04:59:13
5252  SFW  0,0,1 *7 f680007901e00040 0000000000000001 INIT_INITIATED
5252                                                  09/03/2014 04:59:13
5245  SFW  0,0,0 *7 f680007900e00033 0000000000000000 INIT_INITIATED
5245                                                  09/03/2014 04:59:12
5240  SFW  0,1,5 *7 f680007915e0002a 000000000000000d INIT_INITIATED
5240                                                  09/03/2014 04:59:11
5218                2153a2f13d020005 ff0f066f001f0300 IPMI Type-02 Event
5218                                                  06/19/2014 14:18:37
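The Text-mode summary above can be checked mechanically. Below is a minimal Python sketch, assuming the column layout seen in this capture: entry number, then optionally the source, `cabinet,slot,cpu` and `*alert level`, then two raw 64-bit words and the keyword, with the timestamp on a following line that repeats the entry number. The names `parse_sel_summary`, `init_summary` and `SEL_EXCERPT` are hypothetical, and the excerpt holds only four of the entries. Fed the complete listing, it should report 24 INIT_INITIATED events covering cpu 0-7 in slots (cells) 0, 1 and 2 within about 4 seconds, which fits a partition-wide INIT (TOC) rather than a fault on one CPU.

```python
import re
from datetime import datetime

# Summary line, e.g. "5359  SFW  0,2,7 *7 f680007927e00101 0000000000000017 INIT_INITIATED".
# Source, cabinet,slot(cell),cpu and alert level are optional: IPMI entries omit them.
# Layout inferred from this capture only.
EVENT_RE = re.compile(
    r"^(?P<entry>\d+)\s+"
    r"(?:(?P<src>[A-Z-]+)\s+(?P<cab>\d+),(?P<cell>\d+),(?P<cpu>\d+)\s+\*(?P<alert>\d)\s+)?"
    r"(?P<w0>[0-9a-f]{16})\s+(?P<w1>[0-9a-f]{16})\s+(?P<keyword>\S.*?)\s*$"
)
# Timestamp line: the entry number repeated, then MM/DD/YYYY HH:MM:SS.
TIME_RE = re.compile(r"^(?P<entry>\d+)\s+(?P<ts>\d\d/\d\d/\d{4} \d\d:\d\d:\d\d)\s*$")

# Hypothetical excerpt of the listing above; paste the full listing to check every entry.
SEL_EXCERPT = """\
5376                215406a1bb020121 ff0f066f001f0300 IPMI Type-02 Event
5376                                                  09/03/2014 05:06:03
5359  SFW  0,2,7 *7 f680007927e00101 0000000000000017 INIT_INITIATED
5359                                                  09/03/2014 04:59:15
5245  SFW  0,0,0 *7 f680007900e00033 0000000000000000 INIT_INITIATED
5245                                                  09/03/2014 04:59:12
5240  SFW  0,1,5 *7 f680007915e0002a 000000000000000d INIT_INITIATED
5240                                                  09/03/2014 04:59:11
"""


def parse_sel_summary(text):
    """Map entry number -> dict of parsed fields, with the timestamp attached as 'time'."""
    events = {}
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            events[int(m["entry"])] = m.groupdict()
            continue
        m = TIME_RE.match(line)
        if m and int(m["entry"]) in events:
            events[int(m["entry"])]["time"] = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
    return events


def init_summary(events):
    """Count INIT_INITIATED events, the cells and distinct CPUs hit, and the time spread (s)."""
    inits = [e for e in events.values() if e["keyword"] == "INIT_INITIATED"]
    if not inits:
        return 0, [], 0, 0.0
    cpus = {(int(e["cell"]), int(e["cpu"])) for e in inits}
    times = [e["time"] for e in inits if "time" in e]
    spread = (max(times) - min(times)).total_seconds() if times else 0.0
    return len(inits), sorted({cell for cell, _ in cpus}), len(cpus), spread


if __name__ == "__main__":
    count, cells, ncpus, spread = init_summary(parse_sel_summary(SEL_EXCERPT))
    # Excerpt prints: 3 INIT_INITIATED events, cells [0, 1, 2], 3 distinct CPUs, spread 4 s
    print(f"{count} INIT_INITIATED events, cells {cells}, {ncpus} distinct CPUs, spread {spread:.0f} s")
```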


Log Entry 5376:    09/03/2014 05:06:03
Keyword:  IPMI Type-02 Event
0x215406a1bb020121 0xff0f066f001f0300

Log Entry 5373:    09/03/2014 05:01:26
Alert level 3:  Warning
Keyword:  HP-UX_DUMP_STATUS
OS dump status  (EFxx)
Reporting Entity:  HP Unix located in cabinet 0, slot 1, cpu 4
Legacy PA HEX Code:  0xaef00
0x7f80033914e0011b 0x00000000000aef00
0x6b00033914e0011c 0x010000005406a0a6

Log Entry 5372:    09/03/2014 04:59:17
Keyword:  IPMI Type-02 Event
0x215406a02502011a 0xff0f016f00200300







[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5359:    09/03/2014 04:59:15
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 7
Implementation Dependent:  0x0000000000000017
0xf680007927e00101 0x0000000000000017
0xeb00007927e00102 0x010000005406a023

Log Entry 5355:    09/03/2014 04:59:15
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 6
Implementation Dependent:  0x0000000000000016
0xf680007926e000fa 0x0000000000000016
0xeb00007926e000fb 0x010000005406a023






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5346:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 5
Implementation Dependent:  0x0000000000000015
0xf680007925e000e9 0x0000000000000015
0xeb00007925e000ea 0x010000005406a022

Log Entry 5342:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 4
Implementation Dependent:  0x0000000000000014
0xf680007924e000e2 0x0000000000000014
0xeb00007924e000e3 0x010000005406a022






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5337:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 3
Implementation Dependent:  0x0000000000000013
0xf680007923e000d9 0x0000000000000013
0xeb00007923e000da 0x010000005406a022

Log Entry 5332:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 2
Implementation Dependent:  0x0000000000000012
0xf680007922e000d0 0x0000000000000012
0xeb00007922e000d1 0x010000005406a022






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5327:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 1
Implementation Dependent:  0x0000000000000011
0xf680007921e000c7 0x0000000000000011
0xeb00007921e000c8 0x010000005406a022

Log Entry 5317:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 7
Implementation Dependent:  0x000000000000000f
0xf680007917e000b5 0x000000000000000f
0xeb00007917e000b6 0x010000005406a022






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5315:    09/03/2014 04:59:14
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 2, cpu 0
Implementation Dependent:  0x0000000000000010
0xf680007920e000b1 0x0000000000000010
0xeb00007920e000b2 0x010000005406a022

Log Entry 5311:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 6
Implementation Dependent:  0x000000000000000e
0xf680007916e000aa 0x000000000000000e
0xeb00007916e000ab 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5307:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 4
Implementation Dependent:  0x000000000000000c
0xf680007914e000a3 0x000000000000000c
0xeb00007914e000a4 0x010000005406a021

Log Entry 5302:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 3
Implementation Dependent:  0x000000000000000b
0xf680007913e0009a 0x000000000000000b
0xeb00007913e0009b 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5296:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 2
Implementation Dependent:  0x000000000000000a
0xf680007912e0008f 0x000000000000000a
0xeb00007912e00090 0x010000005406a021

Log Entry 5290:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 1
Implementation Dependent:  0x0000000000000009
0xf680007911e00084 0x0000000000000009
0xeb00007911e00085 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5284:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 7
Implementation Dependent:  0x0000000000000007
0xf680007907e00079 0x0000000000000007
0xeb00007907e0007a 0x010000005406a021

Log Entry 5280:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 6
Implementation Dependent:  0x0000000000000006
0xf680007906e00072 0x0000000000000006
0xeb00007906e00073 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5276:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 5
Implementation Dependent:  0x0000000000000005
0xf680007905e0006b 0x0000000000000005
0xeb00007905e0006c 0x010000005406a021

Log Entry 5271:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 4
Implementation Dependent:  0x0000000000000004
0xf680007904e00062 0x0000000000000004
0xeb00007904e00063 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5265:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 0
Implementation Dependent:  0x0000000000000008
0xf680007910e00057 0x0000000000000008
0xeb00007910e00058 0x010000005406a021

Log Entry 5263:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 3
Implementation Dependent:  0x0000000000000003
0xf680007903e00054 0x0000000000000003
0xeb00007903e00055 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5258:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 2
Implementation Dependent:  0x0000000000000002
0xf680007902e0004b 0x0000000000000002
0xeb00007902e0004c 0x010000005406a021

Log Entry 5252:    09/03/2014 04:59:13
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 1
Implementation Dependent:  0x0000000000000001
0xf680007901e00040 0x0000000000000001
0xeb00007901e00041 0x010000005406a021






[mp001b78d296c6] MP:VWR (<cr>,<sp>,+,-,?,F,L,J,D,K,R,T,A,C,U,^B) > [mp001b78d296c6]  
Log Entry 5245:    09/03/2014 04:59:12
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 0, cpu 0
Implementation Dependent:  0x0000000000000000
0xf680007900e00033 0x0000000000000000
0xeb00007900e00034 0x010000005406a020

Log Entry 5240:    09/03/2014 04:59:11
Alert level 7:  Fatal
Keyword:  INIT_INITIATED
INIT initiated
Reporting Entity:  System Firmware located in cabinet 0, slot 1, cpu 5
Implementation Dependent:  0x000000000000000d
0xf680007915e0002a 0x000000000000000d
0xeb00007915e0002b 0x010000005406a01f

Log Entry 5218:    06/19/2014 14:18:37
Keyword:  IPMI Type-02 Event
0x2153a2f13d020005 0xff0f066f001f0300
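The raw words also carry the event time. In every entry above, the low 32 bits of the second word on the extension line (0x010000005406a023 for entry 5359) and bits 24-55 of the first word of each IPMI Type-02 Event are Unix epoch seconds that reproduce the printed timestamp when read as UTC. This layout is inferred from this log only, not from HP documentation, and the helper names are hypothetical. It is handy for lining up the IPMI events with the INIT burst: entry 5372 lands 2 seconds after the last INIT and entry 5376 about 7 minutes later.

```python
from datetime import datetime, timezone


def epoch_from_extension_word(word):
    """Low 32 bits of the second word on an entry's extension line (e.g. 0x010000005406a023).
    Assumption: observed layout in this SEL capture, not a documented format."""
    return word & 0xFFFFFFFF


def epoch_from_ipmi_word(word):
    """Bits 24-55 of the first word of an 'IPMI Type-02 Event' (e.g. 0x215406a1bb020121).
    Assumption: observed layout in this SEL capture, not a documented format."""
    return (word >> 24) & 0xFFFFFFFF


def as_sel_time(epoch):
    """Format epoch seconds the way the SEL viewer prints them, reading them as UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%m/%d/%Y %H:%M:%S")


if __name__ == "__main__":
    print(5359, as_sel_time(epoch_from_extension_word(0x010000005406A023)))  # 09/03/2014 04:59:15
    print(5373, as_sel_time(epoch_from_extension_word(0x010000005406A0A6)))  # 09/03/2014 05:01:26
    print(5372, as_sel_time(epoch_from_ipmi_word(0x215406A02502011A)))  # 09/03/2014 04:59:17
    print(5376, as_sel_time(epoch_from_ipmi_word(0x215406A1BB020121)))  # 09/03/2014 05:06:03
    print(5218, as_sel_time(epoch_from_ipmi_word(0x2153A2F13D020005)))  # 06/19/2014 14:18:37
```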


Nothing obviously wrong shows up here. Is it worth upgrading the firmware to help pin down the problem?
System Firmware Revision:   9.066
BMC Revision:   v04.02

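Posted 2014-09-04 12:06
Heh, no replies yet. Hasn't anyone run into this? My own feeling is that the unexpected reboot was caused by the OS or by application software, and that it produced a dump. The host hardware itself is probably fine: the machine boots normally, and neither the OS logs nor the MP card's logs and status show any hardware errors. Earlier I found, via Google, a similar alert log from a Superdome, where the reboot had been caused by an Oracle RAC CRS failure.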

Posted 2014-09-04 17:27
Keyword             = INIT_INITIATED

Description:

This is the equivalent of a TOC event in the PA RISC Architecture. On IPF systems, this event is called an INIT.

This event can be triggered by the "tc" command from the MP, or from the button labeled "TOC" or "Transfer of Control" on the Management card or bezel of the system. There are also other causes of an INIT generated by software.



Data: Local CPU Number

Cause / Action:

Software has requested an INIT or the INIT button has been pressed.

Recommendation:

No action is required.

___________________________________________________________

Alert Level         = 7  - Fatal                  
Data Type           = 22 - Implementation dependant                    
Data                = 00 00 00 00 00 00 00 00
_______________________________________________________________________________
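The decoder output above gives the INIT_INITIATED data field as the local CPU number. In this capture the "Implementation Dependent" value of every entry equals cell * 8 + cpu, and bits 24-31 of the first raw word hold the slot (cell) in the high nibble and the CPU in the low nibble. Both are patterns observed in these 24 entries, not a documented format; `CPUS_PER_CELL = 8` is an assumption that matches this partition, and the helper names are hypothetical. The check confirms that every CPU in the partition received its own INIT.

```python
# (SEL entry, slot/cell, cpu, first raw word, "Implementation Dependent" data),
# copied from the 24 INIT_INITIATED entries in the SEL listing above.
INIT_ENTRIES = [
    (5359, 2, 7, 0xF680007927E00101, 0x17), (5355, 2, 6, 0xF680007926E000FA, 0x16),
    (5346, 2, 5, 0xF680007925E000E9, 0x15), (5342, 2, 4, 0xF680007924E000E2, 0x14),
    (5337, 2, 3, 0xF680007923E000D9, 0x13), (5332, 2, 2, 0xF680007922E000D0, 0x12),
    (5327, 2, 1, 0xF680007921E000C7, 0x11), (5317, 1, 7, 0xF680007917E000B5, 0x0F),
    (5315, 2, 0, 0xF680007920E000B1, 0x10), (5311, 1, 6, 0xF680007916E000AA, 0x0E),
    (5307, 1, 4, 0xF680007914E000A3, 0x0C), (5302, 1, 3, 0xF680007913E0009A, 0x0B),
    (5296, 1, 2, 0xF680007912E0008F, 0x0A), (5290, 1, 1, 0xF680007911E00084, 0x09),
    (5284, 0, 7, 0xF680007907E00079, 0x07), (5280, 0, 6, 0xF680007906E00072, 0x06),
    (5276, 0, 5, 0xF680007905E0006B, 0x05), (5271, 0, 4, 0xF680007904E00062, 0x04),
    (5265, 1, 0, 0xF680007910E00057, 0x08), (5263, 0, 3, 0xF680007903E00054, 0x03),
    (5258, 0, 2, 0xF680007902E0004B, 0x02), (5252, 0, 1, 0xF680007901E00040, 0x01),
    (5245, 0, 0, 0xF680007900E00033, 0x00), (5240, 1, 5, 0xF680007915E0002A, 0x0D),
]

CPUS_PER_CELL = 8  # assumption: each cell in this partition reports cpu 0-7


def local_cpu_number(cell, cpu):
    """Partition-wide CPU index, assuming CPUS_PER_CELL CPUs per cell."""
    return cell * CPUS_PER_CELL + cpu


def location_from_word(word):
    """(cell, cpu) from bits 24-31 of the first raw word: high nibble = cell,
    low nibble = cpu. Observed in this capture only, not a documented layout."""
    byte = (word >> 24) & 0xFF
    return byte >> 4, byte & 0xF


mismatches = [
    entry
    for entry, cell, cpu, word, data in INIT_ENTRIES
    if local_cpu_number(cell, cpu) != data or location_from_word(word) != (cell, cpu)
]
print(f"{len(INIT_ENTRIES)} entries checked, mismatches: {mismatches}")  # 24 entries checked, mismatches: []
```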

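Posted 2014-09-04 17:29
Decoding the MP logs yields nothing useful.
Suggestions:
1. Check whether any MCA files were generated under /var/tombstones;
2. Check the event files under /var/opt/resmon/log for alerts;
3. Analyze the generated dump with crashinfo.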


Posted 2014-09-04 22:24
So no MCA file was generated here? Without an MCA, look at the dump first. From the description so far, the reboot was most likely triggered by MC/SG; after that you will need to correlate it with the cluster logs.

Posted 2014-09-07 22:17
I once handled a case where memory exhaustion made one node of an MC/SG cluster reboot over and over.


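Posted 2014-09-09 12:32
Thanks, everyone, for the helpful answers. I'm going to the customer site tomorrow to deal with this problem, and I've summarized the five steps below. Is there anything I should add?
1. System shutdown log
cat /var/adm/shutdownlog

2. System dump: analyze the generated dump with crashinfo
/var/adm/crash

3. MC/SG logs (the package log; <pkgname>.cntl itself is the package control script)
/etc/cmcluster/<pkgname>/<pkgname>.cntl.log

4. Check whether any MCA files were generated under /var/tombstones

5. Check the event files under /var/opt/resmon/log for alerts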
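The on-site checklist above can be run as one pass. A minimal sketch, assuming a Python 3 interpreter is available on the HP-UX host (or that the files are copied to another machine first). It only reports whether each path exists and lists files modified after a cutoff such as the 09/03/2014 reboot; it does not interpret crash dumps, MCA files or EMS events. Step 3 is left out because `<pkgname>` is a placeholder, and `CHECKLIST`, `describe` and `recent_files` are hypothetical names.

```python
import os
import time

# Paths from the checklist above. Step 3 is omitted because <pkgname> is a placeholder.
CHECKLIST = [
    ("1. shutdown log", "/var/adm/shutdownlog"),
    ("2. crash dumps", "/var/adm/crash"),
    ("4. MCA tombstones", "/var/tombstones"),
    ("5. EMS/resmon event logs", "/var/opt/resmon/log"),
]


def describe(path):
    """One-line status: missing, directory with its entry count, or file size and mtime."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return f"directory, {len(os.listdir(path))} entries"
    st = os.stat(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return f"file, {st.st_size} bytes, modified {stamp}"


def recent_files(directory, since_epoch):
    """Regular files directly under `directory` modified at or after `since_epoch`."""
    if not os.path.isdir(directory):
        return []
    found = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and os.path.getmtime(full) >= since_epoch:
            found.append(full)
    return found


if __name__ == "__main__":
    # Hypothetical cutoff: a few minutes before the first INIT entry (09/03/2014 04:59:11 UTC).
    cutoff = 1409720000
    for label, path in CHECKLIST:
        print(f"{label}: {path}: {describe(path)}")
        for f in recent_files(path, cutoff):
            print(f"    new since cutoff: {f}")
```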

Posted 2014-09-11 15:01
The cause has been found. As lbseraph said, the TOC was indeed triggered by the MC/SG cluster. On site, I first checked /var/adm/shutdownlog and found the following:
14:52  Wed Sep 03 2014.  Reboot after panic: SafetyTimer expired, INIT, IIP:0xe000000000547e00 IFA:0xe00000026ffffd00
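The shutdownlog line already names the trigger. A small parser, assuming the `HH:MM  Day Mon DD YYYY.  Event: reason` layout of the single line shown here (the regular expression is inferred from that line, not from a format specification, and `parse_shutdownlog_line` is a hypothetical name). It separates the panic string (SafetyTimer expired), the INIT flag, and the Itanium IIP (interruption instruction pointer) and IFA (interruption faulting address) values.

```python
import re

# Layout inferred from the single shutdownlog line above:
# "HH:MM  Day Mon DD YYYY.  Event: reason"
SHUTDOWNLOG_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2})\s+(?P<date>\w{3} \w{3} \d{2} \d{4})\.\s+"
    r"(?P<event>[^:]+):\s*(?P<reason>.*)$"
)
REGISTER_RE = re.compile(r"\b(IIP|IFA):(0x[0-9a-fA-F]+)")


def parse_shutdownlog_line(line):
    """Split one shutdownlog entry into date, event, panic string, INIT flag and registers."""
    m = SHUTDOWNLOG_RE.match(line.strip())
    if m is None:
        return None
    parts = [p.strip() for p in m["reason"].split(",")]
    return {
        "when": f'{m["date"]} {m["time"]}',
        "event": m["event"].strip(),
        "panic": parts[0],
        "init": "INIT" in parts,
        # IIP / IFA: Itanium interruption instruction pointer and faulting address.
        "registers": {name: int(value, 16) for name, value in REGISTER_RE.findall(m["reason"])},
    }


if __name__ == "__main__":
    line = ("14:52  Wed Sep 03 2014.  Reboot after panic: SafetyTimer expired, INIT, "
            "IIP:0xe000000000547e00 IFA:0xe00000026ffffd00")
    print(parse_shutdownlog_line(line))
```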
I then analyzed the crash dump from this unexpected reboot with crashinfo and found the following in the generated text report:
linkstamp:          Fri Jan 06 00:19:30 EAT 2012
_release_version:   @(#) $Revision: vmunix:    B11.23_LR FLAVOR=perf Fri Aug 29 22:35:38 PDT 2003 $

sync'ing disks (2 buffers to flush): 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2Dead gateway detection can't ping the last remaining default gateway at 0xc6c96535. See ndd -h ip_ire_gw_probe for more info
2 2 2 2 2 2 2 2 2 2MC/ServiceGuard: Unable to maintain contact with cmcld daemon.
Performing TOC to ensure data integrity.
Spinlock timeout failure:  
The spinlock code has NOT failed! Instead, some spinlock
using code has failed to release a spinlock soon enough.
FAILING LOCK = tcp_lists_lock
Lock Address: 0xe0000001ac8b5800X ;  Owners mpinfo 0xe000000101b78000X ;  ticket 0x945682
serving 0x945680
had been waiting for ticket 0x945681
Milliseconds spent spinning =60001
Millseconds/sec = 1000
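The crashinfo message buffer above tells the story in three messages: dead gateway detection failing, loss of contact with cmcld, and a spinlock timeout on tcp_lists_lock. The sketch below is a hypothetical keyword triage over such a buffer; the signature table holds only strings that appear in this thread, and `triage` is an illustrative name. It also converts the gateway address 0xc6c96535 to dotted-quad form, assuming the kernel prints the IPv4 address in network (big-endian) byte order, which gives 198.201.101.53.

```python
import ipaddress
import re

# Hypothetical signature table: strings taken from the crashinfo output in this thread.
SIGNATURES = {
    "Dead gateway detection can't ping": "network: default gateway unreachable",
    "Unable to maintain contact with cmcld daemon": "Serviceguard: lost contact with cmcld, TOC performed",
    "Spinlock timeout failure": "kernel: a spinlock was held too long",
}


def triage(msgbuf):
    """Report which known signatures occur, plus the failing lock, spin time and gateway."""
    lock = re.search(r"FAILING LOCK = (\S+)", msgbuf)
    spin = re.search(r"Milliseconds spent spinning =\s*(\d+)", msgbuf)
    gateway = re.search(r"default gateway at (0x[0-9a-fA-F]{8})", msgbuf)
    return {
        "findings": [note for sig, note in SIGNATURES.items() if sig in msgbuf],
        "failing_lock": lock.group(1) if lock else None,
        "spin_ms": int(spin.group(1)) if spin else None,
        # Assumption: the hex value is the IPv4 address in network (big-endian) byte order.
        "gateway": str(ipaddress.IPv4Address(int(gateway.group(1), 16))) if gateway else None,
    }


# Excerpt of the message buffer above (interleaved sync counters kept as-is).
MSGBUF_EXCERPT = (
    "2 2 2Dead gateway detection can't ping the last remaining default gateway at "
    "0xc6c96535. See ndd -h ip_ire_gw_probe for more info\n"
    "2 2MC/ServiceGuard: Unable to maintain contact with cmcld daemon.\n"
    "Performing TOC to ensure data integrity.\n"
    "Spinlock timeout failure:\n"
    "FAILING LOCK = tcp_lists_lock\n"
    "Milliseconds spent spinning =60001\n"
)

if __name__ == "__main__":
    for key, value in triage(MSGBUF_EXCERPT).items():
        print(f"{key}: {value}")
```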
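So the cause was a problem on the customer's network: the host could not communicate with its gateway, which left MC/SG's cmcld unable to communicate, and a TOC was triggered to protect data integrity. Thanks for everyone's help. Now that the root cause is known, I can explain it clearly to the customer.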
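The failure chain in this conclusion follows the usual watchdog pattern: a daemon must keep resetting a safety timer, and once the kernel loses contact with it (here, cmcld), the timer expires and the node is halted with a TOC. The toy model below only illustrates that pattern; it is not Serviceguard's implementation, and `SafetyTimer`, `first_expiry`, the timeout and the heartbeat values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SafetyTimer:
    """Toy watchdog: must be reset at least every `timeout` seconds or it expires.
    Illustrates the general pattern only; not Serviceguard's implementation."""
    timeout: float
    last_reset: float = 0.0

    def reset(self, now):
        self.last_reset = now

    def expired(self, now):
        return now - self.last_reset > self.timeout


def first_expiry(reset_times, check_times, timeout):
    """Replay heartbeat resets and periodic checks; return the first check time at
    which the timer has expired (the point where the node would TOC), or None."""
    timer = SafetyTimer(timeout)
    resets = sorted(reset_times)
    i = 0
    for now in sorted(check_times):
        # Apply every reset that happened up to this check.
        while i < len(resets) and resets[i] <= now:
            timer.reset(resets[i])
            i += 1
        if timer.expired(now):
            return now
    return None


if __name__ == "__main__":
    # Hypothetical numbers: heartbeats every second until t=10, then they stop;
    # the timer is checked every second with a 5-second timeout.
    print(first_expiry(range(0, 11), range(0, 31), timeout=5.0))  # 16
```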

