Chinaunix

[Midrange Hardware] IBM p55A went down for no apparent reason, please help!

#1 Posted on 2008-11-17 09:49
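At 23:00 on November 15, the standby node of our two-node HA hot-standby pair (two IBM p55A machines) went down for no apparent reason. The errpt output is below. Could the experts please help me figure out where the problem actually lies? This is not the first time: the same thing happened on September 15, with identical errpt entries. Thanks in advance, I would be very grateful for any help!
HA on the standby node reported no errors at all!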
errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AFA89905   1116084908 I O grpsvcs        Group Services daemon started
97419D60   1116084908 I O topsvcs        Topology Services daemon started
A6DF45AA   1116084308 I O RMCdaemon      The daemon is started.
2BFA76F6   1116084208 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1116084308 T O errdemon       ERROR LOGGING TURNED ON
6D19271E   1115230008 I O topsvcs        Topology Services daemon stopped
AA8AB241   1115230008 T O OPERATOR       OPERATOR NOTIFICATION
BA431EB7   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
12081DC6   1115230008 P S haemd          SOFTWARE PROGRAM ERROR
9DEC29E1   1115230008 P O grpsvcs        Group Services daemon exit to merge doma
173C787F   1104000408 I S topsvcs        Possible malfunction on local adapter
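The TIMESTAMP column of the errpt summary is in mmddHHMMyy form (for example, 1115230008 is Nov 15 23:00 2008, matching the Date/Time of the detailed records that follow). A minimal sketch that decodes it and puts the entries in chronological order, assuming the default summary layout shown above and that two-digit years below 69 mean 20xx; `parse_errpt_summary` and `decode_timestamp` are hypothetical helpers, not AIX tools:

```python
from datetime import datetime

# errpt summary lines copied from the post above (column header omitted).
SUMMARY = """\
AFA89905   1116084908 I O grpsvcs        Group Services daemon started
97419D60   1116084908 I O topsvcs        Topology Services daemon started
A6DF45AA   1116084308 I O RMCdaemon      The daemon is started.
2BFA76F6   1116084208 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   1116084308 T O errdemon       ERROR LOGGING TURNED ON
6D19271E   1115230008 I O topsvcs        Topology Services daemon stopped
AA8AB241   1115230008 T O OPERATOR       OPERATOR NOTIFICATION
BA431EB7   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
BC3BE5A3   1115230008 P S SRC            SOFTWARE PROGRAM ERROR
12081DC6   1115230008 P S haemd          SOFTWARE PROGRAM ERROR
9DEC29E1   1115230008 P O grpsvcs        Group Services daemon exit to merge doma
173C787F   1104000408 I S topsvcs        Possible malfunction on local adapter
"""


def decode_timestamp(stamp):
    """Decode an errpt summary TIMESTAMP (mmddHHMMyy) into a datetime.

    Assumption: two-digit years below 69 are 20xx, otherwise 19xx.
    """
    month, day, hour, minute, yy = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    year = 2000 + yy if yy < 69 else 1900 + yy
    return datetime(year, month, day, hour, minute)


def parse_errpt_summary(text):
    """Return summary entries as tuples, oldest first."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 5)  # the description itself may contain spaces
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip blank lines and the column header
        ident, stamp, etype, eclass, resource, desc = fields
        entries.append((decode_timestamp(stamp), ident, etype, eclass, resource, desc))
    # sorted() is stable, so entries with the same minute keep their errpt order
    return sorted(entries, key=lambda entry: entry[0])


if __name__ == "__main__":
    for when, ident, etype, eclass, resource, desc in parse_errpt_summary(SUMMARY):
        print(f"{when:%Y-%m-%d %H:%M}  {ident}  {etype} {eclass}  {resource:<10} {desc}")
```

Sorted this way, the seven entries stamped 23:00 on November 15 sit together, after the Nov 4 adapter warning and before the entries logged when the node came back up the next morning.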

errpt -a
---------------------------------------------------------------------------
LABEL:          GS_START_ST
IDENTIFIER:     AFA89905

Date/Time:       Sun Nov 16 08:49:03 BEIST 2008
Sequence Number: 411
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   grpsvcs         

Description
Group Services daemon started

Probable Causes
Daemon started during system startup
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

    Recommended Actions
    Check that Group Services daemon is running

Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.8,606                     
ERROR ID
63Y7ej0zrq57/g8z00...8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_trace_2_19.
---------------------------------------------------------------------------
LABEL:          TS_START_ST
IDENTIFIER:     97419D60

Date/Time:       Sun Nov 16 08:49:00 BEIST 2008
Sequence Number: 410
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   topsvcs         

Description
Topology Services daemon started

Probable Causes
Daemon started during system start-up
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

    Recommended Actions
    Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.211,4459                    
ERROR ID
6UpNEL0wrq57/oBg/0...8....................
REFERENCE CODE
                                          
Topology Services daemon started by:
SRC
Topology Services daemon log file location
/var/ha/log/topsvcs.16.084900.btpclus.en_US
Topology Services daemon run directory
/var/ha/run/topsvcs.btpclus/
---------------------------------------------------------------------------
LABEL:          RMCD_INFO_0_ST
IDENTIFIER:     A6DF45AA

Date/Time:       Sun Nov 16 08:43:48 BEIST 2008
Sequence Number: 409
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   RMCdaemon      

Description
The daemon is started.

Probable Causes
The Resource Monitoring and Control daemon has been started.

User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.

    Recommended Actions
    Confirm that the daemon should be started.

Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.52,211                          
ERROR ID
6eKora02nq57/2.V.0...8....................
REFERENCE CODE
                                          
---------------------------------------------------------------------------
LABEL:          REBOOT_ID
IDENTIFIER:     2BFA76F6

Date/Time:       Sun Nov 16 08:42:15 BEIST 2008
Sequence Number: 407
Machine Id:      0006B665D600
Node Id:         localhost
Class:           S
Type:            TEMP
Resource Name:   SYSPROC         

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:          ERRLOG_ON
IDENTIFIER:     9DBCFDEE

Date/Time:       Sun Nov 16 08:43:29 BEIST 2008
Sequence Number: 406
Machine Id:      0006B665D600
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon        

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

    Recommended Actions
    NONE

---------------------------------------------------------------------------
LABEL:          TS_STOP_ST
IDENTIFIER:     6D19271E

Date/Time:       Sat Nov 15 23:00:42 BEIST 2008
Sequence Number: 405
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   topsvcs         

Description
Topology Services daemon stopped

Probable Causes
Daemon stopped by SRC
Daemon stopped by signal

User Causes
Daemon stopped by user

    Recommended Actions
    Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,comm.C,1.147,634                        
ERROR ID
6SQG4h/OEi57/Kp8/0...8....................
REFERENCE CODE
6UpNEL0LSaq6/9Ui00...8....................
Topology Services daemon stopped by:
Signal SIGTERM
---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Sequence Number: 404
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            TEMP
Resource Name:   OPERATOR        

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

    Recommended Actions
    REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7

Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Sequence Number: 403
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

    Recommended Actions
    VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Sequence Number: 402
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

    Recommended Actions
    MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        2560
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Sequence Number: 401
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

    Recommended Actions
    MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:          HA002_ER
IDENTIFIER:     12081DC6

Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Sequence Number: 400
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   haemd           

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

    Recommended Actions
    REPORT DETAILED DATA
    CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).

---------------------------------------------------------------------------
LABEL:          GS_DOM_MERGE_ER
IDENTIFIER:     9DEC29E1

Date/Time:       Sat Nov 15 23:00:40 BEIST 2008
Sequence Number: 399
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            PERM
Resource Name:   grpsvcs         

Description
Group Services daemon exit to merge domains

Probable Causes
Network between two node groups has repaired

Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.

    Recommended Actions
    Check the network connection.
Check the Topology Services.
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,NS.C,1.107.1.42,4484                     
ERROR ID
6Vb0vR0MEi57/i8T10...8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
The master requests to dissolve my domain because of the merge with other domain 1.45
---------------------------------------------------------------------------
LABEL:          TS_LOC_DOWN_ST
IDENTIFIER:     173C787F

Date/Time:       Tue Nov  4 00:04:42 BEIST 2008
Sequence Number: 398
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            INFO
Resource Name:   topsvcs         

Description
Possible malfunction on local adapter

Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

    Recommended Actions
    Verify adapter configuration
    Verify network connectivity

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.21,4325            
ERROR ID
6zV5DL.O2m17/PYk/0...8....................
REFERENCE CODE
                                          
Adapter interface name
tty1
Adapter offset
           2
Adapter IP address
255.255.0.1
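Pulling a few fields out of the `errpt -a` records makes it easier to see which subsystems SRC reported as failing at 23:00:41. A rough sketch, assuming records are separated by dashed lines as in the output above; it works on a trimmed copy of the three SRC records (Machine Id, Class, Type and the cause text dropped), and `parse_errpt_detail` is a hypothetical helper rather than an AIX command:

```python
import re

# Trimmed copy of the three SRC records from the errpt -a output above.
DETAIL = """\
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7
Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Resource Name:   SRC
SOFTWARE ERROR CODE
       -9035
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3
Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Resource Name:   SRC
SOFTWARE ERROR CODE
       -9017
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3
Date/Time:       Sat Nov 15 23:00:41 BEIST 2008
Resource Name:   SRC
SOFTWARE ERROR CODE
       -9017
FAILING MODULE
clstrmgrES
"""

SEPARATOR = re.compile(r"^-{20,}$", re.MULTILINE)
HEADER = re.compile(r"^(LABEL|IDENTIFIER|Date/Time|Resource Name):\s*(.*)$")
# Detail Data keys whose value sits on the following line
DETAIL_KEYS = ("SOFTWARE ERROR CODE", "FAILING MODULE")


def parse_errpt_detail(text):
    """Split errpt -a output on dashed lines and collect a few fields per record."""
    records = []
    for chunk in SEPARATOR.split(text):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        record = {}
        for i, line in enumerate(lines):
            match = HEADER.match(line)
            if match:
                record[match.group(1)] = match.group(2)
            elif line in DETAIL_KEYS and i + 1 < len(lines):
                record[line] = lines[i + 1]
        if record:
            records.append(record)
    return records


if __name__ == "__main__":
    for rec in parse_errpt_detail(DETAIL):
        print(rec["Date/Time"], rec["LABEL"],
              rec.get("SOFTWARE ERROR CODE"), rec.get("FAILING MODULE"))
```

Run against the full `errpt -a` output, the same loop would also pick up the other records; those without a FAILING MODULE field simply come back without that key.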

[ Last edited by popoq on 2008-11-17 10:59 ]

#2 Posted on 2008-11-17 10:12
Topology Services has been partitioned.
HA has a problem, and it brought the host straight down.
Check the network and the heartbeat.

#3 Posted on 2008-11-17 10:50
clexit.rc : Unexpected termination of clstrmgrES
Have a look at these Probable Causes:
Network between two node groups has repaired

Failure Causes
Network communication has been blocked.
Topology Services has been partitioned.
Check hacmp.out and look into the network.

#4 Posted on 2008-11-17 11:23
Probably a network problem.