[Midrange Server Hardware] IBM P55A has gone down again... please help!!! :(

#1 Posted 2008-12-27 14:20
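At 00:17 on December 27, the standby node of an HA hot-standby pair built from two P55A machines went down again. It also went down on November 15. Could the experts here please help diagnose whether the cause is the same each time, and where the problem actually lies? This is already the third occurrence; the same thing happened on September 15 as well. Thanks for your help, much appreciated!!!
HA on the standby node logged no error messages at all!!!
Attached: details of the previous crash: http://bbs.chinaunix.net/thread-1313204-1-1.html
Below is the errpt -a output for this crash.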
---------------------------------------------------------------------------
LABEL:          GS_START_ST
IDENTIFIER:     AFA89905

Date/Time:       Sat Dec 27 11:10:43 BEIST 2008
Sequence Number: 424
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   grpsvcs         

Description
Group Services daemon started

Probable Causes
Daemon started during system startup
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

        Recommended Actions
        Check that Group Services daemon is running

Detail Data
DETECTING MODULE
RSCT,pgsd.C,1.62.1.8,606                     
ERROR ID
63Y7ej0nmNJ7/4J800...8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
HAGS daemon started by SRC. Log file is /var/ha/log/grpsvcs_trace_2_20.
---------------------------------------------------------------------------
LABEL:          TS_START_ST
IDENTIFIER:     97419D60

Date/Time:       Sat Dec 27 11:10:39 BEIST 2008
Sequence Number: 423
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   topsvcs         

Description
Topology Services daemon started

Probable Causes
Daemon started during system start-up
Daemon re-started automatically by SRC
Daemon started during installation
Daemon started manually by user

User Causes
Daemon started manually by user

        Recommended Actions
        Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.211,4459                    
ERROR ID
6UpNEL0jmNJ7/r26.0...8....................
REFERENCE CODE
                                          
Topology Services daemon started by:
SRC
Topology Services daemon log file location
/var/ha/log/topsvcs.27.111038.btpclus.en_US
Topology Services daemon run directory
/var/ha/run/topsvcs.btpclus/
---------------------------------------------------------------------------
LABEL:          RMCD_INFO_0_ST
IDENTIFIER:     A6DF45AA

Date/Time:       Sat Dec 27 10:55:55 BEIST 2008
Sequence Number: 422
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   RMCdaemon      

Description
The daemon is started.

Probable Causes
The Resource Monitoring and Control daemon has been started.

User Causes
The startsrc -s ctrmc command has been executed or
the rmcctrl -s command has been executed.

        Recommended Actions
        Confirm that the daemon should be started.

Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.52,211                          
ERROR ID
6eKora0vYNJ7/mZ4/0...8....................
REFERENCE CODE
                                          
---------------------------------------------------------------------------
LABEL:          REBOOT_ID
IDENTIFIER:     2BFA76F6

Date/Time:       Sat Dec 27 10:54:22 BEIST 2008
Sequence Number: 420
Machine Id:      0006B665D600
Node Id:         localhost
Class:           S
Type:            TEMP
Resource Name:   SYSPROC         

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:          ERRLOG_ON
IDENTIFIER:     9DBCFDEE

Date/Time:       Sat Dec 27 10:55:36 BEIST 2008
Sequence Number: 419
Machine Id:      0006B665D600
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon        

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

        Recommended Actions
        NONE

---------------------------------------------------------------------------
LABEL:          TS_STOP_ST
IDENTIFIER:     6D19271E

Date/Time:       Sat Dec 27 00:17:34 BEIST 2008
Sequence Number: 418
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   topsvcs         

Description
Topology Services daemon stopped

Probable Causes
Daemon stopped by SRC
Daemon stopped by signal

User Causes
Daemon stopped by user

        Recommended Actions
        Confirm that this is desirable

Detail Data
DETECTING MODULE
rsct,comm.C,1.147,634                        
ERROR ID
6SQG4h/SCEJ7/cVe00...8....................
REFERENCE CODE
6UpNEL0wrq57/oBg/0...8....................
Topology Services daemon stopped by:
Signal SIGTERM
---------------------------------------------------------------------------
LABEL:          OPMSG
IDENTIFIER:     AA8AB241

Date/Time:       Sat Dec 27 00:17:33 BEIST 2008
Sequence Number: 417
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            TEMP
Resource Name:   OPERATOR        

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sat Dec 27 00:17:33 BEIST 2008
Sequence Number: 416
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:          SRC_RSTRT
IDENTIFIER:     BA431EB7

Date/Time:       Sat Dec 27 00:17:33 BEIST 2008
Sequence Number: 415
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Sat Dec 27 00:17:33 BEIST 2008
Sequence Number: 414
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   SRC            

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        3840
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:          HA002_ER
IDENTIFIER:     12081DC6

Date/Time:       Sat Dec 27 00:17:33 BEIST 2008
Sequence Number: 413
Machine Id:      0006B665D600
Node Id:         p55b
Class:           S
Type:            PERM
Resource Name:   haemd           

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        REPORT DETAILED DATA
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.37,L#=1395,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).

---------------------------------------------------------------------------
LABEL:          GS_XSTALE_PRCLM_ER
IDENTIFIER:     657D8FFA

Date/Time:       Sat Dec 27 00:17:32 BEIST 2008
Sequence Number: 412
Machine Id:      0006B665D600
Node Id:         p55b
Class:           O
Type:            PERM
Resource Name:   grpsvcs         

Description
Group Services daemon exit to re-join the domain

Probable Causes
Topology Services daemon reports inconsistent node down and up events

Failure Causes
Network has been a temporal problem

        Recommended Actions
        Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,NsMsg.C,1.77,1070                        
ERROR ID
6uzMTZ/QCEJ7/QZf10...8....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
Got a non-stale Proclaim message from my NS(domId=1.45). He must have deleted me, so I'm exiting to

[ Last edited by popoq on 2008-12-27 14:22 ]
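
Since errpt -a prints the newest entry first, the output above is easier to follow once it is reordered by sequence number. Below is a minimal Python sketch of that, assuming the report has been saved as text and that entries are separated by dashed lines as in this post. The names `parse_errpt`, `timeline` and `SAMPLE` are made up for illustration, and `SAMPLE` is an abridged copy of two entries from the log above, not a complete record.

```python
import re

# Abridged copy of two entries from the errpt -a output in this thread.
SAMPLE = """\
---------------------------------------------------------------------------
LABEL:          TS_STOP_ST
IDENTIFIER:     6D19271E

Date/Time:       Sat Dec 27 00:17:34 BEIST 2008
Sequence Number: 418
Node Id:         p55b
Class:           O
Type:            INFO
Resource Name:   topsvcs

Description
Topology Services daemon stopped
---------------------------------------------------------------------------
LABEL:          GS_XSTALE_PRCLM_ER
IDENTIFIER:     657D8FFA

Date/Time:       Sat Dec 27 00:17:32 BEIST 2008
Sequence Number: 412
Node Id:         p55b
Class:           O
Type:            PERM
Resource Name:   grpsvcs

Description
Group Services daemon exit to re-join the domain
"""

# Header fields we care about; everything else in an entry is ignored.
FIELD = re.compile(
    r"^(LABEL|Date/Time|Sequence Number|Type|Resource Name):\s*(.*?)\s*$"
)


def parse_errpt(text):
    """Split the text on dashed separator lines and collect the header fields.

    Returns a list of dicts sorted by ascending sequence number (oldest first).
    """
    records = []
    for chunk in re.split(r"^-{10,}[ \t]*$", text, flags=re.MULTILINE):
        rec = {}
        for line in chunk.splitlines():
            m = FIELD.match(line)
            if m:
                rec[m.group(1)] = m.group(2)
        # Skip empty chunks and anything without the two mandatory fields.
        if "LABEL" in rec and "Sequence Number" in rec:
            rec["Sequence Number"] = int(rec["Sequence Number"])
            records.append(rec)
    return sorted(records, key=lambda r: r["Sequence Number"])


def timeline(text):
    """One summary line per entry, oldest first."""
    return [
        "{} {} {} {} ({})".format(
            r["Sequence Number"],
            r.get("Date/Time", "?"),
            r.get("Type", "?"),
            r["LABEL"],
            r.get("Resource Name", "?"),
        )
        for r in parse_errpt(text)
    ]


if __name__ == "__main__":
    for line in timeline(SAMPLE):
        print(line)
```

Sorting on the sequence number rather than on the timestamp is a deliberate choice here: several entries in the log share the same second (00:17:33), so the timestamp alone does not give a stable order.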

#2 Posted 2008-12-27 15:29
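No definitive answer, sorry.
All I can tell is that the system went down because clstrmgrES terminated abnormally, and that it terminated because of a Group Services anomaly (inconsistent node down and up events).
Suggested approach: stop HA, re-check the configuration (only the most basic configuration is needed), and synchronize; then patch HA, synchronize, restart HA, and see whether the problem recurs (take backups before patching: both a system backup and an HA snapshot).
Alternatively, collect a snap and contact IBM, provided the machine is still under warranty, of course.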
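
The reasoning in this reply (Group Services anomaly first, clstrmgrES termination after) can be checked mechanically by ordering the 00:17 entries by sequence number. Here is a rough Python sketch; the tuples are copied by hand from the errpt output in post #1, and `first_error` and `chain` are hypothetical helper names. The earliest logged error is only a hint about where to start looking, not proof of the root cause.

```python
# (sequence number, time, label, type, resource name).
# For the SRC entries the last field is the "FAILING MODULE" value instead,
# since their resource name is just "SRC". Values are copied from post #1.
EVENTS = [
    (418, "00:17:34", "TS_STOP_ST", "INFO", "topsvcs"),
    (417, "00:17:33", "OPMSG", "TEMP", "OPERATOR"),
    (416, "00:17:33", "SRC_SVKO", "PERM", "clstrmgrES"),
    (415, "00:17:33", "SRC_RSTRT", "PERM", "emsvcs"),
    (414, "00:17:33", "SRC_SVKO", "PERM", "grpsvcs"),
    (413, "00:17:33", "HA002_ER", "PERM", "haemd"),
    (412, "00:17:32", "GS_XSTALE_PRCLM_ER", "PERM", "grpsvcs"),
]


def first_error(events, error_types=("PERM",)):
    """Return the lowest-sequence event whose type is in error_types, or None."""
    errors = [e for e in events if e[3] in error_types]
    return min(errors, key=lambda e: e[0]) if errors else None


def chain(events):
    """Labels in the order they were logged (ascending sequence number)."""
    return ["{}({})".format(label, res) for _, _, label, _, res in sorted(events)]


if __name__ == "__main__":
    print("first PERM entry:", first_error(EVENTS))
    print(" -> ".join(chain(EVENTS)))
```

On this data the first PERM entry is GS_XSTALE_PRCLM_ER on grpsvcs at 00:17:32, followed one second later by the haemd and SRC errors and the clstrmgrES exit, which matches the order described in this reply.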

#3 Posted 2008-12-27 16:18
Just passing by!

#4 Posted 2008-12-27 16:53
Network has been a temporal problem

#5 Posted 2008-12-27 16:55
Or could it have been caused by an application problem?