Chinaunix

[Advanced Applications] Help: two 55A servers in an HA cluster went down; could someone help work out the cause?

#1  Posted 2007-10-15 11:46
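Hi all: a customer runs two 55A machines with AIX 5.3 + HA 5.3. Both machines went down on October 3. After they were rebooted on October 8, HA did not start automatically, and starting HA by hand then produced a pile of errors. Below are the October 3 errlog entries from the two machines, svr1 and svr2. Could you help work out what happened? Thanks.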

svr1:
---------------------------------------------------------------------------
LABEL:                REBOOT_ID
IDENTIFIER:        2BFA76F6

Date/Time:       Wed Oct  3 09:21:05 2007
Sequence Number: 13168
Machine Id:      00001157D600
Node Id:         localhost
Class:           S
Type:            TEMP
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:                ERRLOG_ON
IDENTIFIER:        9DBCFDEE

Date/Time:       Mon Oct  8 16:06:49 2007
Sequence Number: 13167
Machine Id:      00001157D600
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

        Recommended Actions
        NONE

---------------------------------------------------------------------------
LABEL:                OPMSG
IDENTIFIER:        AA8AB241

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 13166
Machine Id:      00001157D600
Node Id:         srv1
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13165
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:                SRC_RSTRT
IDENTIFIER:        BA431EB7

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13164
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13163
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:                HA002_ER
IDENTIFIER:        12081DC6

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13162
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   haemd

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        REPORT DETAILED DATA
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).

---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13161
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
      393350
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
topsvcs
---------------------------------------------------------------------------
LABEL:                GS_TS_RETCODE_ER
IDENTIFIER:        64368504

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13160
Machine Id:      00001157D600
Node Id:         srv1
Class:           O
Type:            PERM
Resource Name:   grpsvcs

Description
Connection failure between Group Services and Topology Services

Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error

Failure Causes
Group Services detects an error condition of Topology Services

        Recommended Actions
        Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,PMClient.C,1.72,1049                     
ERROR ID
62IcBY/tti.5/PuJ0/6U08....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
---------------------------------------------------------------------------
LABEL:                CORE_DUMP
IDENTIFIER:        A63BEB70

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 13159
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
                368682
FILE SYSTEM SERIAL NUMBER
           4
INODE NUMBER
        4189
PROCESSOR ID
          -1
CORE FILE NAME
/var/ha/run/topsvcs.srv_cluster/core
PROGRAM NAME
hatsd
STACK EXECUTION DISABLED
           0
ADDITIONAL INFORMATION
pthread_k 7C
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hatsd SIG/6 FLDS/pthread_k VALU/7c
---------------------------------------------------------------------------
LABEL:                TS_THREAD_STUCK_ER
IDENTIFIER:        CFC7B6A9

Date/Time:       Wed Oct  3 09:20:55 2007
Sequence Number: 13158
Machine Id:      00001157D600
Node Id:         srv1
Class:           S
Type:            PERM
Resource Name:   topsvcs

Description
Main thread blocked: exiting

Probable Causes
Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
A problem in the Topology Services daemon
A problem in a library invoked by the Topology Services daemon
Excessive virtual memory activity prevents Topology Services from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,Hb_Rsock.C,1.57,1660                     
ERROR ID
6dOvlD1rti.5/Q9C0/6U08....................
REFERENCE CODE
                                          
Number of seconds where main thread made no progress
         300
Number of page faults with disk I/O during period
           0
Interval in milliseconds where page faults occurred
       15996
---------------------------------------------------------------------------


svr2:
---------------------------------------------------------------------------
LABEL:                OPMSG
IDENTIFIER:        AA8AB241

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 8257
Machine Id:      000010A1D600
Node Id:         srv2
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:                SRC_RSTRT
IDENTIFIER:        BA431EB7

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 8256
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY

Detail Data
SYMPTOM CODE
           0
SOFTWARE ERROR CODE
       -9035
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 8255
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 8254
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
        1024
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL:                HA002_ER
IDENTIFIER:        12081DC6

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 8253
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   haemd

Description
SOFTWARE PROGRAM ERROR

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

        Recommended Actions
        REPORT DETAILED DATA
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361,                                    
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).

---------------------------------------------------------------------------
LABEL:                SRC_SVKO
IDENTIFIER:        BC3BE5A3

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 8252
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
      393350
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
topsvcs
---------------------------------------------------------------------------
LABEL:                GS_TS_RETCODE_ER
IDENTIFIER:        64368504

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 8251
Machine Id:      000010A1D600
Node Id:         srv2
Class:           O
Type:            PERM
Resource Name:   grpsvcs

Description
Connection failure between Group Services and Topology Services

Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error

Failure Causes
Group Services detects an error condition of Topology Services

        Recommended Actions
        Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,PMClient.C,1.72,1049                     
ERROR ID
62IcBY/tti.5/t45106U08....................
REFERENCE CODE
                                          
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
---------------------------------------------------------------------------
LABEL:                CORE_DUMP
IDENTIFIER:        A63BEB70

Date/Time:       Wed Oct  3 09:20:57 2007
Sequence Number: 8250
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
           6
USER'S PROCESS ID:
                340000
FILE SYSTEM SERIAL NUMBER
           4
INODE NUMBER
        4117
PROCESSOR ID
          -1
CORE FILE NAME
/var/ha/run/topsvcs.srv_cluster/core
PROGRAM NAME
hatsd
STACK EXECUTION DISABLED
           0
ADDITIONAL INFORMATION
pthread_k 7C
??

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hatsd SIG/6 FLDS/pthread_k VALU/7c
---------------------------------------------------------------------------
LABEL:                TS_THREAD_STUCK_ER
IDENTIFIER:        CFC7B6A9

Date/Time:       Wed Oct  3 09:20:55 2007
Sequence Number: 8249
Machine Id:      000010A1D600
Node Id:         srv2
Class:           S
Type:            PERM
Resource Name:   topsvcs

Description
Main thread blocked: exiting

Probable Causes
Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
A problem in the Topology Services daemon
A problem in a library invoked by the Topology Services daemon
Excessive virtual memory activity prevents Topology Services from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,Hb_Rsock.C,1.57,1660                     
ERROR ID
6dOvlD1rti.5/lll006U08....................
REFERENCE CODE
                                          
Number of seconds where main thread made no progress
         300
Number of page faults with disk I/O during period
           0
Interval in milliseconds where page faults occurred
        4498

#2  Posted 2007-10-15 13:22
Folks, plenty of views but nobody has anything to say?
For the sake of the Party and the country, give a brother a hand.

#3  Posted 2007-10-15 13:28
................
Run tail -f /tmp/hacmp.out and post the output.

Also, the first error log entry says SYSTEM SHUTDOWN BY USER, dated October 3. Was the machine shut down by a person?
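
One way to look at the shutdown question is to read the entries oldest first, since the log in the first post is listed newest first. Below is a minimal Python sketch, assuming the errlog text has been saved and uses the dashed separators shown in the first post. The names `parse_errpt` and `SAMPLE` are hypothetical, and `SAMPLE` is a trimmed excerpt of three svr1 entries from that post, not a full log.

```python
import re

# Entries in the pasted log are separated by long dashed lines.
SEPARATOR = re.compile(r"^-{10,}[ \t]*$", re.MULTILINE)

def parse_errpt(text):
    """Split errlog text into entries and return them ordered by sequence number."""
    entries = []
    for block in SEPARATOR.split(text):
        label = re.search(r"^LABEL:\s+(\S+)", block, re.MULTILINE)
        seq = re.search(r"^Sequence Number:\s+(\d+)", block, re.MULTILINE)
        when = re.search(r"^Date/Time:\s+(.+?)\s*$", block, re.MULTILINE)
        if label and seq and when:
            entries.append({
                "label": label.group(1),
                "seq": int(seq.group(1)),
                "time": when.group(1),
            })
    # Lower sequence numbers were logged earlier.
    return sorted(entries, key=lambda e: e["seq"])

# Trimmed excerpt of the svr1 log from the first post (header fields only).
SAMPLE = """\
---------------------------------------------------------------------------
LABEL:                REBOOT_ID
IDENTIFIER:        2BFA76F6

Date/Time:       Wed Oct  3 09:21:05 2007
Sequence Number: 13168
---------------------------------------------------------------------------
LABEL:                OPMSG
IDENTIFIER:        AA8AB241

Date/Time:       Wed Oct  3 09:20:58 2007
Sequence Number: 13166
---------------------------------------------------------------------------
LABEL:                TS_THREAD_STUCK_ER
IDENTIFIER:        CFC7B6A9

Date/Time:       Wed Oct  3 09:20:55 2007
Sequence Number: 13158
"""

ordered = parse_errpt(SAMPLE)
for e in ordered:
    print(e["seq"], e["time"], e["label"])
```

On this excerpt the oldest entry is TS_THREAD_STUCK_ER at 09:20:55, followed by the clexit.rc OPMSG at 09:20:58 and REBOOT_ID at 09:21:05. That ordering suggests the halt came after the cluster manager exited, not before it, though the log alone does not prove who or what issued the halt.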

#4  Posted 2007-10-15 13:33
Could a resource shortage have caused "Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU"?
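
The TS_THREAD_STUCK_ER entry carries a few numbers that bear on this question. Below is a minimal Python sketch that pulls them out of the Detail Data section, assuming the headings appear exactly as in the first post. `stuck_details`, `FIELDS`, and `SVR1_DETAIL` are hypothetical names, and the sample text is copied from the svr1 entry.

```python
import re

# Detail Data headings as they appear in the TS_THREAD_STUCK_ER entry.
FIELDS = {
    "blocked_seconds": "Number of seconds where main thread made no progress",
    "page_faults_with_io": "Number of page faults with disk I/O during period",
    "fault_interval_ms": "Interval in milliseconds where page faults occurred",
}

def stuck_details(entry_text):
    """Return the numeric detail fields found in a TS_THREAD_STUCK_ER entry."""
    values = {}
    for key, heading in FIELDS.items():
        # Each value sits on the line after its heading, right-aligned.
        match = re.search(re.escape(heading) + r"\s*\n\s*(\d+)", entry_text)
        if match:
            values[key] = int(match.group(1))
    return values

# Detail Data copied from the svr1 entry in the first post.
SVR1_DETAIL = """\
Number of seconds where main thread made no progress
         300
Number of page faults with disk I/O during period
           0
Interval in milliseconds where page faults occurred
       15996
"""

details = stuck_details(SVR1_DETAIL)
print(details)
```

Both nodes report 300 seconds without progress and 0 page faults with disk I/O. A zero there may point away from paging as the cause and toward something else blocking the daemon, but that is an interpretation and not something the log states.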

#5  Posted 2007-10-15 14:32
Install IY80103 to update RSCT to 2.4.4.1.
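
Before applying the fix it can help to check whether the installed RSCT level is already at or above 2.4.4.1. Below is a minimal Python sketch of that comparison, assuming the installed level has been read from the system by some other means (for example the level column of the fileset listing). `version_tuple` and `needs_update` are hypothetical helper names, and the levels in the usage lines are made-up inputs, not values from this thread.

```python
def version_tuple(level):
    """Turn a dotted level such as '2.4.4.1' into a tuple of ints."""
    return tuple(int(part) for part in level.split("."))

def needs_update(installed, target="2.4.4.1"):
    """True when the installed level is lower than the target level."""
    return version_tuple(installed) < version_tuple(target)

# Hypothetical levels, for illustration only.
print(needs_update("2.4.4.0"))
print(needs_update("2.4.4.1"))
print(needs_update("2.4.10.0"))
```

Comparing tuples of integers avoids the trap of plain string comparison, where "2.4.10.0" would sort below "2.4.4.1".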

#6  Posted 2007-10-15 15:26
Please post the HA logs so we can take a look!

#7  Posted 2007-10-17 10:54
The HA software may not be a licensed copy.

#8  Posted 2007-10-17 18:18
Heh, HA not a licensed copy? That's a joke.
There isn't enough detail in the post, so there's no way to help you.

#9  Posted 2007-10-17 23:43
"Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU"
That part is not a big deal; tuning a parameter takes care of it. With both machines going down, nine times out of ten it is a patch problem.