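Hi all: a customer has two 55A machines running AIX 5.3 + HA 5.3. Both machines went down on October 3. After they were rebooted on October 8, HA did not start automatically, and starting HA manually afterwards reported a pile of errors. Below is the errlog from svr1 and svr2 (the October 3 entries). Could you help me work out what happened? Thanks.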
svr1:
---------------------------------------------------------------------------
LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6
Date/Time: Wed Oct 3 09:21:05 2007
Sequence Number: 13168
Machine Id: 00001157D600
Node Id: localhost
Class: S
Type: TEMP
Resource Name: SYSPROC
Description
SYSTEM SHUTDOWN BY USER
Probable Causes
SYSTEM SHUTDOWN
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
---------------------------------------------------------------------------
LABEL: ERRLOG_ON
IDENTIFIER: 9DBCFDEE
Date/Time: Mon Oct 8 16:06:49 2007
Sequence Number: 13167
Machine Id: 00001157D600
Node Id: localhost
Class: O
Type: TEMP
Resource Name: errdemon
Description
ERROR LOGGING TURNED ON
Probable Causes
ERRDEMON STARTED AUTOMATICALLY
User Causes
/USR/LIB/ERRDEMON COMMAND
Recommended Actions
NONE
---------------------------------------------------------------------------
LABEL: OPMSG
IDENTIFIER: AA8AB241
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 13166
Machine Id: 00001157D600
Node Id: srv1
Class: O
Type: TEMP
Resource Name: OPERATOR
Description
OPERATOR NOTIFICATION
User Causes
ERRLOGGER COMMAND
Recommended Actions
REVIEW DETAILED DATA
Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13165
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
1024
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL: SRC_RSTRT
IDENTIFIER: BA431EB7
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13164
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
0
SOFTWARE ERROR CODE
-9035
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13163
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
1024
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL: HA002_ER
IDENTIFIER: 12081DC6
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13162
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: haemd
Description
SOFTWARE PROGRAM ERROR
Probable Causes
SUBSYSTEM
Failure Causes
SUBSYSTEM
Recommended Actions
REPORT DETAILED DATA
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361,
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13161
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
393350
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
topsvcs
---------------------------------------------------------------------------
LABEL: GS_TS_RETCODE_ER
IDENTIFIER: 64368504
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13160
Machine Id: 00001157D600
Node Id: srv1
Class: O
Type: PERM
Resource Name: grpsvcs
Description
Connection failure between Group Services and Topology Services
Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error
Failure Causes
Group Services detects an error condition of Topology Services
Recommended Actions
Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
RSCT,PMClient.C,1.72,1049
ERROR ID
62IcBY/tti.5/PuJ0/6U08....................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: A63BEB70
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13159
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
368682
FILE SYSTEM SERIAL NUMBER
4
INODE NUMBER
4189
PROCESSOR ID
-1
CORE FILE NAME
/var/ha/run/topsvcs.srv_cluster/core
PROGRAM NAME
hatsd
STACK EXECUTION DISABLED
0
ADDITIONAL INFORMATION
pthread_k 7C
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hatsd SIG/6 FLDS/pthread_k VALU/7c
---------------------------------------------------------------------------
LABEL: TS_THREAD_STUCK_ER
IDENTIFIER: CFC7B6A9
Date/Time: Wed Oct 3 09:20:55 2007
Sequence Number: 13158
Machine Id: 00001157D600
Node Id: srv1
Class: S
Type: PERM
Resource Name: topsvcs
Description
Main thread blocked: exiting
Probable Causes
Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
A problem in the Topology Services daemon
A problem in a library invoked by the Topology Services daemon
Excessive virtual memory activity prevents Topology Services from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,Hb_Rsock.C,1.57,1660
ERROR ID
6dOvlD1rti.5/Q9C0/6U08....................
REFERENCE CODE
Number of seconds where main thread made no progress
300
Number of page faults with disk I/O during period
0
Interval in milliseconds where page faults occurred
15996
---------------------------------------------------------------------------
svr2:
---------------------------------------------------------------------------
LABEL: OPMSG
IDENTIFIER: AA8AB241
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 8257
Machine Id: 000010A1D600
Node Id: srv2
Class: O
Type: TEMP
Resource Name: OPERATOR
Description
OPERATOR NOTIFICATION
User Causes
ERRLOGGER COMMAND
Recommended Actions
REVIEW DETAILED DATA
Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL: SRC_RSTRT
IDENTIFIER: BA431EB7
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 8256
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
VERIFY SUBSYSTEM RESTARTED AUTOMATICALLY
Detail Data
SYMPTOM CODE
0
SOFTWARE ERROR CODE
-9035
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'217'
FAILING MODULE
emsvcs
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 8255
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
1024
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 8254
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
1024
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
grpsvcs
---------------------------------------------------------------------------
LABEL: HA002_ER
IDENTIFIER: 12081DC6
Date/Time: Wed Oct 3 09:20:58 2007
Sequence Number: 8253
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: haemd
Description
SOFTWARE PROGRAM ERROR
Probable Causes
SUBSYSTEM
Failure Causes
SUBSYSTEM
Recommended Actions
REPORT DETAILED DATA
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
DETECTING MODULE
LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1361,
DIAGNOSTIC EXPLANATION
haemd: 2521-032 Cannot dispatch group services (1).
---------------------------------------------------------------------------
LABEL: SRC_SVKO
IDENTIFIER: BC3BE5A3
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 8252
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: SRC
Description
SOFTWARE PROGRAM ERROR
Probable Causes
APPLICATION PROGRAM
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
MANUALLY RESTART SUBSYSTEM IF NEEDED
Detail Data
SYMPTOM CODE
393350
SOFTWARE ERROR CODE
-9017
ERROR CODE
0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
topsvcs
---------------------------------------------------------------------------
LABEL: GS_TS_RETCODE_ER
IDENTIFIER: 64368504
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 8251
Machine Id: 000010A1D600
Node Id: srv2
Class: O
Type: PERM
Resource Name: grpsvcs
Description
Connection failure between Group Services and Topology Services
Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error
Failure Causes
Group Services detects an error condition of Topology Services
Recommended Actions
Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
RSCT,PMClient.C,1.72,1049
ERROR ID
62IcBY/tti.5/t45106U08....................
REFERENCE CODE
DIAGNOSTIC EXPLANATION
topsvcs subsystem died with hb_errno = 16, grpsvcs will also exit.
---------------------------------------------------------------------------
LABEL: CORE_DUMP
IDENTIFIER: A63BEB70
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 8250
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: SYSPROC
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SOFTWARE PROGRAM
User Causes
USER GENERATED SIGNAL
Recommended Actions
CORRECT THEN RETRY
Failure Causes
SOFTWARE PROGRAM
Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE
Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
340000
FILE SYSTEM SERIAL NUMBER
4
INODE NUMBER
4117
PROCESSOR ID
-1
CORE FILE NAME
/var/ha/run/topsvcs.srv_cluster/core
PROGRAM NAME
hatsd
STACK EXECUTION DISABLED
0
ADDITIONAL INFORMATION
pthread_k 7C
??
Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/hatsd SIG/6 FLDS/pthread_k VALU/7c
---------------------------------------------------------------------------
LABEL: TS_THREAD_STUCK_ER
IDENTIFIER: CFC7B6A9
Date/Time: Wed Oct 3 09:20:55 2007
Sequence Number: 8249
Machine Id: 000010A1D600
Node Id: srv2
Class: S
Type: PERM
Resource Name: topsvcs
Description
Main thread blocked: exiting
Probable Causes
Topology Services daemon's main thread has been blocked too long
Topology Services daemon cannot get timely access to CPU
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
A problem in the Topology Services daemon
A problem in a library invoked by the Topology Services daemon
Excessive virtual memory activity prevents Topology Services from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,Hb_Rsock.C,1.57,1660
ERROR ID
6dOvlD1rti.5/lll006U08....................
REFERENCE CODE
Number of seconds where main thread made no progress
300
Number of page faults with disk I/O during period
0
Interval in milliseconds where page faults occurred
4498
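The entries above are listed newest first, so the order of events is easier to follow when they are sorted by Sequence Number. On svr1, for example, that puts TS_THREAD_STUCK_ER (13158, 09:20:55) first, followed by the hatsd CORE_DUMP (13159) and then the grpsvcs, topsvcs, haemd and clstrmgrES entries. Below is a minimal Python sketch of that sorting, assuming the log has been saved as plain text in the same layout as shown here. The function name `parse_errlog` and the shortened `sample` are illustrative only and not part of any AIX tool.

```python
import re

# Entries in the pasted log are separated by long runs of dashes.
SEPARATOR = re.compile(r"^-{10,}[ \t]*$", re.MULTILINE)

# Header fields worth keeping for a quick timeline (an arbitrary choice).
WANTED = ("LABEL", "Date/Time", "Sequence Number", "Resource Name")


def parse_errlog(text):
    """Split errlog text into entries and return them oldest first."""
    entries = []
    for chunk in SEPARATOR.split(text):
        fields = {}
        for line in chunk.splitlines():
            # Split on the first colon only, so the time of day stays intact.
            key, sep, value = line.partition(":")
            if sep and key.strip() in WANTED:
                fields[key.strip()] = value.strip()
        if "LABEL" in fields and "Sequence Number" in fields:
            fields["Sequence Number"] = int(fields["Sequence Number"])
            entries.append(fields)
    return sorted(entries, key=lambda e: e["Sequence Number"])


# Shortened sample taken from the svr1 entries above.
sample = """\
---------------------------------------------------------------------------
LABEL: CORE_DUMP
Date/Time: Wed Oct 3 09:20:57 2007
Sequence Number: 13159
Resource Name: SYSPROC
---------------------------------------------------------------------------
LABEL: TS_THREAD_STUCK_ER
Date/Time: Wed Oct 3 09:20:55 2007
Sequence Number: 13158
Resource Name: topsvcs
---------------------------------------------------------------------------
"""

timeline = parse_errlog(sample)
for entry in timeline:
    print(entry["Sequence Number"], entry["Date/Time"],
          entry["LABEL"], entry["Resource Name"])
```

Sorting on the sequence number rather than the timestamp keeps entries logged within the same second in the order they were written.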