Chinaunix

Title: P630 went down; please help find the cause.

Author: tony8201    Time: 2005-07-13 17:41
At 11 p.m. on the 11th, machine A went down and the application failed over to machine B automatically. The administrator did not know, so machine A was not restarted right away. At 7 p.m. on the 12th, machine B went down and the application stopped. When the administrator noticed, both A and B were started, and everything has been normal since. I don't understand why they went down.

Error information from machine A:
#errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
2F3E09A4   0713144505 I H ent2           REPAIR ACTION
B6048838   0713112205 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0713111905 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
BA431EB7   0712201705 P S SRC            SOFTWARE PROGRAM ERROR
B6048838   0712201705 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
BA431EB7   0712201505 P S SRC            SOFTWARE PROGRAM ERROR
B6048838   0712201505 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
AFA89905   0712201205 I O grpsvcs        Group Services daemon started
97419D60   0712201205 I O topsvcs        Topology Services daemon started
A6DF45AA   0712200705 I O RMCdaemon      The daemon is started.
BFE4C025   0712200505 P H sysplanar0     UNDETERMINED ERROR
2BFA76F6   0711234905 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0712200705 T O errdemon       ERROR LOGGING TURNED ON
FE2DEE00   0711234905 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00   0711234905 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
AA8AB241   0711234905 T O OPERATOR       OPERATOR NOTIFICATION
BC3BE5A3   0711234905 P S SRC            SOFTWARE PROGRAM ERROR
B6048838   0624162405 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0624161705 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838   0624160905 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
D5385D18   0505224805 T H hdisk2         ARRAY OPERATION ERROR
3C81E43F   0427114005 P U topsvcs        Late in sending heartbeat
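The TIMESTAMP column in the errpt summary is MMDDhhmmYY, and the listing is not strictly in time order (the 0711 shutdown row sits above a 0712 row). A minimal sketch, assuming only that column layout, for turning a few of the rows above into a chronological timeline; the helper names are made up for illustration:

```python
from datetime import datetime

def parse_errpt_timestamp(stamp):
    """Convert an errpt summary TIMESTAMP (MMDDhhmmYY) into a datetime."""
    return datetime.strptime(stamp, "%m%d%H%M%y")

def parse_row(row):
    """Split one summary row into (time, identifier, resource, description)."""
    ident, stamp, _etype, _eclass, resource, description = row.split(None, 5)
    return parse_errpt_timestamp(stamp), ident, resource, description

# A few rows copied from the machine A listing.
rows = [
    "BFE4C025   0712200505 P H sysplanar0     UNDETERMINED ERROR",
    "2BFA76F6   0711234905 T S SYSPROC        SYSTEM SHUTDOWN BY USER",
    "9DBCFDEE   0712200705 T O errdemon       ERROR LOGGING TURNED ON",
    "BC3BE5A3   0711234905 P S SRC            SOFTWARE PROGRAM ERROR",
]

# Sort by time so the shutdown on the 11th comes before the restart on the 12th.
events = sorted(parse_row(r) for r in rows)
for when, ident, resource, description in events:
    print(when.strftime("%Y-%m-%d %H:%M"), ident, resource, description)
```

Sorting this way makes it easier to see which entries belong to the crash on the 11th and which to the restart on the 12th.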
Author: tony8201    Time: 2005-07-13 17:42
#errpt -a
LABEL:           REBOOT_ID
IDENTIFIER:      2BFA76F6

Date/Time:       Mon Jul 11 23:49:50 BEIS
Sequence Number: 180
Machine Id:      00570B1E4C00
Node Id:         localhost
Class:           S
Type:            TEMP
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:           ERRLOG_ON
IDENTIFIER:      9DBCFDEE

Date/Time:       Tue Jul 12 20:07:17 BEIS
Sequence Number: 179
Machine Id:      00570B1E4C00
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

        Recommended Actions
        NONE

---------------------------------------------------------------------------
LABEL:           AIXIF_ARP_DUP_ADDR
IDENTIFIER:      FE2DEE00

Date/Time:       Mon Jul 11 23:49:45 BEIS
Sequence Number: 178
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SYSXAIXIF

Description
DUPLICATE IP ADDRESS DETECTED IN THE NET

Failure Causes
ARP RESPONSE RECEIVED FOR MY IP ADDRESS

        Recommended Actions
        CONTACT NETWORK ADMINISTRATOR

Detail Data
DUPLICATE IP ADDRESS
0A67 0103
MAC ADDRESS
000D 600B 8DE2
---------------------------------------------------------------------------
LABEL:           AIXIF_ARP_DUP_ADDR
IDENTIFIER:      FE2DEE00

Date/Time:       Mon Jul 11 23:49:44 BEIS
Sequence Number: 177
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SYSXAIXIF

Description
DUPLICATE IP ADDRESS DETECTED IN THE NET

Failure Causes
ARP RESPONSE RECEIVED FOR MY IP ADDRESS

        Recommended Actions
        CONTACT NETWORK ADMINISTRATOR

Detail Data
DUPLICATE IP ADDRESS
0A67 0103
MAC ADDRESS
000D 600B 8DE2
---------------------------------------------------------------------------
LABEL:           OPMSG
IDENTIFIER:      AA8AB241

Date/Time:       Mon Jul 11 23:49:43 BEIS
Sequence Number: 176
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:           SRC_SVKO
IDENTIFIER:      BC3BE5A3

Date/Time:       Mon Jul 11 23:49:42 BEIS
Sequence Number: 175
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
         512
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Date/Time:       Fri Jun 24 16:24:34 BEIS
Sequence Number: 174
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       16234
FILE SYSTEM SERIAL NUMBER
          11
INODE NUMBER
        4098
PROCESSOR ID
           0
CORE FILE NAME
/xbstation/xb/server/bin/core
PROGRAM NAME
Dept_Dispose
ADDITIONAL INFORMATION
??
??
Unable to generate symptom string.
---------------------------------------------------------------------------
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Date/Time:       Fri Jun 24 16:17:42 BEIS
Sequence Number: 173
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       11916
FILE SYSTEM SERIAL NUMBER
          11
INODE NUMBER
        4098
PROCESSOR ID
           1
CORE FILE NAME
/xbstation/xb/server/bin/core
PROGRAM NAME
Dept_Dispose
ADDITIONAL INFORMATION
_ptrgl 0
??
Unable to generate symptom string.
---------------------------------------------------------------------------
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Date/Time:       Fri Jun 24 16:09:35 BEIS
Sequence Number: 172
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       58854
FILE SYSTEM SERIAL NUMBER
          11
INODE NUMBER
        4098
PROCESSOR ID
           1
CORE FILE NAME
/xbstation/xb/server/bin/core
PROGRAM NAME
Dept_Dispose
ADDITIONAL INFORMATION
??
??
Unable to generate symptom string.
---------------------------------------------------------------------------
LABEL:           FCP_ARRAY_ERR4
IDENTIFIER:      D5385D18

Date/Time:       Thu May  5 22:48:15 BEIS
Sequence Number: 163
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           H
Type:            TEMP
Resource Name:   hdisk2
Resource Class:  disk
Resource Type:   array
Location:        U0.1-P2-I3/Q1-W200400A0B818048A-L0

Description
ARRAY OPERATION ERROR

Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE

Failure Causes
DASD MEDIA
DISK DRIVE

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0A00 2E08 010E F888 0000 0804 0000 0000 0000 0000 0000 02E3 0200 0300 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 3354 8000 F205 2101 0000 0000 0000 0000 0000 0000 EF00 64B0 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL:           TS_LATEHB_PE
IDENTIFIER:      3C81E43F

Date/Time:       Wed Apr 27 11:40:58 BEIS
Sequence Number: 122
Machine Id:      00570B1E4C00
Node Id:         xbserver1
Class:           U
Type:            PERF
Resource Name:   topsvcs
Resource Class:  NONE
Resource Type:   NONE
Location:
VPD:

Description
Late in sending heartbeat

Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities

Failure Causes
Daemon can not get required system resource

        Recommended Actions
        Reduce the system load

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.184,4520
ERROR ID
6zESUw.8bkP0/CRG.22kN8....................
REFERENCE CODE

A heartbeat is late by the following number of seconds
        3590
Author: tony8201    Time: 2005-07-13 17:45
Before this, hacmp.out had been reporting this error continuously:

00:00000:00029:2005/07/11 11:45:58.25 kernel  Cannot read, host process disconnected:  1116 spid: 29
00:00000:00046:2005/07/11 11:47:15.19 kernel  Cannot read, host process disconnected:  1672 spid: 46
00:00000:00117:2005/07/11 13:11:36.77 kernel  Cannot read, host process disconnected:  1496 spid: 117
00:00000:00161:2005/07/11 13:11:52.10 kernel  Cannot read, host process disconnected:  1120 spid: 161
00:00000:00095:2005/07/11 13:12:32.84 kernel  Cannot read, host process disconnected:  984 spid: 95
00:00000:00063:2005/07/11 13:12:33.85 kernel  Cannot read, host process disconnected:  192 spid: 63
00:00000:00101:2005/07/11 13:13:09.28 kernel  Cannot read, host process disconnected:  1552 spid: 101
00:00000:00138:2005/07/11 13:13:10.47 kernel  Cannot read, host process disconnected:  2012 spid: 138
00:00000:00078:2005/07/11 13:35:21.32 kernel  Cannot read, host process disconnected:  3712 spid: 78
00:00000:00157:2005/07/11 13:41:52.56 kernel  Cannot read, host process disconnected:  932 spid: 157
00:00000:00100:2005/07/11 13:44:39.44 kernel  Cannot read, host process disconnected:  1064 spid: 100
00:00000:00048:2005/07/11 13:45:02.54 kernel  Cannot read, host process disconnected:  636 spid: 48
00:00000:00067:2005/07/11 13:48:46.76 kernel  Cannot read, host process disconnected:  1980 spid: 67
00:00000:00014:2005/07/11 13:50:45.35 kernel  Cannot read, host process disconnected:  1056 spid: 14
00:00000:00025:2005/07/11 13:52:13.68 kernel  Cannot read, host process disconnected:  244 spid: 25
00:00000:00050:2005/07/11 13:52:15.68 kernel  Cannot read, host process disconnected:  480 spid: 50
00:00000:00168:2005/07/11 14:27:16.55 kernel  Cannot read, host process disconnected: 506 1240 spid: 168
00:00000:00160:2005/07/11 14:27:45.20 kernel  Cannot read, host process disconnected: 506 1916 spid: 160
00:00000:00000:2005/07/11 14:38:04.42 kernel  nrpacket: recv, Connection timed out
00:00000:00000:2005/07/11 14:58:59.43 kernel  nrpacket: recv, Connection timed out
00:00000:00157:2005/07/11 15:35:29.66 kernel  Cannot read, host process disconnected: 107 1720 spid: 157
00:00000:00019:2005/07/11 15:35:36.89 kernel  Cannot read, host process disconnected:  640 spid: 19
00:00000:00057:2005/07/11 16:16:42.44 kernel  Cannot read, host process disconnected: 304 668 spid: 57
00:00000:00078:2005/07/11 16:25:01.20 kernel  Cannot read, host process disconnected: 302 344 spid: 78
00:00000:00104:2005/07/11 16:35:53.40 kernel  Cannot read, host process disconnected: 103 624 spid: 104
00:00000:00168:2005/07/11 17:00:36.45 kernel  Cannot read, host process disconnected:  1104 spid: 168
00:00000:00026:2005/07/11 17:00:57.26 kernel  Cannot read, host process disconnected:  1140 spid: 26
00:00000:00066:2005/07/11 17:01:25.53 kernel  Cannot read, host process disconnected:  480 spid: 66
00:00000:00132:2005/07/11 17:03:37.63 kernel  Cannot read, host process disconnected: 109 1764 spid: 132
00:00000:00057:2005/07/11 17:19:44.78 kernel  Cannot read, host process disconnected:  400 spid: 57
00:00000:00026:2005/07/11 17:20:20.33 kernel  Cannot read, host process disconnected:  1564 spid: 26
00:00000:00073:2005/07/11 17:21:15.06 kernel  Cannot read, host process disconnected:  1428 spid: 73
00:00000:00160:2005/07/11 17:21:30.87 kernel  Cannot read, host process disconnected:  1324 spid: 160
00:00000:00069:2005/07/11 17:27:23.17 kernel  Cannot read, host process disconnected:  552 spid: 69
00:00000:00095:2005/07/11 18:02:47.27 kernel  Cannot read, host process disconnected: 509 1144 spid: 95
00:00000:00090:2005/07/11 18:32:32.59 kernel  Cannot read, host process disconnected: 309 548 spid: 90
00:00000:00131:2005/07/11 18:51:44.69 kernel  Cannot read, host process disconnected: 302 664 spid: 131
00:00000:00063:2005/07/11 18:52:40.59 kernel  Cannot read, host process disconnected:  1912 spid: 63
00:00000:00097:2005/07/11 19:55:59.92 kernel  Cannot read, host process disconnected:  1104 spid: 97
00:00000:00011:2005/07/11 21:35:05.08 kernel  Cannot read, host process disconnected:  1496 spid: 11
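To judge whether these disconnect messages cluster around the time of the crash, it may help to count them per hour. A rough sketch follows; the regular expression is only my reading of the line layout shown above (three colon-separated numeric fields, then date, time, source, message), not a documented format, and `disconnects_per_hour` is a made-up helper name:

```python
import re
from collections import Counter

# Assumed layout: NN:NNNNN:NNNNN:YYYY/MM/DD hh:mm:ss.ff source  message
LINE = re.compile(
    r"^\d+:\d+:\d+:(?P<date>\d{4}/\d{2}/\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d+ "
    r"(?P<source>\S+)\s+(?P<message>.*)$"
)

# A few lines copied from the log above.
sample = """\
00:00000:00029:2005/07/11 11:45:58.25 kernel  Cannot read, host process disconnected:  1116 spid: 29
00:00000:00046:2005/07/11 11:47:15.19 kernel  Cannot read, host process disconnected:  1672 spid: 46
00:00000:00000:2005/07/11 14:38:04.42 kernel  nrpacket: recv, Connection timed out
00:00000:00011:2005/07/11 21:35:05.08 kernel  Cannot read, host process disconnected:  1496 spid: 11
"""

def disconnects_per_hour(text):
    """Count 'host process disconnected' messages, grouped by date and hour."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE.match(line)
        if match and "host process disconnected" in match["message"]:
            counts[f'{match["date"]} {match["hour"]}:00'] += 1
    return counts

per_hour = disconnects_per_hour(sample)
for hour, count in sorted(per_hour.items()):
    print(hour, count)
```

Run over the whole file, this would show whether the rate changed before 21:35 on the 11th, when the log goes quiet.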
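Author: tony8201    Time: 2005-07-13 17:54
hacmp.out has no entries after 2005/07/11 21:35:05.08 until the administrator noticed the problem on the evening of the 12th.

Hardware diagnostics found no problems, and according to the administrator there were no power or network problems.

FE2DEE00   0711234905 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET

This error should mean that an IP address is duplicated. Could that be the cause?

2BFA76F6   0711234905 T S SYSPROC        SYSTEM SHUTDOWN BY USER

But why was this entry logged? Nobody touched the host during this period.

The above are the problems seen on machine A.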
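The Detail Data of the AIXIF_ARP_DUP_ADDR entries gives the conflicting address and the answering MAC as hex words (`0A67 0103` and `000D 600B 8DE2` on machine A). A small sketch for turning them into the usual notation so they can be compared with the cluster's adapter addresses; the function names are invented for this example:

```python
import ipaddress

def decode_errpt_ip(hex_words):
    """Turn errpt hex words such as '0A67 0103' into a dotted IPv4 address."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    return str(ipaddress.IPv4Address(raw))

def decode_errpt_mac(hex_words):
    """Turn errpt hex words such as '000D 600B 8DE2' into colon-separated MAC form."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    return ":".join(f"{byte:02x}" for byte in raw)

dup_ip = decode_errpt_ip("0A67 0103")
mac_seen_by_a = decode_errpt_mac("000D 600B 8DE2")
print(dup_ip, mac_seen_by_a)
```

The decoded MAC can then be checked against the hardware addresses of the adapters on both nodes to see which interface answered for the address.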
Author: tony8201    Time: 2005-07-13 17:57
Errors on machine B:

#errpt
2BFA76F6   0712190805 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0712200405 T O errdemon       ERROR LOGGING TURNED ON
AA8AB241   0712190805 T O OPERATOR       OPERATOR NOTIFICATION
BC3BE5A3   0712190805 P S SRC            SOFTWARE PROGRAM ERROR
B6048838   0712190805 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
64368504   0711235205 P O grpsvcs        Connection failure between Group Service
173C787F   0711235105 I S topsvcs        Possible malfunction on local adapter
FE2DEE00   0711235105 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00   0711235105 P S SYSXAIXIF      DUPLICATE IP ADDRESS DETECTED IN THE NET
Author: tony8201    Time: 2005-07-13 17:59
#errpt -a

---------------------------------------------------------------------------
LABEL:           REBOOT_ID
IDENTIFIER:      2BFA76F6

Date/Time:       Tue Jul 12 19:08:51 BEIS
Sequence Number: 178
Machine Id:      00570A9E4C00
Node Id:         localhost
Class:           S
Type:            TEMP
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           1
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
---------------------------------------------------------------------------
LABEL:           ERRLOG_ON
IDENTIFIER:      9DBCFDEE

Date/Time:       Tue Jul 12 20:04:11 BEIS
Sequence Number: 177
Machine Id:      00570A9E4C00
Node Id:         localhost
Class:           O
Type:            TEMP
Resource Name:   errdemon

Description
ERROR LOGGING TURNED ON

Probable Causes
ERRDEMON STARTED AUTOMATICALLY

User Causes
/USR/LIB/ERRDEMON COMMAND

        Recommended Actions
        NONE

---------------------------------------------------------------------------
LABEL:           OPMSG
IDENTIFIER:      AA8AB241

Date/Time:       Tue Jul 12 19:08:43 BEIS
Sequence Number: 176
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

        Recommended Actions
        REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES
---------------------------------------------------------------------------
LABEL:           SRC_SVKO
IDENTIFIER:      BC3BE5A3

Date/Time:       Tue Jul 12 19:08:41 BEIS
Sequence Number: 175
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            PERM
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
      721035
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'350'
FAILING MODULE
clstrmgrES
---------------------------------------------------------------------------
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Date/Time:       Tue Jul 12 19:08:41 BEIS
Sequence Number: 174
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            PERM
Resource Name:   SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       37274
FILE SYSTEM SERIAL NUMBER
           1
INODE NUMBER
           2
PROCESSOR ID
           0
CORE FILE NAME
/core
PROGRAM NAME
clstrmgr
ADDITIONAL INFORMATION
ha_gs_sen 1C0
ha_gs_sen 1B0
Unable to generate symptom string.
---------------------------------------------------------------------------
LABEL:           GS_TS_RETCODE_ER
IDENTIFIER:      64368504

Date/Time:       Mon Jul 11 23:52:07 BEIS
Sequence Number: 173
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           O
Type:            PERM
Resource Name:   grpsvcs

Description
Connection failure between Group Services and Topology Services

Probable Causes
Topology Services daemon is not running
Topology Services daemon has died
Topology Services library has detected an error

Failure Causes
Group Services detects an error condition of Topology Services

        Recommended Actions
        Check the Topology Services daemon
Verify that Group Services daemon has been restarted
Call IBM Service if problem persists

Detail Data
DETECTING MODULE
RSCT,PMAdaptMbr.C,1.44,716
ERROR ID
62IcBY/bKdo0/cac132kN8....................
REFERENCE CODE

DIAGNOSTIC EXPLANATION
Received unknown adapter [97] for PMAdaptMbr name [allAdapter_13_0_ether] from hats.
---------------------------------------------------------------------------
LABEL:           TS_LOC_DOWN_ST
IDENTIFIER:      173C787F

Date/Time:       Mon Jul 11 23:51:49 BEIS
Sequence Number: 172
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            INFO
Resource Name:   topsvcs

Description
Possible malfunction on local adapter

Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

        Recommended Actions
        Verify adapter configuration
        Verify network connectivity

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.38,4143
ERROR ID
6zV5DL.JKdo0/zzx/32kN8....................
REFERENCE CODE

Adapter interface name
tty0
Adapter offset
           2
Adapter IP address
255.255.0.1
---------------------------------------------------------------------------
LABEL:           AIXIF_ARP_DUP_ADDR
IDENTIFIER:      FE2DEE00

Date/Time:       Mon Jul 11 23:51:02 BEIS
Sequence Number: 171
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            PERM
Resource Name:   SYSXAIXIF

Description
DUPLICATE IP ADDRESS DETECTED IN THE NET

Failure Causes
ARP RESPONSE RECEIVED FOR MY IP ADDRESS

        Recommended Actions
        CONTACT NETWORK ADMINISTRATOR

Detail Data
DUPLICATE IP ADDRESS
0A67 0103
MAC ADDRESS
000D 600B 866E
---------------------------------------------------------------------------
LABEL:           AIXIF_ARP_DUP_ADDR
IDENTIFIER:      FE2DEE00

Date/Time:       Mon Jul 11 23:51:01 BEIS
Sequence Number: 170
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            PERM
Resource Name:   SYSXAIXIF

Description
DUPLICATE IP ADDRESS DETECTED IN THE NET

Failure Causes
ARP RESPONSE RECEIVED FOR MY IP ADDRESS

        Recommended Actions
        CONTACT NETWORK ADMINISTRATOR

Detail Data
DUPLICATE IP ADDRESS
0A67 0103
MAC ADDRESS
000D 600B 866E
Author: yanbing    Time: 2005-07-13 21:52
It's the duplicate IP address problem...

The shutdown is a forced shutdown controlled by HACMP.
Author: tony8201    Time: 2005-07-14 09:17
If machine A went down on the night of the 11th because of the duplicate IP address,
then why did machine B go down on the evening of the 12th? There are only these entries:

2BFA76F6   0712190805 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0712200405 T O errdemon       ERROR LOGGING TURNED ON
AA8AB241   0712190805 T O OPERATOR       OPERATOR NOTIFICATION
BC3BE5A3   0712190805 P S SRC            SOFTWARE PROGRAM ERROR
B6048838   0712190805 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED
Author: tony8201    Time: 2005-07-14 09:25
Host configuration: 2 CPUs, 2 GB of memory, FastT600 disk enclosure.
System: AIX5.2 + ML04 + HACMP5.2 + SYBASE12.5 + C6.0
Author: xuwelcome    Time: 2005-07-14 15:14
I have a question about this entry:
LABEL: TS_LOC_DOWN_ST
IDENTIFIER: 173C787F

Date/Time:       Mon Jul 11 23:51:49 BEIS
Sequence Number: 172
Machine Id:      00570A9E4C00
Node Id:         xbserver2
Class:           S
Type:            INFO
Resource Name:   topsvcs

Description
Possible malfunction on local adapter

Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured

Recommended Actions
Verify adapter configuration
Verify network connectivity

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.38,4143
ERROR ID
6zV5DL.JKdo0/zzx/32kN8....................
REFERENCE CODE

Adapter interface name
tty0
Adapter offset
          2
Adapter IP address
255.255.0.1

How can a tty have the IP address 255.255.0.1? Besides, 255.255.0.1 is not valid either as an address or as a netmask. I wonder whether the HACMP configuration has a problem; it is worth checking.
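The claim that 255.255.0.1 is valid neither as a host address nor as a netmask can be checked mechanically. A minimal sketch using Python's standard `ipaddress` module; `is_contiguous_netmask` is a helper written for this example and relies only on the rule that a netmask is a run of 1 bits followed by a run of 0 bits:

```python
import ipaddress

def is_contiguous_netmask(mask):
    """Return True if the mask is a run of 1 bits followed only by 0 bits."""
    value = int(ipaddress.IPv4Address(mask))
    inverted = ~value & 0xFFFFFFFF
    # For a valid mask the inverted value has the form 0...01...1,
    # so adding 1 clears every set bit.
    return (inverted & (inverted + 1)) == 0

address = ipaddress.IPv4Address("255.255.0.1")

# 255.255.0.1 lies in 240.0.0.0/4, which is reserved and not assignable to hosts.
print("reserved address:", address.is_reserved)
print("valid netmask:", is_contiguous_netmask("255.255.0.1"))
print("255.255.0.0 for comparison:", is_contiguous_netmask("255.255.0.0"))
```

This only confirms that the value is not a meaningful IP setting; whether topsvcs uses it as a placeholder for non-IP (serial) heartbeat interfaces is a separate question that the sketch does not answer.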
Author: tony8201    Time: 2005-07-14 15:16
Error reports in root's mail on machine A:

From root Wed Jul 13 04:06:58 2005
Received: (from root@localhost) by xbserver1_stdby (AIX5.2/8.11.6p2/8.11.0) id j6CK6vV37530 for root; Wed, 13 Jul 2005 04:06:57 +0800
Date: Wed, 13 Jul 2005 04:06:57 +0800
From: root
Message-Id: <200507122006.j6CK6vV37530@xbserver1_stdby>
To: root
Subject: diagela message from xbserver1
Status: RO

A PROBLEM WAS DETECTED ON Wed Jul 13 04:05:57 BEIST 2005                  801014

The Service Request Number(s)/Probable Cause(s)
(causes are listed in descending order of probability):

  652-880: The CEC or SPCN reported a non-critical error. Report the SRN and
           the following reference and physical location codes to your service
           provider.
           Error log information:
                 Date: Tue Jul 12 20:05:38 BEIST 2005
                 Sequence number: 181
                 Label: SCAN_ERROR_CHRP
    Ref. Code: B0061406 FRU: n/a              n/a

From root Tue Jul 12 20:07:24 2005
Received: (from root@localhost) by localhost (AIX5.2/8.11.6p2/8.11.0) id j6CC7Nm05542 for root; Tue, 12 Jul 2005 20:07:23 +0800
Date: Tue, 12 Jul 2005 20:07:23 +0800
From: root
Message-Id: <200507121207.j6CC7Nm05542@localhost>
To: root
Subject: diagela message from localhost
Status: RO

A PROBLEM WAS DETECTED ON Tue Jul 12 20:07:21 BEIST 2005                  801014

The Service Request Number(s)/Probable Cause(s)
(causes are listed in descending order of probability):

  652-880: The CEC or SPCN reported a non-critical error. Report the SRN and
           the following reference and physical location codes to your service
           provider.
           Error log information:
                 Date: Tue Jul 12 20:05:38 BEIST 2005
                 Sequence number: 181
                 Label: SCAN_ERROR_CHRP
    Ref. Code: B0061406 FRU: n/a              n/a

From root Tue Apr 12 14:01:00 2005
Received: (from root@localhost) by loopback (AIX5.2/8.11.6p2/8.11.0) id j3C510v42772 for root; Tue, 12 Apr 2005 14:01:00 +0900
Date: Tue, 12 Apr 2005 14:01:00 +0900
From: root
Message-Id: <200504120501.j3C510v42772@loopback>
To: root
Subject: diagela message from xbserver1
Status: RO

TESTING COMPLETE on Tue Apr 12 13:24:38 BEIDT 2005                        801010

No trouble was found.

The resources tested were:

- proc1            U0.1-P1-C1           Processor
Author: tony8201    Time: 2005-07-14 15:18
B0061406
This looks like a system firmware (microcode) error, so the firmware probably needs to be upgraded?!
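Author: firefoxli    Time: 2005-07-15 16:17
"Hardware diagnostics found no problems, and according to the administrator there were no power or network problems." You can't take that entirely on trust. Why did the network report an IP conflict? Did anyone else touch it?

Following this thread. I have seen an article about this.
Is this an HACMP environment?

1. Upgrade the bos.rte.libpthreads fileset to the latest level.
2. Lower the NIM failure detection rate.
smitty hacmp
cluster config
cluster topology
configure Network Modules
Change a Network Module using Predefined Values
Set the values for both rs232 and Ethernet to slower settings.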
Author: liudw    Time: 2005-07-16 00:08
This kind of tty fault is reported whenever the cluster fails over; it is nothing unusual.
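Author: 强人    Time: 2005-09-26 17:09
Was the original poster's problem solved in the end?
I feel nobody has got to the point.
Why is everyone ignoring the errors with identifier B6048838?
Could this be caused by software?
A duplicate IP wouldn't cause a core dump, would it?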
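To follow up on the B6048838 (CORE_DUMP) entries, the useful fields are the signal number and the program that dumped core. A small sketch that pulls them out of the Detail Data of one stanza; the excerpt is taken from the machine B entry posted earlier, and `detail_fields` is a made-up helper that assumes each heading is followed by its value on the next non-empty line:

```python
import signal

def detail_fields(stanza):
    """Map selected Detail Data headings to the value on the following line."""
    lines = [line.strip() for line in stanza.splitlines() if line.strip()]
    detail = lines[lines.index("Detail Data") + 1:]
    wanted = ("SIGNAL NUMBER", "CORE FILE NAME", "PROGRAM NAME")
    return {key: detail[detail.index(key) + 1] for key in wanted if key in detail}

# Excerpt of the CORE_DUMP stanza logged on machine B at 19:08:41 on Jul 12.
stanza = """\
LABEL:           CORE_DUMP
IDENTIFIER:      B6048838

Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       37274
CORE FILE NAME
/core
PROGRAM NAME
clstrmgr
"""

fields = detail_fields(stanza)
# Signal 11 is SIGSEGV on the common Unix platforms.
signal_name = signal.Signals(int(fields["SIGNAL NUMBER"])).name
print(fields["PROGRAM NAME"], signal_name, fields["CORE FILE NAME"])
```

On machine B the program that dumped core is clstrmgr, while the June entries on machine A name Dept_Dispose, so the two sets of core dumps come from different programs.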
Author: novmcgrady    Time: 2006-04-08 10:23
Was this solved in the end?

I have run into the same problem.
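Author: herowangzj    Time: 2006-04-08 12:05
Following this. I don't think the IP conflict caused it. I ran into IP conflicts when setting up HA, and all that happened was that a single IP conflict error was logged. Admittedly that was a conflict on the service IP, but that shouldn't make much difference. It feels like software caused it. I hope an expert can advise!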
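Author: Joker2004    Time: 2006-04-10 17:44
It should be the IP problem. With the IP conflict, the boot or standby adapter of the cluster may have been knocked out, so the two nodes could not synchronize their state, and that caused the crash.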
Author: feiaix    Time: 2006-12-19 16:51
How did the original poster solve this? I have run into the same problem too. :(
BAD LUCK
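Author: Jens    Time: 2006-12-21 17:07
I ran into the same problem before. The people from the maintenance company came but could not solve it either; after both machines were rebooted everything was fine...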
Author: feiaix    Time: 2006-12-22 09:17
Could it be that the heartbeat was not configured properly, and then the machine went down under heavy I/O?
Author: lj_cd    Time: 2006-12-22 10:25
People come to ask when they have a problem and say nothing once it is solved. Sigh...



