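My environment: AIX 5300-06, HACMP 5.4.0.0.
Recently, in the early hours of the morning, a failure has been occurring from time to time: oasrv2 gets shut down by HACMP.
The Service IP on oasrv1 stays active, but the RG running on it has failed, and the only way to recover is to restart that RG.
Below is an excerpt from cluster.log for the relevant time period: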
Dec 27 02:54:51 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.v9AyC/dUQ./UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked receive thread Interval in seconds during which process was blocked 120 Interface name en3
Dec 27 02:54:52 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.w9AyC/rZs//UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked receive thread Interval in seconds during which process was blocked 120 Interface name en0
Dec 27 02:54:52 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.w9AyC/sls//UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked receive thread Interval in seconds during which process was blocked 113 Interface name rhdisk2
Dec 27 02:54:52 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.w9AyC/Het//UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked netmon thread Interval in seconds during which process was blocked 58 Interface name en3
Dec 27 02:54:52 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.w9AyC/Fxt//UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked command receive thread Interval in seconds during which process was blocked 128 Interface name en0
Dec 27 02:55:06 OAsrv1 daemon:err|error haemd[1347816]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.36,L#=1836, haemd: 2521-034 Not responding to Group Services - terminating.
Dec 27 02:55:08 OAsrv1 local0:crit clstrmgrES[1310746]: Tue Dec 27 02:55:08 Removing 2 from ml_idx
Dec 27 02:55:18 OAsrv1 user:notice HACMP for AIX: EVENT START: node_down oasrv2
Dec 27 02:55:21 OAsrv1 user:notice HACMP for AIX: EVENT COMPLETED: node_down oasrv2 0
Dec 27 02:55:21 OAsrv1 user:notice HACMP for AIX: EVENT START: node_down_complete oasrv2
Dec 27 02:55:22 OAsrv1 user:notice HACMP for AIX: EVENT COMPLETED: node_down_complete oasrv2 0
Dec 27 02:55:22 OAsrv1 local0:crit clstrmgrES[1310746]: Tue Dec 27 02:55:22 createAndConnectClientSocket: Setting up commpath for connection /usr/es/sbin/cluster/HacmpRgRmWakeup
Dec 27 02:55:22 OAsrv1 local0:crit clstrmgrES[1310746]: Tue Dec 27 02:55:22 createAndConnectClientSocket : connect(/usr/es/sbin/cluster/HacmpRgRmWakeup) failed, errno=2
Dec 27 02:55:47 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.nAAyC/Ezt0/UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked send thread Interval in seconds during which process was blocked 35 Interface name rhdisk2
Dec 27 02:56:41 OAsrv1 daemon:err|error topsvcs[991248]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6BUfAx.dBAyC/ehS0/UAe.1...................:::Reference ID: :::Template ID: 3d32b80d:::Details File: :::Location: rsct,nim_control.C,1.39.1.18,5919 :::TS_NIM_ERROR_STUCK_ER NIM thread blocked Thread which was blocked send thread Interval in seconds during which process was blocked 36 Interface name rhdisk2
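
To make an excerpt like the one above easier to read, the TS_NIM_ERROR_STUCK_ER entries can be reduced to timestamp, interface, blocked thread, and blocked interval. This is only a minimal Python sketch, not an IBM-provided tool: the function name `parse_stuck_events` and the regular expression are assumptions derived solely from the line format visible in this excerpt, and other RSCT levels may log differently.

```python
import re

# Assumed line layout, taken from the cluster.log excerpt in this post:
#   "<Mon DD HH:MM:SS> <host> ... TS_NIM_ERROR_STUCK_ER NIM thread blocked
#    Thread which was blocked <thread> Interval in seconds during which
#    process was blocked <secs> Interface name <iface>"
STUCK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r".*TS_NIM_ERROR_STUCK_ER.*?"
    r"Thread which was blocked (?P<thread>.+?) "
    r"Interval in seconds during which process was blocked (?P<secs>\d+) "
    r"Interface name (?P<iface>\S+)"
)


def parse_stuck_events(lines):
    """Return (timestamp, interface, thread, seconds) for each stuck-thread entry."""
    events = []
    for line in lines:
        match = STUCK_RE.search(line.strip())
        if match:
            events.append(
                (
                    match.group("ts"),
                    match.group("iface"),
                    match.group("thread"),
                    int(match.group("secs")),
                )
            )
    return events


if __name__ == "__main__":
    # Hypothetical path; point it at the real cluster.log on the node.
    with open("cluster.log", encoding="utf-8", errors="replace") as handle:
        for ts, iface, thread, secs in parse_stuck_events(handle):
            print(f"{ts}  {iface:<8} {thread:<22} blocked {secs}s")
```

Run against the excerpt above, this would list the en3, en0, and rhdisk2 entries logged between 02:54:51 and 02:54:52, followed by the two later rhdisk2 send-thread entries.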
Below is an excerpt from the nim.topsvcs.rhdisk2.oa_cluster log for the relevant time period:
12/26 19:28:22.884: Heartbeat was NOT received. Missed HBs: 1. Limit: 8
12/27 02:53:05.845: Heartbeat was NOT received. Missed HBs: 1. Limit: 8
12/27 02:53:11.851: Heartbeat was NOT received. Missed HBs: 2. Limit: 8
12/27 02:53:17.851: Heartbeat was NOT received. Missed HBs: 3. Limit: 8
12/27 02:53:23.854: Heartbeat was NOT received. Missed HBs: 4. Limit: 8
12/27 02:53:29.854: Heartbeat was NOT received. Missed HBs: 5. Limit: 8
12/27 02:53:35.861: Heartbeat was NOT received. Missed HBs: 6. Limit: 8
12/27 02:53:41.866: Heartbeat was NOT received. Missed HBs: 7. Limit: 8
12/27 02:53:47.871: Heartbeat was NOT received. Missed HBs: 8. Limit: 8
12/27 02:53:47.871: Local adapter is up: issuing notification for remote adapter
12/27 02:53:47.871: Adapter status successfully sent.
12/27 02:54:51.116: Received a SEND MSG command. Dst: .
12/27 02:54:51.117: Received a SEND MSG command. Dst: .
12/27 02:54:51.117: Received a STOP HB command.
12/27 02:54:51.117: Received a STOP MONITOR command.
12/27 02:54:51.412: Receive thread blocked for 113 seconds.
12/27 02:54:51.412: nim error successfully sent.
12/27 02:54:51.677: Received a SEND MSG command. Dst: .
12/27 02:54:53.769: Received a SEND MSG command. Dst: .
12/27 02:54:55.820: Received a SEND MSG command. Dst: .
12/27 02:54:55.871: Received a SEND MSG command. Dst: .
12/27 02:54:55.922: Received a SEND MSG command. Dst: .
12/27 02:54:58.474: Received a SEND MSG command. Dst: .
12/27 02:55:01.824: Received a SEND MSG command. Dst: .
12/27 02:55:03.077: Received a SEND MSG command. Dst: .
12/27 02:55:11.844: Received a SEND MSG command. Dst: .
12/27 02:55:11.870: Received a START HB command. Destination: .
12/27 02:55:11.870: set_dhb_polling_rate(): Default poll speed 40
12/27 02:55:11.870: Received a SEND MSG command. Dst: .
12/27 02:55:11.870: Received a START MONITOR command.
12/27 02:55:11.870: Address: How often: 6000 msec Sensitivity: 8 Configuration Instance: 43
12/27 02:55:11.870: Received a SEND MSG command. Dst: .
12/27 02:55:11.870: Received a SEND MSG command. Dst: .
12/27 02:55:12.970: Received a SEND MSG command. Dst: .
12/27 02:55:17.751: Received a SEND MSG command. Dst: .
12/27 02:55:17.751: Received a SEND MSG command. Dst: .
12/27 02:55:17.875: Received a SEND MSG command. Dst: .
12/27 02:55:22.877: Received a SEND MSG command. Dst: .
12/27 02:55:23.871: Heartbeat was NOT received. Missed HBs: 1. Limit: 8
12/27 02:55:25.881: Received a SEND MSG command. Dst: .
12/27 02:55:25.881: Received a SEND MSG command. Dst: .
12/27 02:55:27.877: Received a SEND MSG command. Dst: .
12/27 02:55:29.881: Heartbeat was NOT received. Missed HBs: 2. Limit: 8
12/27 02:55:32.877: Received a SEND MSG command. Dst: .
12/27 02:55:35.881: Heartbeat was NOT received. Missed HBs: 3. Limit: 8
12/27 02:55:36.857: Received a SEND MSG command. Dst: .
12/27 02:55:36.857: Received a SEND MSG command. Dst: .
12/27 02:55:37.877: Received a SEND MSG command. Dst: .
12/27 02:55:41.891: Heartbeat was NOT received. Missed HBs: 4. Limit: 8
12/27 02:55:41.961: Received a SEND MSG command. Dst: .
12/27 02:55:42.877: Received a SEND MSG command. Dst: .
12/27 02:55:47.711: writePacket(): Unable to write for too long
12/27 02:55:47.761: Send thread blocked for 35 seconds.
12/27 02:55:47.761: nim error successfully sent.
12/27 02:55:47.877: Received a SEND MSG command. Dst: .
12/27 02:55:47.901: Heartbeat was NOT received. Missed HBs: 5. Limit: 8
12/27 02:55:49.131: Received a SEND MSG command. Dst: .
12/27 02:55:49.131: Received a SEND MSG command. Dst: .
12/27 02:55:49.881: writePacket(): Unable to write for too long
12/27 02:55:52.049: writePacket(): Unable to write for too long
12/27 02:55:52.877: Received a SEND MSG command. Dst: .
12/27 02:55:53.911: Heartbeat was NOT received. Missed HBs: 6. Limit: 8
12/27 02:55:54.223: writePacket(): Unable to write for too long
12/27 02:55:56.401: writePacket(): Unable to write for too long
12/27 02:55:57.877: Received a SEND MSG command. Dst: .
12/27 02:55:58.561: writePacket(): Unable to write for too long
12/27 02:55:59.911: Heartbeat was NOT received. Missed HBs: 7. Limit: 8
12/27 02:56:00.725: writePacket(): Unable to write for too long
12/27 02:56:02.878: Received a SEND MSG command. Dst: .
12/27 02:56:02.887: writePacket(): Unable to write for too long
12/27 02:56:05.056: writePacket(): Unable to write for too long
12/27 02:56:05.106: 8 failed writes in a row - clearing send queue.
12/27 02:56:05.881: Received a SEND MSG command. Dst: .
12/27 02:56:05.881: Received a SEND MSG command. Dst: .
12/27 02:56:05.921: Heartbeat was NOT received. Missed HBs: 8. Limit: 8
12/27 02:56:05.921: Local adapter is up: issuing notification for remote adapter
12/27 02:56:05.921: Adapter status successfully sent.
12/27 02:56:05.921: Received a STOP HB command.
12/27 02:56:05.921: Received a STOP MONITOR command.
12/27 02:56:06.871: Received a SEND MSG command. Dst: .
12/27 02:56:16.878: Received a SEND MSG command. Dst: .
12/27 02:56:20.930: Received a SEND MSG command. Dst: .
12/27 02:56:26.881: Received a SEND MSG command. Dst: .
12/27 02:56:35.996: Received a SEND MSG command. Dst: .
12/27 02:56:36.891: Received a SEND MSG command. Dst: .
12/27 02:56:41.599: writePacket(): Unable to write for too long
12/27 02:56:41.649: Send thread blocked for 36 seconds.
12/27 02:56:41.649: nim error successfully sent.
12/27 02:56:43.760: writePacket(): Unable to write for too long
12/27 02:56:45.923: writePacket(): Unable to write for too long
12/27 02:56:46.895: Received a SEND MSG command. Dst: .
12/27 02:56:50.997: Received a SEND MSG command. Dst: .
12/27 02:56:52.287: writePacket(): Unable to write for too long
12/27 02:56:56.560: writePacket(): Unable to write for too long
12/27 02:56:56.901: Received a SEND MSG command. Dst: .
12/27 02:57:02.928: writePacket(): Unable to write for too long
12/27 02:57:06.006: Received a SEND MSG command. Dst: .
12/27 02:57:06.907: Received a SEND MSG command. Dst: .
12/27 02:57:09.294: writePacket(): Unable to write for too long
12/27 02:57:13.561: writePacket(): Unable to write for too long
12/27 02:57:16.911: Received a SEND MSG command. Dst: .
12/27 02:57:19.928: writePacket(): Unable to write for too long
12/27 02:57:19.978: 8 failed writes in a row - clearing send queue.
12/27 02:57:21.011: Received a SEND MSG command. Dst: .
12/27 02:57:26.919: Received a SEND MSG command. Dst: .
12/27 02:57:36.027: Received a SEND MSG command. Dst: .
12/27 02:57:36.921: Received a SEND MSG command. Dst: .
12/27 02:57:46.928: Received a SEND MSG command. Dst: .
12/27 02:57:51.028: Received a SEND MSG command. Dst: .
12/27 02:57:56.731: writePacket(): Unable to write for too long
12/27 02:57:56.932: Received a SEND MSG command. Dst: .
12/27 02:58:03.097: writePacket(): Unable to write for too long
12/27 02:58:06.042: Received a SEND MSG command. Dst: .
12/27 02:58:06.941: Received a SEND MSG command. Dst: .
12/27 02:58:09.468: writePacket(): Unable to write for too long
12/27 02:58:15.834: writePacket(): Unable to write for too long
12/27 02:58:16.961: Received a SEND MSG command. Dst: .
12/27 02:58:17.301: dhb_lost_handshake_fct(): Restarting handshaking
12/27 02:58:17.303: initHS(): Wrote initial handshake
12/27 02:58:21.051: Received a SEND MSG command. Dst: .
12/27 02:58:26.964: Received a SEND MSG command. Dst: .
12/27 02:58:36.071: Received a SEND MSG command. Dst: .
12/27 02:58:36.971: Received a SEND MSG command. Dst: .
12/27 02:58:46.991: Received a SEND MSG command. Dst: .
12/27 02:58:51.071: Received a SEND MSG command. Dst: .
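
The NIM log above is long, and the interesting moments are where the missed-heartbeat counter reaches its limit (here 8, with a 6000 msec interval per the START MONITOR entry). As a rough aid, the following Python sketch picks out those moments. The function name `find_limit_hits` and the regular expression are assumptions based only on the lines shown in this post, not on any documented log format.

```python
import re

# Assumed entry format, copied from the nim.topsvcs excerpt in this post:
#   "MM/DD HH:MM:SS.mmm: Heartbeat was NOT received. Missed HBs: N. Limit: L"
MISSED_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"Heartbeat was NOT received\. Missed HBs: (\d+)\. Limit: (\d+)"
)


def find_limit_hits(lines):
    """Return timestamps of entries where the missed count reached the limit."""
    hits = []
    for line in lines:
        match = MISSED_RE.match(line.strip())
        if match and int(match.group(2)) >= int(match.group(3)):
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    # Hypothetical path; use the actual NIM log file on the node.
    with open("nim.topsvcs.rhdisk2.oa_cluster", encoding="utf-8", errors="replace") as handle:
        for timestamp in find_limit_hits(handle):
            print("missed-heartbeat limit reached at", timestamp)
```

On the excerpt above this would report 12/27 02:53:47.871 and 12/27 02:56:05.921, the two points where the log goes on to say "Local adapter is up: issuing notification for remote adapter".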