Last edited by ty123555 on 2011-11-29 21:04
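The heartbeat cluster consists of two nodes and is configured with heartbeat v2 using a cib.xml file. The configuration contains nothing related to fencing, watchdog, or anything else that would reboot a host, yet for some unknown reason a node sometimes reboots on its own.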
At the time of the reboot, heartbeat wrote the following log:
heartbeat[4113]: 2011/11/27_10:09:31 CRIT: Cluster node suse1 returning after partition.
heartbeat[4113]: 2011/11/27_10:09:31 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[4113]: 2011/11/27_10:09:31 WARN: Deadtime value may be too small.
heartbeat[4113]: 2011/11/27_10:09:31 info: See FAQ for information on tuning deadtime.
heartbeat[4113]: 2011/11/27_10:09:31 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[4113]: 2011/11/27_10:09:31 WARN: Late heartbeat: Node suse1: interval 3500 ms
heartbeat[4113]: 2011/11/27_10:09:31 info: Status update for node suse1: status active
crmd[4412]: 2011/11/27_10:09:31 notice: crmd_ha_status_callback: Status update: Node suse1 now has status [active]
crmd[4412]: 2011/11/27_10:09:31 info: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_JOIN_REQUEST cause=C_HA_MESSAGE origin=route_message ]
crmd[4412]: 2011/11/27_10:09:31 info: update_dc: Unset DC suse2
crmd[4412]: 2011/11/27_10:09:31 info: erase_node_from_join: Removed dead node suse2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=0
crmd[4412]: 2011/11/27_10:09:31 info: do_dc_join_offer_all: join-8: Waiting on 1 outstanding join acks
crmd[4412]: 2011/11/27_10:09:31 info: update_dc: Set DC to suse2 (2.0)
crmd[4412]: 2011/11/27_10:09:31 info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
crmd[4412]: 2011/11/27_10:09:32 info: do_state_transition: All 1 cluster nodes responded to the join offer.
attrd[4411]: 2011/11/27_10:09:33 info: attrd_local_callback: Sending full refresh
cib[4408]: 2011/11/27_10:09:32 info: sync_our_cib: Syncing CIB to all peers
cib[4408]: 2011/11/27_10:09:32 WARN: cib_peer_callback: Discarding cib_replace message (2752) from suse1: not in our membership
cib[4408]: 2011/11/27_10:09:32 WARN: cib_peer_callback: Discarding cib_apply_diff message (2754) from suse1: not in our membership
cib[4408]: 2011/11/27_10:09:32 WARN: cib_peer_callback: Discarding cib_apply_diff message (2756) from suse1: not in our membership
cib[4408]: 2011/11/27_10:09:32 WARN: cib_peer_callback: Discarding cib_apply_diff message (2757) from suse1: not in our membership
crmd[4412]: 2011/11/27_10:09:32 WARN: crmd_ha_msg_callback: Ignoring HA message (op=join_ack_nack) from suse1: not in our membership list (size=1)
crmd[4412]: 2011/11/27_10:09:32 ERROR: do_cl_join_finalize_respond: Join join-6 with suse1 failed. NACK'd
crmd[4412]: 2011/11/27_10:09:32 ERROR: do_log: [[FSA]] Input I_ERROR from do_cl_join_finalize_respond() received in state (S_FINALIZE_JOIN)
crmd[4412]: 2011/11/27_10:09:32 info: do_state_transition: State transition S_FINALIZE_JOIN -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=do_cl_join_finalize_respond ]
crmd[4412]: 2011/11/27_10:09:32 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
crmd[4412]: 2011/11/27_10:09:32 WARN: do_election_vote: Not voting in election, we're in state S_RECOVERY
crmd[4412]: 2011/11/27_10:09:32 info: do_dc_release: DC role released
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to pengine: [4861]
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to tengine: [4860]
tengine[4860]: 2011/11/27_10:09:32 info: update_abort_priority: Abort priority upgraded to 1000000
pengine[4861]: 2011/11/27_10:09:32 info: pengine_shutdown: Exiting PEngine (SIGTERM)
crmd[4412]: 2011/11/27_10:09:32 ERROR: do_log: [[FSA]] Input I_TERMINATE from do_recover() received in state (S_RECOVERY)
crmd[4412]: 2011/11/27_10:09:32 info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the pengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to pengine: [4861]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the tengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to tengine: [4860]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Waiting for subsystems to exit
crmd[4412]: 2011/11/27_10:09:32 WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: All subsystems stopped, continuing
crmd[4412]: 2011/11/27_10:09:32 WARN: do_log: [[FSA]] Input I_PENDING from do_election_vote() received in state (S_TERMINATE)
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the pengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to pengine: [4861]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the tengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to tengine: [4860]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Waiting for subsystems to exit
crmd[4412]: 2011/11/27_10:09:32 WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: All subsystems stopped, continuing
crmd[4412]: 2011/11/27_10:09:32 info: crmdManagedChildDied: Process pengine:[4861] exited (signal=0, exitcode=0)
crmd[4412]: 2011/11/27_10:09:32 WARN: do_log: [[FSA]] Input I_RELEASE_SUCCESS from do_dc_release() received in state (S_TERMINATE)
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the tengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to tengine: [4860]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Waiting for subsystems to exit
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: All subsystems stopped, continuing
crmd[4412]: 2011/11/27_10:09:32 info: process_client_disconnect: Received HUP from pengine:[-1]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Terminating the tengine
crmd[4412]: 2011/11/27_10:09:32 info: stop_subsystem: Sent -TERM to tengine: [4860]
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: Waiting for subsystems to exit
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: All subsystems stopped, continuing
tengine[4860]: 2011/11/27_10:09:32 info: update_abort_priority: Abort action 2 superceeded by 3
tengine[4860]: 2011/11/27_10:09:32 info: notify_crmd: Exiting after transition
tengine[4860]: 2011/11/27_10:09:32 info: te_init: Exiting tengine
crmd[4412]: 2011/11/27_10:09:32 info: crmdManagedChildDied: Process tengine:[4860] exited (signal=0, exitcode=0)
crmd[4412]: 2011/11/27_10:09:32 info: do_shutdown: All subsystems stopped, continuing
crmd[4412]: 2011/11/27_10:09:32 ERROR: verify_stopped: Resource ipservice was active at shutdown. You may ignore this error if it is unmanaged.
crmd[4412]: 2011/11/27_10:09:32 notice: ghash_print_pending_for_rsc: Recurring action ipservice:4 (ipservice_monitor_5000) incomplete at shutdown
crmd[4412]: 2011/11/27_10:09:32 info: do_lrm_control: Disconnected from the LRM
ccm[4407]: 2011/11/27_10:09:32 info: client (pid=4412) removed from ccm
crmd[4412]: 2011/11/27_10:09:32 info: do_ha_control: Disconnected from Heartbeat
crmd[4412]: 2011/11/27_10:09:32 info: do_cib_control: Disconnecting CIB
cib[4408]: 2011/11/27_10:09:32 info: cib_process_readwrite: We are now in R/O mode
crmd[4412]: 2011/11/27_10:09:32 info: crmd_cib_connection_destroy: Connection to the CIB terminated...
crmd[4412]: 2011/11/27_10:09:32 info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
crmd[4412]: 2011/11/27_10:09:32 ERROR: do_exit: Could not recover from internal error
crmd[4412]: 2011/11/27_10:09:32 info: free_mem: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
crmd[4412]: 2011/11/27_10:09:32 info: do_exit: [crmd] stopped (2)
heartbeat[4113]: 2011/11/27_10:09:32 WARN: Managed /usr/lib/heartbeat/crmd process 4412 exited with return code 2.
heartbeat[4113]: 2011/11/27_10:09:32 EMERG: Rebooting system. Reason: /usr/lib/heartbeat/crmd
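According to the log, the reboot was triggered by the crmd process exiting abnormally, but I don't know what made crmd exit. How should I change the configuration to keep this from happening again?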
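To see how close the late heartbeats in a log like this came to the configured `deadtime`, and whether a managed-process exit lines up with the reboot, a small Python script can scan the heartbeat log. This is a minimal sketch under stated assumptions: the regular expressions assume the exact message wording shown in the log above, and `summarize` and its `deadtime_s` argument are hypothetical names. `deadtime_s` must be filled in by hand with whatever `deadtime` is actually set to in ha.cf, because the script does not read the configuration.

```python
import re
from collections import defaultdict

# Matches heartbeat's late-heartbeat warning, e.g.
# "... WARN: Late heartbeat: Node suse1: interval 3500 ms"
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")

# Matches heartbeat reporting that a managed child process exited, e.g.
# "... Managed /usr/lib/heartbeat/crmd process 4412 exited with return code 2."
EXIT_RE = re.compile(r"Managed (\S+) process (\d+) exited with return code (\d+)")

# Matches heartbeat's reboot message, e.g.
# "... EMERG: Rebooting system. Reason: /usr/lib/heartbeat/crmd"
REBOOT_RE = re.compile(r"Rebooting system\. Reason: (\S+)")


def summarize(lines, deadtime_s):
    """Hypothetical helper: summarize late heartbeats, child exits and reboots.

    deadtime_s is the value assumed to be set as `deadtime` in ha.cf (seconds);
    it is passed in by the caller, not read from any file.
    """
    late = defaultdict(list)  # node name -> observed intervals in ms
    exits = []                # (process path, pid, return code)
    reboots = []              # reason strings from "Rebooting system" lines
    for line in lines:
        m = LATE_RE.search(line)
        if m:
            late[m.group(1)].append(int(m.group(2)))
            continue
        m = EXIT_RE.search(line)
        if m:
            exits.append((m.group(1), int(m.group(2)), int(m.group(3))))
            continue
        m = REBOOT_RE.search(line)
        if m:
            reboots.append(m.group(1))

    report = {}
    for node, intervals in late.items():
        worst = max(intervals)
        report[node] = {
            "max_interval_ms": worst,
            # Worst observed gap divided by deadtime: a value at or above 1.0
            # means the gap reached the configured deadtime.
            "fraction_of_deadtime": worst / (deadtime_s * 1000),
        }
    return report, exits, reboots


if __name__ == "__main__":
    sample = [
        "heartbeat[4113]: 2011/11/27_10:09:31 WARN: Late heartbeat: Node suse1: interval 3500 ms",
        "heartbeat[4113]: 2011/11/27_10:09:32 WARN: Managed /usr/lib/heartbeat/crmd process 4412 exited with return code 2.",
        "heartbeat[4113]: 2011/11/27_10:09:32 EMERG: Rebooting system. Reason: /usr/lib/heartbeat/crmd",
    ]
    # deadtime_s=5 is a hypothetical value, not taken from the poster's ha.cf
    report, exits, reboots = summarize(sample, deadtime_s=5)
    print(report)
    print(exits)
    print(reboots)
```

To use it on a real system, pass the lines of the actual log file instead of the sample list; where that file lives depends on the `logfile`/`logfacility` settings in ha.cf.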