Chinaunix
Title:
heartbeat service restarts automatically
Author:
zhjixi1234
Time:
2014-06-19 09:59
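After configuring Linux HA I wanted to test HA failover.
The environment is Red Hat 6.4 with heartbeat 2.1.4 installed.
The HA pair is two virtual machines on vSphere. Since I can't add a serial port, the heartbeat runs over IP unicast.
Network configuration: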
vsyslog1 eth0 172.16.5.242
vsyslog2 eth0 172.16.5.243
ha.d/haresources
vsyslog1 172.16.5.152/24/eth0
vsyslog2 172.16.5.153/24/eth0
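After the heartbeat service started on both machines, I put both VIPs on node 1. I then moved VIP 153 to node 2 and disconnected node 2's virtual network, and the VIP automatically moved back to node 1.
The problem: once node 2's network was restored, the heartbeat service on both machines restarted for some reason, and after the restart the second VIP moved back to node 2.
But I clearly set auto_failback off!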
######################################
[root@vsyslog1 ha.d]# cat ha.cf
debugfile /var/log/halog/ha-debug
logfile /var/log/halog/ha-log
logfacility local0
keepalive 2
deadtime 20
warntime 5
initdead 120
udpport 694
ucast eth0 172.16.5.243
auto_failback off
node vsyslog1
node vsyslog2
ping 172.16.5.30
hopfudge 1
deadping 5
######################################
Log from that time:
Moving VIP 153 to node 2:
heartbeat[7797]: 2014/06/19_09:17:58 info: vsyslog1 wants to go standby [foreign]
heartbeat[7797]: 2014/06/19_09:17:59 info: standby: vsyslog2 can take our foreign resources
heartbeat[8548]: 2014/06/19_09:17:59 info: give up foreign HA resources (standby).
ResourceManager[8561]: 2014/06/19_09:17:59 info: Releasing resource group: vsyslog2 172.16.5.153/24/eth0
ResourceManager[8561]: 2014/06/19_09:17:59 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop
ResourceManager[8561]: 2014/06/19_09:17:59 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[8630]: 2014/06/19_09:17:59 INFO: ifconfig eth0:0 down
IPaddr[8601]: 2014/06/19_09:17:59 INFO: Success
INFO: Success
ResourceManager[8561]: 2014/06/19_09:17:59 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop done. RC=0
heartbeat[8548]: 2014/06/19_09:17:59 info: foreign HA resource release completed (standby).
heartbeat[7797]: 2014/06/19_09:17:59 info: Local standby process completed [foreign].
heartbeat[7797]: 2014/06/19_09:18:00 WARN: 1 lost packet(s) for [vsyslog2] [46:48]
heartbeat[7797]: 2014/06/19_09:18:00 info: remote resource transition completed.
heartbeat[7797]: 2014/06/19_09:18:00 info: No pkts missing from vsyslog2!
heartbeat[7797]: 2014/06/19_09:18:00 info: Other node completed standby takeover of foreign resources.
heartbeat[7797]: 2014/06/19_09:19:05 info: vsyslog2 wants to go standby [local]
heartbeat[7797]: 2014/06/19_09:19:06 info: standby: acquire [local] resources from vsyslog2
heartbeat[8663]: 2014/06/19_09:19:06 info: acquire foreign HA resources (standby).
ResourceManager[8676]: 2014/06/19_09:19:06 info: Acquiring resource group: vsyslog2 172.16.5.153/24/eth0
IPaddr[8703]: 2014/06/19_09:19:06 INFO: Resource is stopped
ResourceManager[8676]: 2014/06/19_09:19:06 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start
ResourceManager[8676]: 2014/06/19_09:19:06 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start
IPaddr[8803]: 2014/06/19_09:19:06 INFO: Using calculated netmask for 172.16.5.153: 255.255.255.0
IPaddr[8803]: 2014/06/19_09:19:06 DEBUG: Using calculated broadcast for 172.16.5.153: 172.16.5.255
IPaddr[8803]: 2014/06/19_09:19:06 INFO: eval ifconfig eth0:0 172.16.5.153 netmask 255.255.255.0 broadcast 172.16.5.255
IPaddr[8803]: 2014/06/19_09:19:06 DEBUG: Sending Gratuitous Arp for 172.16.5.153 on eth0:0 [eth0]
IPaddr[8774]: 2014/06/19_09:19:06 INFO: Success
INFO: Success
ResourceManager[8676]: 2014/06/19_09:19:06 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start done. RC=0
heartbeat[8663]: 2014/06/19_09:19:06 info: foreign HA resource acquisition completed (standby).
heartbeat[7797]: 2014/06/19_09:19:06 info: Standby resource acquisition done [local].
heartbeat[7797]: 2014/06/19_09:19:07 info: remote resource transition completed.
Disconnecting node 2's network:
heartbeat[7797]: 2014/06/19_09:22:00 info: Link vsyslog2:eth0 dead.
heartbeat[7797]: 2014/06/19_09:22:14 WARN: node vsyslog2: is dead
heartbeat[7797]: 2014/06/19_09:22:14 info: Dead node vsyslog2 gave up resources.
heartbeat[7797]: 2014/06/19_09:22:23 CRIT: Cluster node vsyslog2 returning after partition.
heartbeat[7797]: 2014/06/19_09:22:23 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[7797]: 2014/06/19_09:22:23 WARN: Deadtime value may be too small.
heartbeat[7797]: 2014/06/19_09:22:23 info: See FAQ for information on tuning deadtime.
heartbeat[7797]: 2014/06/19_09:22:23 info: URL: http://linux-ha.org/FAQ#heavy_load
After node 2's network was restored, the service restarted:
heartbeat[7797]: 2014/06/19_09:22:23 info: Link vsyslog2:eth0 up.
heartbeat[7797]: 2014/06/19_09:22:23 WARN: Late heartbeat: Node vsyslog2: interval 29010 ms
heartbeat[7797]: 2014/06/19_09:22:23 info: Status update for node vsyslog2: status active
heartbeat[8920]: 2014/06/19_09:22:23 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[8920]: 2014/06/19_09:22:23 info: Running /usr/local/etc/ha.d/rc.d/status status
heartbeat[7797]: 2014/06/19_09:22:25 info: Received shutdown notice from 'vsyslog2'.
heartbeat[7797]: 2014/06/19_09:22:25 info: Resources being acquired from vsyslog2.
heartbeat[7797]: 2014/06/19_09:22:25 WARN: Shutdown delayed until current resource activity finishes.
heartbeat[8936]: 2014/06/19_09:22:25 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[8936]: 2014/06/19_09:22:25 info: Running /usr/local/etc/ha.d/rc.d/status status
mach_down[8955]: 2014/06/19_09:22:25 info: Taking over resource group 172.16.5.153/24/eth0
ResourceManager[9004]: 2014/06/19_09:22:25 info: Acquiring resource group: vsyslog2 172.16.5.153/24/eth0
IPaddr[9026]: 2014/06/19_09:22:26 INFO: Running OK
heartbeat[8937]: 2014/06/19_09:22:26 info: Local Resource acquisition completed.
heartbeat[7797]: 2014/06/19_09:22:26 debug: StartNextRemoteRscReq(): child count 1
IPaddr[9066]: 2014/06/19_09:22:26 INFO: Running OK
mach_down[8955]: 2014/06/19_09:22:26 info: /usr/local/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[8955]: 2014/06/19_09:22:26 info: mach_down takeover complete for node vsyslog2.
heartbeat[7797]: 2014/06/19_09:22:26 info: mach_down takeover complete.
heartbeat[7797]: 2014/06/19_09:22:26 info: Heartbeat shutdown in progress. (7797)
heartbeat[9162]: 2014/06/19_09:22:26 info: Giving up all HA resources.
ResourceManager[9175]: 2014/06/19_09:22:26 info: Releasing resource group: vsyslog1 172.16.5.152/24/eth0
ResourceManager[9175]: 2014/06/19_09:22:26 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 stop
ResourceManager[9175]: 2014/06/19_09:22:26 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[9244]: 2014/06/19_09:22:26 INFO: ifconfig eth0:1 down
IPaddr[9215]: 2014/06/19_09:22:26 INFO: Success
INFO: Success
ResourceManager[9175]: 2014/06/19_09:22:26 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 stop done. RC=0
ResourceManager[9274]: 2014/06/19_09:22:26 info: Releasing resource group: vsyslog2 172.16.5.153/24/eth0
ResourceManager[9274]: 2014/06/19_09:22:26 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop
ResourceManager[9274]: 2014/06/19_09:22:26 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[9343]: 2014/06/19_09:22:26 INFO: ifconfig eth0:0 down
IPaddr[9314]: 2014/06/19_09:22:26 INFO: Success
INFO: Success
ResourceManager[9274]: 2014/06/19_09:22:26 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 stop done. RC=0
heartbeat[9162]: 2014/06/19_09:22:26 info: All HA resources relinquished.
heartbeat[7797]: 2014/06/19_09:22:28 info: killing HBFIFO process 7801 with signal 15
heartbeat[7797]: 2014/06/19_09:22:28 info: killing HBWRITE process 7802 with signal 15
heartbeat[7797]: 2014/06/19_09:22:28 info: killing HBREAD process 7803 with signal 15
heartbeat[7797]: 2014/06/19_09:22:28 info: killing HBREAD process 7805 with signal 15
heartbeat[7797]: 2014/06/19_09:22:28 info: killing HBWRITE process 7804 with signal 15
heartbeat[7797]: 2014/06/19_09:22:28 info: Core process 7802 exited. 5 remaining
heartbeat[7797]: 2014/06/19_09:22:28 info: Core process 7805 exited. 4 remaining
heartbeat[7797]: 2014/06/19_09:22:28 info: Core process 7803 exited. 3 remaining
heartbeat[7797]: 2014/06/19_09:22:28 info: Core process 7804 exited. 2 remaining
heartbeat[7797]: 2014/06/19_09:22:28 info: Core process 7801 exited. 1 remaining
heartbeat[7797]: 2014/06/19_09:22:28 info: vsyslog1 Heartbeat shutdown complete.
heartbeat[7797]: 2014/06/19_09:22:28 info: Heartbeat restart triggered.
heartbeat[7797]: 2014/06/19_09:22:28 info: Restarting heartbeat.
heartbeat[7797]: 2014/06/19_09:22:28 info: Performing heartbeat restart exec.
heartbeat[7797]: 2014/06/19_09:22:49 info: Version 2 support: false
heartbeat[7797]: 2014/06/19_09:22:49 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[7797]: 2014/06/19_09:22:49 info: **************************
heartbeat[7797]: 2014/06/19_09:22:49 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[9374]: 2014/06/19_09:22:49 info: heartbeat: version 2.1.4
heartbeat[9374]: 2014/06/19_09:22:49 info: Heartbeat generation: 1402891661
heartbeat[9374]: 2014/06/19_09:22:49 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[9374]: 2014/06/19_09:22:49 info: glib: ucast: bound send socket to device: eth0
heartbeat[9374]: 2014/06/19_09:22:49 info: glib: ucast: bound receive socket to device: eth0
heartbeat[9374]: 2014/06/19_09:22:49 info: glib: ucast: started on port 694 interface eth0 to 172.16.5.243
heartbeat[9374]: 2014/06/19_09:22:49 info: glib: ping heartbeat started.
heartbeat[9374]: 2014/06/19_09:22:49 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[9374]: 2014/06/19_09:22:49 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[9374]: 2014/06/19_09:22:49 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[9374]: 2014/06/19_09:22:49 info: Local status now set to: 'up'
heartbeat[9374]: 2014/06/19_09:22:49 info: Link 172.16.5.30:172.16.5.30 up.
heartbeat[9374]: 2014/06/19_09:22:49 info: Status update for node 172.16.5.30: status ping
heartbeat[9374]: 2014/06/19_09:22:49 info: Link vsyslog2:eth0 up.
heartbeat[9374]: 2014/06/19_09:22:49 debug: get_delnodelist: delnodelist=
heartbeat[9374]: 2014/06/19_09:22:50 info: Status update for node vsyslog2: status active
heartbeat[9374]: 2014/06/19_09:22:50 info: Comm_now_up(): updating status to active
heartbeat[9374]: 2014/06/19_09:22:50 info: Local status now set to: 'active'
heartbeat[9384]: 2014/06/19_09:22:50 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[9384]: 2014/06/19_09:22:50 info: Running /usr/local/etc/ha.d/rc.d/status status
heartbeat[9374]: 2014/06/19_09:23:00 info: local resource transition completed.
heartbeat[9374]: 2014/06/19_09:23:00 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[9438]: 2014/06/19_09:23:00 INFO: Resource is stopped
heartbeat[9402]: 2014/06/19_09:23:00 info: Local Resource acquisition completed.
heartbeat[9374]: 2014/06/19_09:23:00 debug: StartNextRemoteRscReq(): child count 1
heartbeat[9489]: 2014/06/19_09:23:00 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[9489]: 2014/06/19_09:23:00 info: Running /usr/local/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[9489]: 2014/06/19_09:23:00 received ip-request-resp 172.16.5.152/24/eth0 OK yes
ResourceManager[9510]: 2014/06/19_09:23:00 info: Acquiring resource group: vsyslog1 172.16.5.152/24/eth0
IPaddr[9537]: 2014/06/19_09:23:00 INFO: Resource is stopped
ResourceManager[9510]: 2014/06/19_09:23:00 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 start
ResourceManager[9510]: 2014/06/19_09:23:00 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 start
IPaddr[9637]: 2014/06/19_09:23:00 INFO: Using calculated netmask for 172.16.5.152: 255.255.255.0
IPaddr[9637]: 2014/06/19_09:23:00 DEBUG: Using calculated broadcast for 172.16.5.152: 172.16.5.255
IPaddr[9637]: 2014/06/19_09:23:00 INFO: eval ifconfig eth0:0 172.16.5.152 netmask 255.255.255.0 broadcast 172.16.5.255
IPaddr[9637]: 2014/06/19_09:23:00 DEBUG: Sending Gratuitous Arp for 172.16.5.152 on eth0:0 [eth0]
IPaddr[9608]: 2014/06/19_09:23:00 INFO: Success
INFO: Success
ResourceManager[9510]: 2014/06/19_09:23:00 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.152/24/eth0 start done. RC=0
heartbeat[9374]: 2014/06/19_09:23:00 info: remote resource transition completed.
heartbeat[9374]: 2014/06/19_09:26:24 info: Received shutdown notice from 'vsyslog2'.
heartbeat[9374]: 2014/06/19_09:26:24 info: Resources being acquired from vsyslog2.
heartbeat[9374]: 2014/06/19_09:26:24 debug: StartNextRemoteRscReq(): child count 1
heartbeat[9740]: 2014/06/19_09:26:24 info: acquire local HA resources (standby).
ResourceManager[9766]: 2014/06/19_09:26:24 info: Acquiring resource group: vsyslog1 172.16.5.152/24/eth0
IPaddr[9817]: 2014/06/19_09:26:24 INFO: Running OK
IPaddr[9816]: 2014/06/19_09:26:24 INFO: Running OK
heartbeat[9740]: 2014/06/19_09:26:24 info: local HA resource acquisition completed (standby).
heartbeat[9741]: 2014/06/19_09:26:24 info: Local Resource acquisition completed.
heartbeat[9374]: 2014/06/19_09:26:24 info: Standby resource acquisition done [all].
heartbeat[9374]: 2014/06/19_09:26:24 debug: StartNextRemoteRscReq(): child count 1
heartbeat[9924]: 2014/06/19_09:26:24 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[9924]: 2014/06/19_09:26:24 info: Running /usr/local/etc/ha.d/rc.d/status status
mach_down[9940]: 2014/06/19_09:26:24 info: Taking over resource group 172.16.5.153/24/eth0
ResourceManager[9966]: 2014/06/19_09:26:24 info: Acquiring resource group: vsyslog2 172.16.5.153/24/eth0
IPaddr[9993]: 2014/06/19_09:26:24 INFO: Resource is stopped
ResourceManager[9966]: 2014/06/19_09:26:24 info: Running /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start
ResourceManager[9966]: 2014/06/19_09:26:24 debug: Starting /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start
IPaddr[10093]: 2014/06/19_09:26:25 INFO: Using calculated netmask for 172.16.5.153: 255.255.255.0
IPaddr[10093]: 2014/06/19_09:26:25 DEBUG: Using calculated broadcast for 172.16.5.153: 172.16.5.255
IPaddr[10093]: 2014/06/19_09:26:25 INFO: eval ifconfig eth0:1 172.16.5.153 netmask 255.255.255.0 broadcast 172.16.5.255
IPaddr[10093]: 2014/06/19_09:26:25 DEBUG: Sending Gratuitous Arp for 172.16.5.153 on eth0:1 [eth0]
IPaddr[10064]: 2014/06/19_09:26:25 INFO: Success
INFO: Success
ResourceManager[9966]: 2014/06/19_09:26:25 debug: /usr/local/etc/ha.d/resource.d/IPaddr 172.16.5.153/24/eth0 start done. RC=0
mach_down[9940]: 2014/06/19_09:26:25 info: /usr/local/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[9940]: 2014/06/19_09:26:25 info: mach_down takeover complete for node vsyslog2.
heartbeat[9374]: 2014/06/19_09:26:25 info: mach_down takeover complete.
heartbeat[9374]: 2014/06/19_09:26:31 info: Link vsyslog2:eth0 dead.
heartbeat[9374]: 2014/06/19_09:26:45 WARN: node vsyslog2: is dead
heartbeat[9374]: 2014/06/19_09:26:45 info: Dead node vsyslog2 gave up resources.
heartbeat[9374]: 2014/06/19_09:26:49 info: Heartbeat restart on node vsyslog2
heartbeat[9374]: 2014/06/19_09:26:49 info: Link vsyslog2:eth0 up.
heartbeat[9374]: 2014/06/19_09:26:49 info: Status update for node vsyslog2: status init
heartbeat[9374]: 2014/06/19_09:26:49 info: Status update for node vsyslog2: status up
heartbeat[9374]: 2014/06/19_09:26:49 debug: StartNextRemoteRscReq(): child count 1
heartbeat[10202]: 2014/06/19_09:26:49 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10202]: 2014/06/19_09:26:49 info: Running /usr/local/etc/ha.d/rc.d/status status
heartbeat[10218]: 2014/06/19_09:26:49 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10218]: 2014/06/19_09:26:49 info: Running /usr/local/etc/ha.d/rc.d/status status
heartbeat[9374]: 2014/06/19_09:26:49 debug: get_delnodelist: delnodelist=
heartbeat[9374]: 2014/06/19_09:26:50 info: Status update for node vsyslog2: status active
heartbeat[10234]: 2014/06/19_09:26:50 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10234]: 2014/06/19_09:26:50 info: Running /usr/local/etc/ha.d/rc.d/status status
heartbeat[9374]: 2014/06/19_09:26:50 info: remote resource transition completed.
Author:
q1208c
Time:
2014-06-19 11:27
A 1.x-era configuration on a 2.x-era product, used in the 3.x era.
OP, that's a bit dated.
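Author:
zhjixi1234
Time:
2014-06-19 11:44
Reply to #2 q1208c
3.x is too complicated and I haven't figured it out yet. I only need a two-node HA setup, and the 2.x configuration can be simpler.
What should the 2.x-style configuration you mention look like?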
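Author:
q1208c
Time:
2014-06-19 12:39
Reply to #3 zhjixi1234
The 2.x configuration is the same as 3.x. What you are using now is the 1.x-era configuration.
2.1.4 should be the last 2.x release.
With crm, I believe the resource does not move back, or at least you can control whether it does.
It also gives you much finer-grained control.
If you don't want to write it by hand, I recall that 2.x ships with a graphical configuration manager. I have used it, and it supports anything that isn't a very unusual configuration.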
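Author:
kivis
Time:
2014-06-27 10:42
Last edited by kivis on 2014-06-27 10:49
Judging from my recent experiments, what you hit is split-brain. I ran into the same problem, went back to study how HA works, and found the cause.
From your ha.cf, your heartbeat is configured as: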
ucast eth0 172.16.5.243
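With this setup, when you cut the public network, neither node receives the other's heartbeats, so neither knows the other's state. In that situation the standby node starts the resources.
When you then restore the primary's network and the two nodes talk to each other again, you have split-brain.
By design, after a split-brain the HA service restarts on both the primary and the standby, and then the primary takes over the resources.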
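The log in the first post fits this reading: vsyslog1 reports a late heartbeat of 29010 ms against a deadtime of 20 seconds, followed by "returning after partition". Below is a minimal Python sketch for checking a log for those two messages. The function name and the sample text are made up for illustration, and the regular expressions only cover the message formats quoted in this thread, so other heartbeat versions may word them differently.

```python
import re

# Patterns taken from the log lines quoted in this thread.
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")
PARTITION_RE = re.compile(r"Cluster node (\S+) returning after partition")


def find_partition_signs(log_text, deadtime_s):
    """Return (late, returned).

    late     -- {node: interval_ms} for late heartbeats longer than deadtime
    returned -- nodes reported as "returning after partition", in log order
    """
    deadtime_ms = deadtime_s * 1000
    late = {}
    returned = []
    for line in log_text.splitlines():
        m = LATE_RE.search(line)
        if m and int(m.group(2)) > deadtime_ms:
            late[m.group(1)] = int(m.group(2))
        m = PARTITION_RE.search(line)
        if m and m.group(1) not in returned:
            returned.append(m.group(1))
    return late, returned


SAMPLE = """\
heartbeat[7797]: 2014/06/19_09:22:23 CRIT: Cluster node vsyslog2 returning after partition.
heartbeat[7797]: 2014/06/19_09:22:23 WARN: Late heartbeat: Node vsyslog2: interval 29010 ms
"""

late, returned = find_partition_signs(SAMPLE, deadtime_s=20)
print(late, returned)
```

A node that shows up in both results was declared dead and then came back while the other side was already holding its resources, which is the situation described above.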
You can add a dedicated heartbeat link on interface eth1 and use the following configuration:
#ucast eth0 172.16.5.243
bcast eth1
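The heartbeat link should normally never be disconnected.
Then enable the ipfail tool. ipfail works together with the ping directive in your configuration to detect public-network failures, and that becomes the trigger for failover. The heartbeat link carries the communication between the HA servers and is normally never disconnected.
That way, when the primary's public network fails, it releases its resources automatically and notifies the standby to bring them up. After the network recovers, the auto_failback setting decides whether the resources move back to the primary.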
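The decision described here can be sketched roughly as follows. This is a simplified Python model of the idea as I understand it, not ipfail's actual code: the function name and parameters are hypothetical, and the only value taken from this thread is the ping node 172.16.5.30.

```python
def should_request_failover(local_seen, peer_seen, peer_alive):
    """Simplified model of the ping-node comparison.

    local_seen -- ping nodes this node can currently reach
    peer_seen  -- ping nodes the peer reports it can reach
    peer_alive -- True if the peer still answers over the heartbeat link
    """
    if not peer_alive:
        # Without a working heartbeat link there is nobody to hand over to.
        return False
    # Give up resources only if the peer has strictly better connectivity.
    return len(peer_seen) > len(local_seen)


PING_NODES = ["172.16.5.30"]  # the ping node from the ha.cf in this thread

# Primary loses its public network, standby still reaches the ping node.
print(should_request_failover([], PING_NODES, peer_alive=True))
# Both nodes still reach the ping node, so nothing should move.
print(should_request_failover(PING_NODES, PING_NODES, peer_alive=True))
```

The point of the dedicated link is the `peer_alive` check: the two nodes can still compare notes while the public network is down, so a lost public network leads to an orderly handover instead of both nodes declaring each other dead.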
See the blog post below for how HA is expected to behave in failover tests:
http://ixdba.blog.51cto.com/2895551/747510