ChinaUnix
Help: on RHEL HA, node 2 is fenced and rebooted by node 1 a few minutes after it starts

Posted on 2015-07-30 16:39
Fencing uses the fence_ipmilan agent. The heartbeat NICs of both machines and the fence devices are on the same network.

Contents of /etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

########### product IP ##########
192.168.10.20   vip
192.168.10.21   db1
192.168.10.22   db2

########### heartbeat IP of cluster #############
192.168.20.1    node1
192.168.20.2    node2

########### iLO IP of nodes ###########
192.168.20.3    node1ilo
192.168.20.4    node2ilo

The messages log on node 1:
Jul 14 22:48:27 db1 corosync[4336]:   [TOTEM ] A processor failed, forming new configuration.
Jul 14 22:48:29 db1 corosync[4336]:   [QUORUM] Members[1]: 1
Jul 14 22:48:29 db1 corosync[4336]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul 14 22:48:29 db1 kernel: dlm: closing connection to node 2
Jul 14 22:48:29 db1 corosync[4336]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.20.1) ; members(old:2 left:1)
Jul 14 22:48:29 db1 corosync[4336]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 14 22:48:29 db1 fenced[4387]: fence node2 success

Part of fenced.log on node 1:
Jul 14 21:46:28 fenced fencing node node2
Jul 14 21:46:43 fenced fence node2 success
Jul 14 21:54:35 fenced fencing node node2
Jul 14 21:54:51 fenced fence node2 success
Jul 14 22:02:41 fenced fencing node node2
Jul 14 22:02:57 fenced fence node2 success
Jul 14 22:45:40 fenced fenced 3.0.12.1 started
Jul 14 22:48:06 fenced fencing node node2
Jul 14 22:48:07 fenced fence node2 dev 0.0 agent fence_ipmilan result: error from agent
Jul 14 22:48:07 fenced fence node2 failed
Jul 14 22:48:10 fenced fencing node node2
Jul 14 22:48:29 fenced fence node2 success
Jul 14 22:48:29 fenced telling cman to remove nodeid 2 from cluster
Jul 14 22:55:06 fenced fencing node node2
Jul 14 22:55:22 fenced fence node2 success
Jul 14 23:02:02 fenced fencing node node2
Jul 14 23:02:18 fenced fence node2 success
Jul 14 23:08:59 fenced fencing node node2
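The fencing attempts in this excerpt recur at fairly regular intervals. The sketch below measures those intervals. It assumes the fenced.log line format shown above and that all entries come from a single day. The function name fence_intervals and the log path in the example comment are assumptions and do not come from the thread.

```shell
#!/bin/sh
# fence_intervals (hypothetical helper): print the time between consecutive
# "fencing node" entries of a fenced.log excerpt (file arguments or stdin).
# Assumes lines like:  Jul 14 21:46:28 fenced fencing node node2
# and entries from one day only (no midnight rollover handling).
fence_intervals() {
    awk '/fencing node/ {
        split($3, t, ":")                      # $3 is the HH:MM:SS field
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen) printf "%s +%ds\n", $3, now - prev
        else      printf "%s first\n", $3
        prev = now; seen = 1
    }' "$@"
}

# Example (log path assumed for a RHEL 6 cman cluster):
#   fence_intervals /var/log/cluster/fenced.log
```

The full excerpt contains a long gap around the fenced restart at 22:45:40 and a 4 s retry after the agent error. Apart from those, the function prints gaps of 487 s and 486 s early on and 416 to 417 s after 22:48:10. In other words, node 2 is fenced roughly every seven to eight minutes.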

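I currently suspect two things:
1. Broadcast is disabled on the heartbeat network.
2. The system clocks of the two nodes are not in sync.

How can I test the first point? Is it done with ping xxx.xxx.xxx.255?
Can the second point cause fencing?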

Posted on 2015-07-31 12:16
Last edited by hp_star on 2015-08-01 17:12

Sorry, the problem is still there. Let me describe it again:

On both machines the cman and rgmanager services start automatically at boot.
No quorum disk is configured, and fencing uses fence_ipmilan.

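While the cluster is running normally, I can reboot a node directly with shutdown -ry 0, and its services fail over to the other node. But a few minutes after the rebooted node comes back up, the node that is still running fences it. So it is stuck in a reboot -> fenced -> reboot loop.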

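However, I found a workaround. If I stop both the cman and rgmanager services first (or run ccs -h host --stop) and then reboot, the node is not fenced after it comes back up.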
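The workaround described above is to leave the cluster cleanly before a planned reboot, and it can be wrapped in a small function. In this sketch, rgmanager is stopped before cman because rgmanager runs on top of cman. RUN defaults to echo, so the commands are only printed unless RUN is set to an empty value. The function name and the RUN convention are hypothetical.

```shell
#!/bin/sh
# clean_cluster_reboot (hypothetical helper): stop the cluster stack, then reboot.
# rgmanager is stopped first because it depends on cman.
# RUN defaults to "echo" (dry run); call as  RUN= clean_cluster_reboot  to execute.
clean_cluster_reboot() {
    run=${RUN-echo}
    $run service rgmanager stop || return 1
    $run service cman stop || return 1
    $run shutdown -r now
}
```

Running it without arguments prints the three commands in order so they can be reviewed before running them for real.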

Below is the result of my test of the multicast address. It looks like a network problem:

[root@node2 ~]# tcpdump -i eth2 host 239.192.145.113
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
16:56:23.356670 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:56:34.784205 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:56:46.211659 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:56:57.639938 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:57:22.270343 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
16:57:22.270936 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 318
16:57:22.271781 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 414
16:57:22.273190 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 414
16:57:22.291402 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
16:57:22.293069 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 446
16:57:22.294785 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 550
16:57:22.295187 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 414
16:57:22.304720 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 550
16:57:22.305397 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 454
16:57:22.515179 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:57:28.722255 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
16:57:28.722681 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 171
16:57:28.774519 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 75
16:57:28.774794 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
16:57:28.775175 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
16:57:28.775665 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 313
16:57:28.776023 IP node2.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 467
16:57:28.776521 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 201
16:57:28.987090 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:57:40.419938 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:57:51.847535 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:58:03.275718 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:58:14.703557 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:58:26.131173 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:58:37.558770 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:58:48.986406 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:00.414809 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:11.842508 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:23.270324 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:34.698492 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:46.126171 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
16:59:57.553608 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:00:08.980664 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:00:20.411082 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:00:31.844305 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:00:43.277512 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:00:54.711376 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:01:06.144642 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:01:17.577841 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:01:29.010987 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:01:40.444306 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
17:01:51.877522 IP node1.hpoms-dp
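A capture like this is easier to read when summarised per sender. The sketch below does that and assumes tcpdump's default one-line output as shown above, where field 3 is "<host>.<port>". The function name mcast_summary is hypothetical.

```shell
#!/bin/sh
# mcast_summary (hypothetical helper): per sender, count the packets seen on
# the multicast group and print when that sender was first and last seen.
# Reads tcpdump text output (file arguments or stdin) in the format:
#   16:56:23.356670 IP node1.hpoms-dps-lstn > 239.192.145.113.netsupport: UDP, length 119
mcast_summary() {
    awk '$2 == "IP" {
        src = $3
        sub(/\.[^.]*$/, "", src)   # strip the trailing ".port" part
        ts = substr($1, 1, 8)      # HH:MM:SS, microseconds dropped
        if (!(src in count)) first[src] = ts
        count[src]++
        last[src] = ts
    }
    END {
        for (s in count)
            printf "%s packets=%d first=%s last=%s\n", s, count[s], first[s], last[s]
    }' "$@" | sort
}
```

On the capture above, node2 appears only between 16:57:22 and 16:57:28, with 7 packets. In contrast, node1's 119-byte packets keep arriving about every 11.4 s until the capture ends.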

Posted on 2015-08-03 15:49
The problem has been solved.

rc.local was the culprit.
[root@xsl-rms-database1 ~]# cat /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

/etc/init.d/network restart
touch /var/lock/subsys/local

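At boot, after cman had started, rc.local restarted all network interfaces. This cut the heartbeat for a while (about 20-30 seconds), so of course the node got fenced.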
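The root cause was a boot script that restarted every interface after cman was already up, so the fix is to stop doing that from rc.local. Below is a rough sketch of two hypothetical helpers. It assumes GNU grep (for -r) and boot scripts under /etc/rc.d. The first helper lists uncommented lines that restart the network. The second comments out the "/etc/init.d/network restart" form and keeps a .bak copy. It leaves other forms, such as "service network restart", untouched.

```shell
#!/bin/sh
# Hypothetical helpers, not from the thread.

# find_network_restarts: list uncommented boot-script lines that restart the
# network service. Default search directory: /etc/rc.d (GNU grep -r assumed).
find_network_restarts() {
    grep -rnE '^[[:space:]]*(/etc/init\.d/network|service[[:space:]]+network)[[:space:]]+restart' \
        "${1:-/etc/rc.d}"
}

# disable_network_restart: comment out "/etc/init.d/network restart" lines in an
# rc.local-style file, saving the original as <file>.bak first.
disable_network_restart() {
    f=${1:-/etc/rc.local}
    cp -p "$f" "$f.bak" || return 1
    # Writing with ">" keeps the file's existing mode (e.g. the executable bit).
    sed 's|^\([[:space:]]*/etc/init\.d/network[[:space:]][[:space:]]*restart\)|# \1|' \
        "$f.bak" > "$f"
}
```

For example, run find_network_restarts to see what would be affected, then disable_network_restart /etc/rc.local.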