RHCS: when starting the cman service, the nodes do not seem to detect each other

Post 1, posted 2012-03-17 10:52
[root@GDAPQE16 ~]# clustat
Cluster Status for gfscluster @ Fri Mar 16 15:36:01 2012
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
GDAPQE16 1 Online, Local
GDAPQE15 2 Offline
[root@GDAPQE15 ~]# clustat
Cluster Status for gfscluster @ Fri Mar 16 15:36:10 2012
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
GDAPQE16 1 Offline
GDAPQE15 2 Online, Local

Configuration file (identical on both nodes):
<?xml version="1.0"?>
<cluster alias="gfscluster" config_version="3" name="gfscluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="GDAPQE16" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="manualfence" nodename="GDAPQE16"/>
</method>
</fence>
</clusternode>
<clusternode name="GDAPQE15" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="manualfence" nodename="GDAPQE15"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_manual" name="manualfence"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>
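
For anyone comparing their own file against the one above, here is a minimal sketch of a structural sanity check in Python. It only looks at things visible in the posted config (two_node / expected_votes, node IDs, fence device references). The function name check_cluster_conf and the shortened SAMPLE are made up for illustration, and passing this check says nothing about whether the nodes can actually reach each other on the network.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Parse cluster.conf text and return (summary, problems)."""
    root = ET.fromstring(xml_text)
    nodes = root.findall("./clusternodes/clusternode")
    cman = root.find("./cman")
    fence_names = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    problems = []

    # Two-node mode lets a single node stay quorate, so cman expects
    # exactly two nodes and expected_votes="1".
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node=1 but %d nodes are defined" % len(nodes))
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 needs expected_votes=1")

    node_ids = [n.get("nodeid") for n in nodes]
    if len(set(node_ids)) != len(node_ids):
        problems.append("duplicate nodeid values")

    # Every fence device a node references must be declared in <fencedevices>.
    for node in nodes:
        for dev in node.findall("./fence/method/device"):
            if dev.get("name") not in fence_names:
                problems.append(
                    "%s uses undeclared fence device %s"
                    % (node.get("name"), dev.get("name"))
                )

    summary = {
        "name": root.get("name"),
        "config_version": root.get("config_version"),
        "nodes": [n.get("name") for n in nodes],
    }
    return summary, problems


# Shortened copy of the posted file, used only to exercise the function.
SAMPLE = """<cluster alias="gfscluster" config_version="3" name="gfscluster">
<clusternodes>
<clusternode name="GDAPQE16" nodeid="1" votes="1">
<fence><method name="1"><device name="manualfence" nodename="GDAPQE16"/></method></fence>
</clusternode>
<clusternode name="GDAPQE15" nodeid="2" votes="1">
<fence><method name="1"><device name="manualfence" nodename="GDAPQE15"/></method></fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices><fencedevice agent="fence_manual" name="manualfence"/></fencedevices>
</cluster>"""

summary, problems = check_cluster_conf(SAMPLE)
print(summary)
print(problems or "no structural problems found")
```

The config_version in the summary can also be compared with the version that ccsd reports when it loads cluster.conf, in case a node is still running with an older copy.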



log:
Mar 16 14:08:01 GDAPQE15 kernel: DLM (built Mar 16 2010 21:53:04) installed
Mar 16 14:08:01 GDAPQE15 kernel: GFS2 (built Mar 16 2010 21:53:24) installed
Mar 16 14:08:01 GDAPQE15 kernel: Lock_DLM (built Mar 16 2010 21:53:29) installed
Mar 16 14:08:01 GDAPQE15 ccsd[17469]: Starting ccsd 2.0.115:
Mar 16 14:08:01 GDAPQE15 ccsd[17469]: Built: Mar 16 2010 10:28:57
Mar 16 14:08:01 GDAPQE15 ccsd[17469]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Mar 16 14:08:01 GDAPQE15 ccsd[17469]: cluster.conf (cluster name = gfscluster, version = 1) found.
Mar 16 14:08:01 GDAPQE15 ccsd[17469]: Unable to sendto broadcast ipv6 socket, but inet_ntop returned NULL pointer: Cannot assign requested address
Mar 16 14:08:29 GDAPQE15 last message repeated 16 times
Mar 16 14:08:30 GDAPQE15 ccsd[17469]: Unable to connect to cluster infrastructure after 30 seconds.
Mar 16 14:08:31 GDAPQE15 ccsd[17469]: Unable to sendto broadcast ipv6 socket, but inet_ntop returned NULL pointer: Cannot assign requested address
Mar 16 14:08:41 GDAPQE15 last message repeated 6 times
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] AIS Executive Service: started and ready to provide service.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] Using default multicast address of 239.192.161.86
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] send threads (0 threads)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP token expired timeout (495 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP token problem counter (2000 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP threshold (10 problem count)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP mode set to none.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] heartbeat_failures_allowed (0)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] max_network_delay (50 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] The network interface [56.0.186.47] is now up.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Created or loaded sequence id 52.56.0.186.47 for this ring.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] entering GATHER state from 15.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CMAN ] CMAN 2.0.115 (built Mar 16 2010 10:29:01) started
Mar 16 14:08:41 GDAPQE15 openais[17483]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais event service B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais message service B.01.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais configuration service'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SYNC ] Not using a virtual synchrony filter.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Creating commit token because I am the rep.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Saving state aru 0 high seq received 0
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Storing new sequence id for ring 38
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] entering COMMIT state.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] entering RECOVERY state.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] position [0] member 56.0.186.47:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] previous ring seq 52 rep 56.0.186.47
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] aru 0 high delivered 0 received flag 1
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Did not need to originate any messages in recovery.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Sending initial ORF token
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] CLM CONFIGURATION CHANGE
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] New Configuration:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] Members Left:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] Members Joined:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] CLM CONFIGURATION CHANGE
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] New Configuration:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] r(0) ip(56.0.186.47)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] Members Left:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] Members Joined:
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] r(0) ip(56.0.186.47)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [SYNC ] This node is within the primary component and will provide service.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] entering OPERATIONAL state.
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CMAN ] quorum regained, resuming activity
Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] got nodejoin message 56.0.186.47
Mar 16 14:08:42 GDAPQE15 ccsd[17469]: Initial status:: Quorate
Mar 16 14:09:31 GDAPQE15 fenced[17536]: GDAPQE16 not a cluster member after 3 sec post_join_delay
Mar 16 14:09:31 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:31 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
Mar 16 14:09:36 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:36 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
Mar 16 14:09:41 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:41 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
Mar 16 14:09:46 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:46 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
Mar 16 14:09:51 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:51 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
Mar 16 14:09:56 GDAPQE15 fenced[17536]: fencing node "GDAPQE16"
Mar 16 14:09:56 GDAPQE15 fenced[17536]: fence "GDAPQE16" failed
After that, it just keeps trying to fence the other node.
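
A quick way to read a log like the one above is to pull out which addresses openais lists as CLM members and how often fenced reports a failure. The sketch below does that in Python. The regular expressions are written against the line formats shown in this post only, and the name summarize is made up for illustration. If the member set contains only the local address, the ring was formed by this node alone, which matches the "Offline" peer in clustat.

```python
import re
from collections import Counter

# Patterns derived from the log lines in this post; other versions may differ.
CLM_MEMBER = re.compile(r"\[CLM\s*\]\s+r\(\d+\)\s+ip\(([^)]+)\)")
FENCE_FAILED = re.compile(r'fenced\[\d+\]: fence "([^"]+)" failed')


def summarize(lines):
    """Return (set of CLM member IPs, Counter of failed fence attempts per node)."""
    members = set()
    failures = Counter()
    for line in lines:
        match = CLM_MEMBER.search(line)
        if match:
            members.add(match.group(1))
            continue
        match = FENCE_FAILED.search(line)
        if match:
            failures[match.group(1)] += 1
    return members, failures


# A few lines copied from the log above.
LOG = [
    "Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] New Configuration:",
    "Mar 16 14:08:41 GDAPQE15 openais[17483]: [CLM ] r(0) ip(56.0.186.47)",
    "Mar 16 14:09:31 GDAPQE15 fenced[17536]: fencing node \"GDAPQE16\"",
    "Mar 16 14:09:31 GDAPQE15 fenced[17536]: fence \"GDAPQE16\" failed",
    "Mar 16 14:09:36 GDAPQE15 fenced[17536]: fence \"GDAPQE16\" failed",
]

members, failures = summarize(LOG)
print("CLM members seen:", sorted(members))
print("failed fence attempts:", dict(failures))
```

To run it against a real file, pass an open file object instead of LOG, for example summarize(open("/var/log/messages")).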

Post 2, posted 2012-05-03 22:41
Did the original poster ever solve this? I recently ran into a similar problem: after starting a two-node cluster, the nodes keep fencing each other. What is going on?

Post 3, posted 2012-05-05 06:39
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] send threads (0 threads)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP token expired timeout (495 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP token problem counter (2000 ms)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP threshold (10 problem count)
Mar 16 14:08:41 GDAPQE15 openais[17483]: [TOTEM] RRP mode set to none.

Can you describe the network environment and topology?

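Post 4, posted 2012-05-05 19:03 (last edited by xwzh2009 on 2012-05-05 19:05)

My network environment is as follows: a two-node cluster whose heartbeat runs through a layer 3 switch, with a fence device configured.
If both nodes start at the same time, the cluster comes up fine. But after one node is rebooted abnormally, it can no longer join the cluster, and it then fences the other node. When that node comes back up it cannot join the cluster either, so it fences the first node in turn, and the two keep fencing each other like this.

I have now noticed something new. I removed the fence device so that the cluster runs without fencing, and then rebooted one node. At first it could not join the cluster, and with no fence device the fence attempt failed. After about four and a half minutes it rediscovered the other node. A tcpdump capture showed that during those four and a half minutes the node kept sending multicast messages, then issued an ARP request, and then found the heartbeat. I rebooted twice and saw the same behavior both times. What puzzles me is what those four and a half minutes are and what the cluster is doing during them. Is the cause the switch or the cluster itself?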

Post 5, posted 2012-05-06 05:12
"During those four and a half minutes, the node kept sending multicast messages"


How well does that switch support multicast? Has anything been configured for it? The safest option is to find a crossover cable and connect the heartbeat NICs directly.
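
One way to test the switch independently of the cluster stack is a small multicast probe. The Python sketch below is only an illustration under a few assumptions: it reuses the group address 239.192.161.86 from the openais log earlier in the thread, TEST_PORT is an arbitrary port picked here so it does not collide with the cluster's own traffic, and the function names are made up. Run listen() on one node and send_probes() on the other, each with the IP of its own heartbeat interface.

```python
import ipaddress
import socket
import struct
import time

GROUP = "239.192.161.86"  # multicast address taken from the openais log in this thread
TEST_PORT = 15405         # hypothetical test port, not the port the cluster uses


def membership_request(group, iface_ip):
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError("%s is not a multicast address" % group)
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen(iface_ip, group=GROUP, port=TEST_PORT, timeout=300.0):
    """Join the group on iface_ip and print every probe with its arrival time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(timeout)
    try:
        while True:
            data, sender = sock.recvfrom(1024)
            print(time.strftime("%H:%M:%S"), sender[0], data.decode(errors="replace"))
    except socket.timeout:
        print("no probe received for %.0f seconds" % timeout)
    finally:
        sock.close()


def send_probes(iface_ip, group=GROUP, port=TEST_PORT, count=300, interval=1.0):
    """Send numbered probes to the group through the interface that owns iface_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface_ip))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
    try:
        for i in range(count):
            sock.sendto(("probe %d" % i).encode(), (group, port))
            time.sleep(interval)
    finally:
        sock.close()
```

If the listener only starts printing probes several minutes after it joins, or stops printing after a few minutes, that points at how the switch handles multicast rather than at cman itself.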

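Post 6, posted 2012-05-07 21:54
sleepcat wrote on 2012-05-06 05:12:
"How well does that switch support multicast? Has anything been configured for it? The safest option is to find a crossover cable and connect the heartbeat NICs directly."


I am not sure about the switch settings; they are all defaults. What do you mean by the switch's multicast support? Does the switch really need special configuration?
Also, the cluster heartbeat has to go through the switch so that we can expand later.
Thanks.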

Post 7, posted 2012-05-07 22:12
xwzh2009 wrote on 2012-05-07 21:54:
"I am not sure about the switch settings; they are all defaults. What do you mean by the switch's multicast support? Does the switch really need special ..."


Some switches do not allow multicast by default. I am not very familiar with networking either; this is what others have told me.


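Post 8, posted 2012-05-07 23:20
sleepcat wrote on 2012-05-07 22:12:
"Some switches do not allow multicast by default. I am not very familiar with networking either; this is what others have told me."


But after a bit more than four minutes the nodes could see each other's heartbeat again. That suggests the switch is not simply blocking multicast; otherwise the nodes could never establish a heartbeat at all.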
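
One number that may be worth comparing with the observed delay is the IGMP Group Membership Interval, which switches doing IGMP snooping commonly use to age out group state. This is only a possible lead and nothing in this thread confirms that the switch has snooping enabled. With the IGMPv2 default timers from RFC 2236 the arithmetic is:

```python
# IGMPv2 default timer values from RFC 2236, section 8.
ROBUSTNESS_VARIABLE = 2
QUERY_INTERVAL_S = 125
QUERY_RESPONSE_INTERVAL_S = 10

# Group Membership Interval = Robustness Variable * Query Interval
#                             + Query Response Interval
group_membership_interval_s = (
    ROBUSTNESS_VARIABLE * QUERY_INTERVAL_S + QUERY_RESPONSE_INTERVAL_S
)

minutes, seconds = divmod(group_membership_interval_s, 60)
print("Group Membership Interval: %d s (%d min %d s)"
      % (group_membership_interval_s, minutes, seconds))
```

That works out to 260 seconds, roughly 4 minutes 20 seconds, which is close to the "four and a half minutes" described above. If the switch has different timers configured, the value will differ.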