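IP monitoring

Which IP addresses are being monitored in your RESOURCE? My configuration only monitors the service IP and the eth1 IPs of the two machines.
Post your cluster.conf so we can take a look.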

redkops

#12, posted 2007-03-19 17:02

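In my resources I have configured only one IP resource, the service IP, 172.16.20.1.

In addition, the eth0 and eth1 addresses have both been added to /etc/hosts, each pointing to its own host name.

Contents of cluster.conf: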

<?xml version="1.0"?>
<cluster config_version="32" name="clu">
  <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
  <clusternodes>
    <clusternode name="db-1" votes="1">
      <fence>
        <method name="1">
          <device name="BMC"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="db-2" votes="1">
      <fence>
        <method name="1">
          <device name="BMC"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.20.11" login="USERID" name="BMC" passwd="PASSW0RD"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="db" ordered="0" restricted="0">
        <failoverdomainnode name="db-1" priority="1"/>
        <failoverdomainnode name="db-2" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="172.16.20.1" monitor_link="0"/>
      <fs device="/dev/sdb1" force_unmount="1" fstype="ext3" mountpoint="/home/oracle/oradata/SID" name="oradata" options=""/>
      <script file="/etc/rc.d/init.d/oracle" name="oracle.sh"/>
    </resources>
    <service autostart="1" domain="db" name="oracle">
      <ip ref="172.16.20.1">
        <fs ref="oradata">
          <script ref="oracle.sh"/>
        </fs>
      </ip>
    </service>
  </rm>
</cluster>
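
A side note on the fencing section of the configuration above: both clusternode entries reference the single fence device BMC at 192.168.20.11, so a fence request against either node would be sent to the same controller. A minimal sketch of per-node IPMI fencing follows, assuming each server has its own BMC. The device names BMC-db-1 and BMC-db-2, the address 192.168.20.12, and the guess that 192.168.20.11 belongs to db-1 are all hypothetical; the credentials simply mirror the post.

```xml
<!-- Sketch only: device names and the second BMC address are assumptions -->
<clusternodes>
  <clusternode name="db-1" votes="1">
    <fence>
      <method name="1">
        <!-- db-1 is fenced through its own BMC -->
        <device name="BMC-db-1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="db-2" votes="1">
    <fence>
      <method name="1">
        <!-- db-2 is fenced through its own BMC -->
        <device name="BMC-db-2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- Assumed mapping: .11 is the BMC of db-1, .12 (hypothetical) the BMC of db-2 -->
  <fencedevice agent="fence_ipmilan" ipaddr="192.168.20.11" login="USERID" name="BMC-db-1" passwd="PASSW0RD"/>
  <fencedevice agent="fence_ipmilan" ipaddr="192.168.20.12" login="USERID" name="BMC-db-2" passwd="PASSW0RD"/>
</fencedevices>
```

If the two servers really do share one management controller, this sketch does not apply.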

redkops

#13, posted 2007-03-21 11:20

After a long search, I found on one site (http://sources.redhat.com/cluster/faq.html) an explanation of what happens in a two-node cluster when the nodes lose contact with each other:

“When each node recognizes that the other has stopped responding, it will try to fence the other. It can be like a gunfight at the O.K. Coral, and the node that's quickest on the draw (first to fence the other) wins. Unfortunately, both nodes can end up going down simultaneously, losing the whole cluster.
It's possible to avoid this by using a network power switch that serializes the two fencing operations. That ensures that one node is rebooted and the second never fences the first. ”

“Strangely, if you have a persistent network problem and the fencing device is still accessible to both nodes, this can result in a "A reboots B, B reboots A" fencing loop.
This problem can be gotten around by using a quorum disk or partition to break the tie.”

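Thanks, moderator nntp!

My situation right now: I have no power switch, and the version is RHCS U2.

I'm going to try sending the heartbeat over a different NIC first, namely eth0, which is the IPMI LAN (http://sources.redhat.com/cluster/faq.html#cman_heartbeat_nic). What I don't know is whether this will work, given that the IPMI LAN is also what the fence device uses.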
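
For reference, as far as the FAQ entry linked above describes it, cman in RHCS4 has no separate heartbeat-interface setting: heartbeat goes over whichever interface the node name in cluster.conf resolves to. A rough sketch under that assumption, where db-1-hb and db-2-hb are hypothetical host names that /etc/hosts maps to each node's eth0 address:

```xml
<!-- Sketch only: db-1-hb / db-2-hb are hypothetical names resolving to the eth0 addresses -->
<clusternodes>
  <clusternode name="db-1-hb" votes="1">
    <fence>
      <method name="1">
        <device name="BMC"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="db-2-hb" votes="1">
    <fence>
      <method name="1">
        <device name="BMC"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<!-- The failover domain has to refer to the same node names -->
<failoverdomain name="db" ordered="0" restricted="0">
  <failoverdomainnode name="db-1-hb" priority="1"/>
  <failoverdomainnode name="db-2-hb" priority="1"/>
</failoverdomain>
```

Whether fencing still behaves sensibly when that shared LAN is the link that fails is a separate question and would need to be tested by pulling the cable.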

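If that still doesn't work, I'll upgrade RHCS to U4, configure a quorum disk, and try again.

Damn, RHCS U2 really is terrible!
I'm also to blame for not researching it properly; using U4 with a quorum disk from the start would probably have saved a lot of trouble.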

mrjoeyu

#14, posted 2007-03-23 13:45

This issue is fixed in RHEL4U4. You should use the qdisk function to solve this problem. See "man qdiskd" for further details.
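
For readers who have not used qdisk, here is a rough sketch of what the cluster.conf changes might look like on a two-node cluster, loosely following the layout shown in the qdisk man page. The label oraqdisk, the gateway address 172.16.20.254 used by the ping heuristic, and all timing values are assumptions to be tuned, not values from this thread.

```xml
<!-- Sketch only: label, heuristic target, and timings are assumptions -->
<!-- With a 1-vote quorum disk, two_node mode is turned off and
     expected_votes becomes 3 (two nodes plus the quorum disk) -->
<cman expected_votes="3" two_node="0"/>
<quorumd interval="1" tko="10" votes="1" label="oraqdisk">
  <!-- A node that cannot ping the (hypothetical) gateway loses its
       heuristic score and with it its claim on the quorum disk vote -->
  <heuristic program="ping -c1 -t1 172.16.20.254" score="1" interval="2"/>
</quorumd>
```

The shared partition would first have to be initialized with mkqdisk (for example, mkqdisk -c <device> -l oraqdisk), and config_version needs to be incremented whenever cluster.conf is edited.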

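Thread: After building a cluster with RHCS4, other failovers work normally, but a problem appears after unplugging the network cable. Asking for help.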
