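Kernel: 2.6.18-194.el5
OS: Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Two IBM x3650 M2 servers run RHCS, using each x3650 M2's IPMI interface as the fence device.
Network setup:
Node 1:
public IP 10.72.86.121
private IP 10.1.1.1
IPMI IP 10.72.86.126
Node 2:
public IP 10.72.86.122
private IP 10.1.1.2
IPMI IP 10.72.86.127
The heartbeat (private IP) ports are connected directly with a crossover cable; the public IP ports and the IPMI ports are plugged into the same switch.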
Symptom 1: On node 1, the command fence_ipmilan -a 10.72.86.127 successfully fences node 2, and node 2 reboots:
[root@elndb1 /]# fence_ipmilan -a 10.72.86.127
Rebooting machine @ IPMI:10.72.86.127...Done
However, /var/log/messages reports the following errors:
Jun 28 19:47:17 elndb1 fenced[6977]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.72.86.127...Failed
Jun 28 19:47:17 elndb1 fenced[6977]: fence "elndb2.eln.com" failed
Symptom 2: On node 1, fencing node 2 with fence_node elndb2.eln.com fails:
[root@elndb1 ~]# fence_node elndb2.eln.com
agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.72.86.127...Failed
/var/log/messages reports the following errors:
Jun 28 20:43:11 elndb1 fence_node[8830]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.72.86.127...Failed
Jun 28 20:43:11 elndb1 fence_node[8830]: Fence of "elndb2.eln.com" was unsuccessful
cluster.conf is configured as follows:
<?xml version="1.0"?>
<cluster alias="elndb_cluster" config_version="9" name="elndb_cluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="elndb1.eln.com" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="elndb2.eln.com" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.72.86.126" login="USERID" name="ipmi1" passwd="PASSW0RD"/>
<fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.72.86.127" login="USERID" name="ipmi2" passwd="PASSW0RD"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="elndbdomain" restricted="1">
<failoverdomainnode name="elndb1.eln.com" priority="1"/>
<failoverdomainnode name="elndb2.eln.com" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="10.72.86.130" monitor_link="1"/>
</resources>
<service autostart="0" domain="elndbdomain" name="elndb_svc" recovery="relocate">
<ip ref="10.72.86.130"/>
</service>
</rm>
</cluster>
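In Symptom 1 the manual test passed only -a, whereas the fencedevice entries above also define login, passwd and auth. One way to narrow the problem down is to rerun the agent by hand with the same parameters the config specifies. The sketch below reads the fencedevice entries and prints an equivalent fence_ipmilan command line for each. The attribute-to-flag mapping (ipaddr to -a, login to -l, passwd to -p, auth to -A, action to -o) is an assumption based on the RHEL 5 fence_ipmilan options; verify it against `man fence_ipmilan` on the node. manual_commands is a hypothetical helper, not part of RHCS.

```python
import shlex
import xml.etree.ElementTree as ET

# Excerpt of the <fencedevices> section from the cluster.conf above.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="elndb_cluster" config_version="9" name="elndb_cluster">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.72.86.126" login="USERID" name="ipmi1" passwd="PASSW0RD"/>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.72.86.127" login="USERID" name="ipmi2" passwd="PASSW0RD"/>
  </fencedevices>
</cluster>"""

# ASSUMED mapping from cluster.conf attributes to fence_ipmilan flags;
# check `man fence_ipmilan` for the installed version.
FLAG_FOR_ATTR = {
    "ipaddr": "-a",
    "login": "-l",
    "passwd": "-p",
    "auth": "-A",
}


def manual_commands(conf_xml, action="status"):
    """Hypothetical helper: return {device name: argv list} for every
    fence_ipmilan device, carrying over the attributes set in the config."""
    root = ET.fromstring(conf_xml)
    commands = {}
    for dev in root.iter("fencedevice"):
        if dev.get("agent") != "fence_ipmilan":
            continue  # only IPMI devices are of interest here
        argv = ["fence_ipmilan"]
        for attr, flag in FLAG_FOR_ATTR.items():
            value = dev.get(attr)
            if value is not None:  # skip attributes the config leaves unset
                argv += [flag, value]
        argv += ["-o", action]  # "status" is a safe, non-destructive check
        commands[dev.get("name")] = argv
    return commands


if __name__ == "__main__":
    for name, argv in manual_commands(CLUSTER_CONF).items():
        print(name + ": " + " ".join(shlex.quote(a) for a in argv))
```

If the printed ipmi2 command (run on node 1) fails while the bare fence_ipmilan -a 10.72.86.127 succeeds, the credentials or the auth type in the fencedevice entry are the likely difference worth examining.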
The information in /var/log/messages is very limited. How should I go about troubleshooting this?