Question about RHCS (Red Hat Enterprise Linux 5)

#1 | Posted 2008-06-23 10:56
I ran "service cman start" on one of the machines (LINUX1); before that, cman was not running on either machine.
The log then showed the following:
Jun 23 18:49:13 LINUX1 openais[29836]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] Not using a virtual synchrony filter.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Creating commit token because I am the rep.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Saving state aru 0 high seq received 0
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering COMMIT state.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering RECOVERY state.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] position [0] member 192.168.10.31:
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] previous ring seq 0 rep 192.168.10.31
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] aru 0 high delivered 0 received flag 0
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Did not need to originate any messages in recovery.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Storing new sequence id for ring 4
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] Sending initial ORF token
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] New Configuration:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] Members Left:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] Members Joined:
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] New Configuration:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ]       r(0) ip(192.168.10.31)  
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] Members Left:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] Members Joined:
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ]       r(0) ip(192.168.10.31)  
Jun 23 18:49:13 LINUX1 openais[29836]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais[29836]: [TOTEM] entering OPERATIONAL state.
Jun 23 18:49:13 LINUX1 openais[29836]: [CMAN ] quorum regained, resuming activity
Jun 23 18:49:13 LINUX1 openais[29836]: [CLM  ] got nodejoin message 192.168.10.31
Jun 23 18:49:13 LINUX1 ccsd[29830]: Initial status:: Quorate
Jun 23 18:49:19 LINUX1 fenced[29854]: LINUX2 not a cluster member after 3 sec post_join_delay
Jun 23 18:49:19 LINUX1 fenced[29854]: fencing node "LINUX2"
Jun 23 18:49:19 LINUX1 fence_manual: Node LINUX2 needs to be reset before recovery can procede.  Waiting for LINUX2 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX2)

Then I ran fence_ack_manual -n LINUX2, and the log showed:
Jun 23 18:50:43 LINUX1 fenced[29854]: fence "LINUX2" success
Jun 23 18:50:48 LINUX1 ccsd[29830]: Attempt to close an unopened CCS descriptor (180).
Jun 23 18:50:48 LINUX1 ccsd[29830]: Error while processing disconnect: Invalid request descriptor

If I start cman on the other machine instead, the same log appears, only with LINUX2 replaced by LINUX1.

After cman is up on both machines, each machine reports the following:
[root@LINUX1 cluster]# clustat -l
Member Status: Quorate
  Member Name                        ID   Status
  ------ ----                        ---- ------
  LINUX1                             1 Online, Local
  LINUX2                             2 Offline

[root@LINUX2 etc]# clustat -l
Member Status: Quorate
  Member Name                        ID   Status
  ------ ----                        ---- ------
  LINUX1                             1 Offline
  LINUX2                             2 Online, Local


Could anyone tell me what is going wrong here, and how to fix it?

[ Last edited by txl829 on 2008-6-28 14:32 ]

#2 | Posted 2008-06-23 13:31
Post your network topology, plus the config files /etc/cluster/cluster.conf and /etc/hosts, and /var/log/messages, so we can take a look.

My guess is a configuration error.

#3 | Posted 2008-06-23 14:26
[root@LINUX1 cluster]# cat cluster.conf
<?xml version="1.0" ?>
<cluster config_version="2" name="_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="LINUX1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="svr_ip" nodename="LINUX1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="LINUX2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="svr_ip" nodename="LINUX2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="svr_ip"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="" ordered="0" restricted="0">
                                <failoverdomainnode name="LINUX1" priority="1"/>
                                <failoverdomainnode name="LINUX2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <service autostart="1" domain="" name="serv_ip" recovery="relocate">
                        <ip address="192.168.10.32" monitor_link="1"/>
                </service>
        </rm>
</cluster>
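As a sanity check, the membership-relevant settings can be pulled out of this cluster.conf with a few lines of Python. This is a minimal sketch using only the standard library, with the XML trimmed to just the attributes that matter here:

```python
import xml.etree.ElementTree as ET

# The posted cluster.conf, trimmed to the membership-relevant parts.
conf = """<cluster config_version="2" name="_cluster">
  <clusternodes>
    <clusternode name="LINUX1" nodeid="1" votes="1"/>
    <clusternode name="LINUX2" nodeid="2" votes="1"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
</cluster>"""

root = ET.fromstring(conf)

# Node names as cman sees them -- these must resolve (via /etc/hosts or
# DNS) to the addresses on the heartbeat interfaces.
nodes = [n.get("name") for n in root.iter("clusternode")]
print(nodes)                    # ['LINUX1', 'LINUX2']

cman = root.find("cman")
print(cman.get("two_node"))     # '1'
print(cman.get("expected_votes"))  # '1'
```

Note that with two_node="1" and expected_votes="1", a single node is quorate all by itself. That is exactly why both machines can report "Quorate" while each lists the other as Offline: the two nodes never formed a single membership, so each is running its own one-node cluster.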


hosts
[root@LINUX1 etc]# cat hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1     localhost6.localdomain6 localhost6
192.168.10.11   1   
192.168.10.13   2   
192.168.10.31   LINUX1
192.168.10.33   LINUX2
192.168.10.32   svr_ip



The log:

Jun 23 22:24:42 LINUX2 ccsd[10879]: Starting ccsd 2.0.60:
Jun 23 22:24:42 LINUX2 ccsd[10879]:  Built: Jan 23 2007 12:42:25
Jun 23 22:24:42 LINUX2 ccsd[10879]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jun 23 22:24:42 LINUX2 ccsd[10879]: cluster.conf (cluster name = _cluster, version = 2) found.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Using default multicast address of 239.192.88.13
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cpg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cfg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_msg loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_lck loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_evt loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_ckpt loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_amf loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_clm loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_evs loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] openais component openais_cman loaded.
Jun 23 22:24:45 LINUX2 openais[10885]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] send threads (0 threads)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP token expired timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP token problem counter (2000 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP threshold (10 problem count)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] RRP mode set to none.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] heartbeat_failures_allowed (0)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] max_network_delay (50 ms)
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] The network interface [192.168.10.33] is now up.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Created or loaded sequence id 0.192.168.10.33 for this ring.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering GATHER state from 15.
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais[10885]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] Not using a virtual synchrony filter.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Creating commit token because I am the rep.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Saving state aru 0 high seq received 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering COMMIT state.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering RECOVERY state.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] position [0] member 192.168.10.33:
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] previous ring seq 0 rep 192.168.10.33
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] aru 0 high delivered 0 received flag 0
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Did not need to originate any messages in recovery.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Storing new sequence id for ring 4
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] Sending initial ORF token
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] New Configuration:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ]       r(0) ip(192.168.10.33)  
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] Members Left:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] Members Joined:
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ]       r(0) ip(192.168.10.33)  
Jun 23 22:24:45 LINUX2 openais[10885]: [SYNC ] This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais[10885]: [TOTEM] entering OPERATIONAL state.
Jun 23 22:24:45 LINUX2 openais[10885]: [CMAN ] quorum regained, resuming activity
Jun 23 22:24:45 LINUX2 openais[10885]: [CLM  ] got nodejoin message 192.168.10.33
Jun 23 22:24:45 LINUX2 ccsd[10879]: Initial status:: Quorate
Jun 23 22:24:50 LINUX2 fenced[10901]: LINUX1 not a cluster member after 3 sec post_join_delay
Jun 23 22:24:50 LINUX2 fenced[10901]: fencing node "LINUX1"
Jun 23 22:24:50 LINUX2 fence_manual: Node LINUX1 needs to be reset before recovery can procede.  Waiting for LINUX1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX1)
Jun 23 22:25:42 LINUX2 fenced[10901]: fence "LINUX1" success
Jun 23 22:25:47 LINUX2 ccsd[10879]: Attempt to close an unopened CCS descriptor (180).
Jun 23 22:25:47 LINUX2 ccsd[10879]: Error while processing disconnect: Invalid request descriptor

[ Last edited by txl829 on 2008-6-28 14:33 ]

#4 | Posted 2008-06-23 18:49
1. Did you configure the cluster with Conga or with system-config-cluster?

Check that all the cluster components installed successfully and that the key services actually started.

2. The clustat -l output shows each host recognizing only itself; the other host is not recognized, and neither node is being managed by rgmanager. It looks like a heartbeat name-resolution problem. Judging from your hosts file, I suggest moving the heartbeat to a different IP segment and using fully qualified host names.
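Following that suggestion, a cleaned-up /etc/hosts might look like the sketch below. The example.com FQDNs are hypothetical; the point is that each clusternode name resolves to its heartbeat address, and that the stray short-name entries (the bare "1" and "2" lines on 192.168.10.11/13) are gone:

```
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.10.31   linux1.example.com   LINUX1
192.168.10.33   linux2.example.com   LINUX2
192.168.10.32   svr_ip
```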

#5 | Posted 2008-06-24 00:22
It is basically the problem described above: at startup, the second host cannot pick up the first host's heartbeat, and of course the same is true from the first host's side. The logs never show any host other than the local one joining, and under those conditions the cluster cannot really reach quorum as a whole.

So:
First, what does your hardware topology look like? Physically, the path carrying the heartbeat traffic, i.e. between 10.31 and 10.33, must be up; also check whether a firewall is interfering with the heartbeat traffic.
And since this is a RHEL 5 cluster, which kernel are you running? If it is the xen kernel, switch to the regular kernel; xend interferes with the cluster's network configuration.
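One way to probe the multicast path the heartbeat depends on is a small send/receive test. The sketch below (Python, standard library only) uses the default openais group seen in the log, 239.192.88.13; the port 5405 is an assumption on my part. Run the receiver half on one node and the sender half on the other to test the real path; run as-is on a single host, it only checks that multicast is not blocked locally:

```python
import socket
import struct

GROUP = "239.192.88.13"  # default openais multicast group, per the log
PORT = 5405              # assumption: pick any free UDP port for the test

# Receiver: bind to the port and join the multicast group on all interfaces.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5)

# Sender: a datagram addressed to the group; TTL 1 keeps it on the local LAN.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
tx.sendto(b"hb-test", (GROUP, PORT))

# If multicast is being filtered, this times out instead of delivering.
data, addr = rx.recvfrom(64)
print(data)
```

If the receiver on the peer node never sees the datagram while plain ping works, that points at multicast being dropped somewhere (switch IGMP snooping, firewall rules, or the xen bridge mentioned above).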

#6 | Posted 2008-06-24 14:26

Reply to post #1 by txl829

1. I configured it with system-config-cluster.
2. The 10.31 and 10.33 addresses on the two machines are reachable; they can ping each other, and the firewall is disabled on both.
3. Since the information shown by clustat -l already looked wrong, I did not start rgmanager.
4. The kernel should not be xen; I installed from the Red Hat discs and never touched the kernel.
5. I later reconfigured everything with Conga and hit exactly the same problem.
6. I have also seen this: clustat -l on both machines shows both nodes Online, but Local appears on each node's own entry, i.e. viewed from node1, Local is node1; viewed from node2, Local is node2.

#7 | Posted 2008-06-25 11:09
Quoting #6: "clustat -l on both machines shows both nodes Online, but Local appears on each node's own entry, i.e. viewed from node1, Local is node1; viewed from node2, Local is node2."


That part is normal. As for the other situations you describe above, going by what you have said, there should be nothing wrong there.