RHCS cluster configuration error: nodes reported as "offline" and "not a member", help needed

#1 Posted on 2011-11-21 09:25 (last edited by rml on 2011-11-21 09:29)

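Two IBM 3850 X5 servers are configured as an RHCS cluster. Each has two NICs: one serves external traffic, the other is connected to the heartbeat link.

Hostname 1: cmsdb_1
Service IP address (eth2): 10.233.227.19
Heartbeat address (eth3): 10.233.227.21

Hostname 2: cmsdb_2
Service IP address (eth2): 10.233.227.20
Heartbeat address (eth3): 10.233.227.22


Problem description:
1. cman and rgmanager report "OK" on startup, but clustat shows only the local node as "online"; the remote node is shown as "offline".
2. In the management console, the local node is a "member" and the remote node is "not a member".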



cluster.conf contents:
<?xml version="1.0" ?>
<cluster config_version="9" name="hnyiysj">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cmsdb_1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="" name="hnyiysj_1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="cmsdb_2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="" name="hnyiysj_2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.21" login="root" name="hnyiysj_1" passwd="redhat"/>
                <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.22" login="root" name="hnyiysj_2" passwd="redhat"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="hnyiysj_f" ordered="1" restricted="0">
                                <failoverdomainnode name="cmsdb_2" priority="1"/>
                                <failoverdomainnode name="cmsdb_1" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="47823" fstype="ext3" mountpoint="/oradata" name="oradata" options="" self_fence="0"/>
                        <script file="/etc/rc.d/init.d/oracle10g" name="oracle10g"/>
                </resources>
                <service autostart="1" domain="hnyiysj_f" exclusive="1" name="cmsdb_services">
                        <fs ref="oradata">
                                <script ref="oracle10g"/>
                        </fs>
                </service>
        </rm>
</cluster>
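
One thing that can be checked mechanically in a configuration like the one above is where each fence device points. The Python sketch below is not from the original post. It parses a trimmed copy of the cluster.conf above (fencing parts only) and compares every fencedevice ipaddr with the eth3 heartbeat addresses listed earlier in the post. The names HEARTBEAT_IPS, fence_targets and devices_on_heartbeat_ips are illustrative assumptions. fence_ipmilan expects ipaddr to be the address of the node's IPMI management controller (BMC); whether the BMCs in this setup really listen on the heartbeat addresses is not stated in the thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from the post (fencing parts only).
CLUSTER_CONF = """<cluster config_version="9" name="hnyiysj">
  <clusternodes>
    <clusternode name="cmsdb_1" nodeid="1" votes="1">
      <fence><method name="1"><device lanplus="" name="hnyiysj_1"/></method></fence>
    </clusternode>
    <clusternode name="cmsdb_2" nodeid="2" votes="1">
      <fence><method name="1"><device lanplus="" name="hnyiysj_2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.21" login="root" name="hnyiysj_1" passwd="redhat"/>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.22" login="root" name="hnyiysj_2" passwd="redhat"/>
  </fencedevices>
</cluster>
"""

# Heartbeat (eth3) addresses as listed in the post; the mapping name is hypothetical.
HEARTBEAT_IPS = {"cmsdb_1": "10.233.227.21", "cmsdb_2": "10.233.227.22"}


def fence_targets(conf_xml):
    """Return {node name: [(fence device name, ipaddr), ...]}."""
    root = ET.fromstring(conf_xml)
    # Map each declared fence device to its configured address.
    devices = {d.get("name"): d.get("ipaddr") for d in root.iter("fencedevice")}
    targets = {}
    for node in root.iter("clusternode"):
        refs = [dev.get("name") for dev in node.iter("device")]
        targets[node.get("name")] = [(ref, devices.get(ref)) for ref in refs]
    return targets


def devices_on_heartbeat_ips(conf_xml, heartbeat_ips):
    """List (node, device, ipaddr) where ipaddr is also a heartbeat address."""
    known = set(heartbeat_ips.values())
    return [(node, dev, ip)
            for node, pairs in fence_targets(conf_xml).items()
            for dev, ip in pairs
            if ip in known]


if __name__ == "__main__":
    for node, dev, ip in devices_on_heartbeat_ips(CLUSTER_CONF, HEARTBEAT_IPS):
        print(f"{node}: fence device {dev} uses {ip}, which is also a heartbeat address")
```

With the values from this thread, both fence devices are reported, because 10.233.227.21 and 10.233.227.22 appear both as eth3 heartbeat addresses and as fence_ipmilan targets.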


Errors from /var/log/messages:
Nov 20 16:35:50 cmsdb_2 kernel: DLM (built Mar 16 2010 21:53:04) installed
Nov 20 16:35:50 cmsdb_2 kernel: GFS2 (built Mar 16 2010 21:53:24) installed
Nov 20 16:35:50 cmsdb_2 kernel: Lock_DLM (built Mar 16 2010 21:53:29) installed
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: Starting ccsd 2.0.115:
Nov 20 16:35:50 cmsdb_2 ccsd[7995]:  Built: Mar 16 2010 10:28:57
Nov 20 16:35:50 cmsdb_2 ccsd[7995]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: cluster.conf (cluster name = hnyiysj, version = 9) found.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] AIS Executive Service: started and ready to provide service.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Using default multicast address of 239.192.53.2
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] send threads (0 threads)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP token expired timeout (495 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP token problem counter (2000 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP threshold (10 problem count)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP mode set to none.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] heartbeat_failures_allowed (0)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] max_network_delay (50 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Transmit multicast socket send buffer size (320000 bytes).
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] The network interface [10.233.227.20] is now up.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Created or loaded sequence id 20.10.233.227.20 for this ring.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering GATHER state from 15.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CMAN ] CMAN 2.0.115 (built Mar 16 2010 10:29:01) started
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais event service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais message service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais configuration service'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SYNC ] Not using a virtual synchrony filter.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Creating commit token because I am the rep.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Saving state aru 0 high seq received 0
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Storing new sequence id for ring 18
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering COMMIT state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering RECOVERY state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] position [0] member 10.233.227.20:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] previous ring seq 20 rep 10.233.227.20
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] aru 0 high delivered 0 received flag 1
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Did not need to originate any messages in recovery.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Sending initial ORF token
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] New Configuration:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] Members Left:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] Members Joined:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] New Configuration:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ]         r(0) ip(10.233.227.20)  
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] Members Left:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] Members Joined:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ]         r(0) ip(10.233.227.20)  
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SYNC ] This node is within the primary component and will provide service.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering OPERATIONAL state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CMAN ] quorum regained, resuming activity
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM  ] got nodejoin message 10.233.227.20
Nov 20 16:35:53 cmsdb_2 ccsd[7995]: Initial status:: Quorate
Nov 20 16:36:43 cmsdb_2 fenced[8025]: cmsdb_1 not a cluster member after 3 sec post_join_delay
Nov 20 16:36:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:04 cmsdb_2 dhclient: DHCPREQUEST on usb0 to 169.254.95.118 port 67
Nov 20 16:37:04 cmsdb_2 dhclient: DHCPACK from 169.254.95.118
Nov 20 16:37:04 cmsdb_2 dhclient: bound to 169.254.95.120 -- renewal in 293 seconds.
Nov 20 16:37:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:28 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:33 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:53 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:53 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:58 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:38:18 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:38:18 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:38:23 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:38:43 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:38:43 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:38:48 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:08 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:08 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:39:13 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:33 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:33 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:39:38 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:58 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:58 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:03 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:40:23 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:40:23 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:28 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:40:48 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:40:48 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:53 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:13 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:41:13 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:41:18 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:38 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:41:38 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:41:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:57 cmsdb_2 dhclient: DHCPREQUEST on usb0 to 169.254.95.118 port 67
Nov 20 16:41:57 cmsdb_2 dhclient: DHCPACK from 169.254.95.118
Nov 20 16:41:57 cmsdb_2 dhclient: bound to 169.254.95.120 -- renewal in 293 seconds.
Nov 20 16:42:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:42:27 cmsdb_2 kernel: dlm: Using TCP for communications
Nov 20 16:42:28 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:33 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:42:53 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:53 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:58 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:18 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:43:18 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:43:23 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:43 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:43:43 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:43:48 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:54 cmsdb_2 scim-bridge: Panel client has not yet been prepared
Nov 20 16:43:54 cmsdb_2 last message repeated 281 times
Nov 20 16:44:08 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:08 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:44:13 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:44:33 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:33 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:44:38 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:44:53 cmsdb_2 scim-bridge: The lockfile is destroied
Nov 20 16:44:53 cmsdb_2 scim-bridge: Cleanup, done. Exitting...
Nov 20 16:44:53 cmsdb_2 Cleanup, done. Exitting...
Nov 20 16:44:58 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:58 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:45:03 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:45:23 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:45:23 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:45:28 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
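
The log above is long, but it boils down to a few facts. The following sketch (not part of the original post) shows one way to extract them with Python regular expressions: the address openais bound to, the multicast address in use, and how often fence_ipmilan failed and against which address. The sample lines are copied from the log above; summarize_log and the pattern names are hypothetical.

```python
import re
from collections import Counter

# A few lines copied from the log in the post.
SAMPLE_LOG = """\
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Using default multicast address of 239.192.53.2
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] The network interface [10.233.227.20] is now up.
Nov 20 16:36:43 cmsdb_2 fenced[8025]: cmsdb_1 not a cluster member after 3 sec post_join_delay
Nov 20 16:36:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:28 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
"""

IP = r"(\d+(?:\.\d+){3})"  # dotted-quad IPv4 address
BOUND_RE = re.compile(r"The network interface \[" + IP + r"\] is now up")
MCAST_RE = re.compile(r"multicast address of " + IP)
IPMI_FAIL_RE = re.compile(r"IPMI:" + IP + r"\.\.\.ipmilan: Failed to connect")
FENCE_FAILED_RE = re.compile(r'fence "([^"]+)" failed')


def summarize_log(text):
    """Collect the bound address, multicast address and fence failure counts."""
    summary = {
        "bound_to": None,
        "multicast": None,
        "ipmi_failures": Counter(),   # BMC address -> failed connection attempts
        "fence_failures": Counter(),  # node name -> failed fence operations
    }
    for line in text.splitlines():
        match = BOUND_RE.search(line)
        if match:
            summary["bound_to"] = match.group(1)
        match = MCAST_RE.search(line)
        if match:
            summary["multicast"] = match.group(1)
        match = IPMI_FAIL_RE.search(line)
        if match:
            summary["ipmi_failures"][match.group(1)] += 1
        match = FENCE_FAILED_RE.search(line)
        if match:
            summary["fence_failures"][match.group(1)] += 1
    return summary


if __name__ == "__main__":
    for key, value in summarize_log(SAMPLE_LOG).items():
        print(key, value)
```

On the data from this thread, the summary shows openais on cmsdb_2 bound to 10.233.227.20, which is the eth2 service address rather than the eth3 heartbeat address, and every fence attempt timing out against 10.233.227.21.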

#2 Posted on 2011-11-22 10:36
Nov 20 16:36:43 cmsdb_2 fenced[8025]: cmsdb_1 not a cluster member after 3 sec post_join_delay
Nov 20 16:36:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed


Is this where the problem lies? It simply cannot connect. What should be checked?
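
As one possible starting point for that question: the log says fence_ipmilan could not connect to 10.233.227.21 within 20 seconds, so it is worth testing whether anything answers IPMI-over-LAN at that address at all. IPMI over LAN uses UDP port 623, and a BMC normally replies to an RMCP/ASF presence ping with a presence pong. The Python sketch below is not from the thread; the function names and the 3-second timeout are illustrative assumptions, and a missing reply only shows that this particular probe got no answer from the host it was run on.

```python
import socket

IPMI_PORT = 623  # UDP port used by RMCP / IPMI over LAN


def build_presence_ping(tag=0):
    """Build a 12-byte RMCP/ASF Presence Ping datagram."""
    # RMCP header: version 0x06, reserved, sequence 0xFF (no ACK), class 0x06 (ASF)
    rmcp_header = bytes([0x06, 0x00, 0xFF, 0x06])
    # ASF IANA enterprise number 4542 (0x000011BE), big-endian
    asf_iana = (4542).to_bytes(4, "big")
    # ASF message: type 0x80 (presence ping), tag, reserved, data length 0
    asf_body = bytes([0x80, tag & 0xFF, 0x00, 0x00])
    return rmcp_header + asf_iana + asf_body


def is_presence_pong(data):
    """True if data looks like an ASF Presence Pong (message type 0x40)."""
    return len(data) >= 12 and data[3] == 0x06 and data[8] == 0x40


def bmc_answers(host, timeout=3.0):
    """Send one presence ping to host and report whether a pong came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_presence_ping(), (host, IPMI_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
        return is_presence_pong(data)
```

Running bmc_answers("10.233.227.21") on cmsdb_2 would indicate whether the address configured for fence device hnyiysj_1 responds as a BMC from that node. If it does not, the fence device address, the BMC network settings, and the route between the node and the BMC are the next things to examine.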

#3 Posted on 2011-11-22 18:23
Notice: the author has been banned or deleted; the content is hidden automatically.

#4 Posted on 2011-11-29 10:28
The problem is finally solved. It turned out to be a network and routing issue.
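
The thread does not say what exactly was wrong with the network and routing. One property of the addressing in the original post that can be checked is whether the service and heartbeat interfaces share a subnet. The post gives no netmask, so the /24 prefix below is purely an assumption, as are the names ADDRESSES, network_of and same_subnet_pairs. A sketch in Python:

```python
import ipaddress
from itertools import combinations

# Addresses from the original post. The prefix length is NOT given there;
# /24 is an assumption made only for this illustration.
ASSUMED_PREFIX = 24
ADDRESSES = {
    "cmsdb_1 eth2 (service)": "10.233.227.19",
    "cmsdb_1 eth3 (heartbeat)": "10.233.227.21",
    "cmsdb_2 eth2 (service)": "10.233.227.20",
    "cmsdb_2 eth3 (heartbeat)": "10.233.227.22",
}


def network_of(addr, prefix):
    """Return the IPv4 network that addr belongs to for the given prefix."""
    return ipaddress.ip_interface(f"{addr}/{prefix}").network


def same_subnet_pairs(addresses, prefix):
    """Return pairs of interface labels whose addresses share a network."""
    return [(a, b)
            for (a, ip_a), (b, ip_b) in combinations(addresses.items(), 2)
            if network_of(ip_a, prefix) == network_of(ip_b, prefix)]


if __name__ == "__main__":
    for a, b in same_subnet_pairs(ADDRESSES, ASSUMED_PREFIX):
        print(f"{a} and {b} are in the same subnet")
```

Under the /24 assumption all four addresses fall into 10.233.227.0/24. When two interfaces of one host sit in the same subnet, Linux typically sends traffic for that subnet out through only one of them, which is one way heartbeat traffic can end up on an unintended path. Whether that was the cause here is not stated in the thread.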

#5 Posted on 2011-12-29 16:41
How did you solve the problem? I am setting up the same thing and have run into the same issue. Could we compare notes? (In reply to #4, rml)


   