Last edited by rml on 2011-11-21 09:29
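Two IBM x3850 X5 servers are configured as an RHCS cluster. Each uses two NICs: one serves external clients, the other is connected by the heartbeat cable.
Hostname 1: cmsdb_1
Service IP (eth2): 10.233.227.19
Heartbeat IP (eth3): 10.233.227.21
Hostname 2: cmsdb_2
Service IP (eth2): 10.233.227.20
Heartbeat IP (eth3): 10.233.227.22
Problem description:
1. cman and rgmanager both start and report "OK", but clustat shows only the local node as "online"; the remote node is shown as "offline".
2. The management console shows the local node as "member" and the remote node as "not a member".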
cluster.conf:
<?xml version="1.0" ?>
<cluster config_version="9" name="hnyiysj">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="cmsdb_1" nodeid="1" votes="1">
<fence>
<method name="1">
<device lanplus="" name="hnyiysj_1"/>
</method>
</fence>
</clusternode>
<clusternode name="cmsdb_2" nodeid="2" votes="1">
<fence>
<method name="1">
<device lanplus="" name="hnyiysj_2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.21" login="root" name="hnyiysj_1" passwd="redhat"/>
<fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.22" login="root" name="hnyiysj_2" passwd="redhat"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="hnyiysj_f" ordered="1" restricted="0">
<failoverdomainnode name="cmsdb_2" priority="1"/>
<failoverdomainnode name="cmsdb_1" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="47823" fstype="ext3" mountpoint="/oradata" name="oradata" options="" self_fence="0"/>
<script file="/etc/rc.d/init.d/oracle10g" name="oracle10g"/>
</resources>
<service autostart="1" domain="hnyiysj_f" exclusive="1" name="cmsdb_services">
<fs ref="oradata">
<script ref="oracle10g"/>
</fs>
</service>
</rm>
</cluster>
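The fence configuration above can be cross-checked mechanically. The Python sketch below is illustrative only: it embeds a trimmed copy of the post's cluster.conf, resolves each node's `<device>` to its `<fencedevice>`, and labels the target address against the addressing plan at the top of the post. The helper names (`fence_targets`, `classify`, `NIC_ADDRESSES`) are hypothetical. For this config it reports both fence_ipmilan targets as the eth3 heartbeat addresses. Because fence_ipmilan talks IPMI-over-LAN to a baseboard management controller, its `ipaddr` would normally be expected to be the BMC address (the IMM on this hardware) rather than an OS NIC. That is consistent with the "Failed to connect after 20 seconds" lines in the log.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from the post (only the parts fencing uses).
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster config_version="9" name="hnyiysj">
  <clusternodes>
    <clusternode name="cmsdb_1" nodeid="1" votes="1">
      <fence><method name="1"><device lanplus="" name="hnyiysj_1"/></method></fence>
    </clusternode>
    <clusternode name="cmsdb_2" nodeid="2" votes="1">
      <fence><method name="1"><device lanplus="" name="hnyiysj_2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.21" login="root" name="hnyiysj_1" passwd="redhat"/>
    <fencedevice agent="fence_ipmilan" auth="none" ipaddr="10.233.227.22" login="root" name="hnyiysj_2" passwd="redhat"/>
  </fencedevices>
</cluster>
"""

# Addressing plan as described at the top of the post (hypothetical helper data).
NIC_ADDRESSES = {
    "cmsdb_1": {"10.233.227.19": "eth2 service NIC", "10.233.227.21": "eth3 heartbeat NIC"},
    "cmsdb_2": {"10.233.227.20": "eth2 service NIC", "10.233.227.22": "eth3 heartbeat NIC"},
}


def fence_targets(conf_text):
    """Map each clusternode name to a list of (agent, ipaddr) for its fence devices."""
    root = ET.fromstring(conf_text)
    # <fencedevice> entries, looked up by the name that a node's <device> refers to.
    devices = {fd.get("name"): fd for fd in root.iter("fencedevice")}
    targets = {}
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            fd = devices[dev.get("name")]
            targets.setdefault(node.get("name"), []).append((fd.get("agent"), fd.get("ipaddr")))
    return targets


def classify(node, ip):
    """Say which of the node's own OS NICs (if any) a fence address belongs to."""
    return NIC_ADDRESSES.get(node, {}).get(ip, "not one of the node's OS NIC addresses")


if __name__ == "__main__":
    for node, devs in fence_targets(CLUSTER_CONF).items():
        for agent, ip in devs:
            print(f"{node}: {agent} -> {ip} ({classify(node, ip)})")
```

Run as a script, it prints `cmsdb_1: fence_ipmilan -> 10.233.227.21 (eth3 heartbeat NIC)` and the corresponding line for cmsdb_2.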
Error messages from /var/log/messages (on cmsdb_2):
Nov 20 16:35:50 cmsdb_2 kernel: DLM (built Mar 16 2010 21:53:04) installed
Nov 20 16:35:50 cmsdb_2 kernel: GFS2 (built Mar 16 2010 21:53:24) installed
Nov 20 16:35:50 cmsdb_2 kernel: Lock_DLM (built Mar 16 2010 21:53:29) installed
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: Starting ccsd 2.0.115:
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: Built: Mar 16 2010 10:28:57
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Nov 20 16:35:50 cmsdb_2 ccsd[7995]: cluster.conf (cluster name = hnyiysj, version = 9) found.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] AIS Executive Service: started and ready to provide service.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Using default multicast address of 239.192.53.2
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] join (60 ms) send_join (0 ms) consensus (20000 ms) merge (200 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1402
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] send threads (0 threads)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP token expired timeout (495 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP token problem counter (2000 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP threshold (10 problem count)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] RRP mode set to none.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] heartbeat_failures_allowed (0)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] max_network_delay (50 ms)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Receive multicast socket recv buffer size (320000 bytes).
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Transmit multicast socket send buffer size (320000 bytes).
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] The network interface [10.233.227.20] is now up.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Created or loaded sequence id 20.10.233.227.20 for this ring.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering GATHER state from 15.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CMAN ] CMAN 2.0.115 (built Mar 16 2010 10:29:01) started
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais event service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais message service B.01.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais configuration service'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SYNC ] Not using a virtual synchrony filter.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Creating commit token because I am the rep.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Saving state aru 0 high seq received 0
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Storing new sequence id for ring 18
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering COMMIT state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering RECOVERY state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] position [0] member 10.233.227.20:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] previous ring seq 20 rep 10.233.227.20
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] aru 0 high delivered 0 received flag 1
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Did not need to originate any messages in recovery.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] Sending initial ORF token
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] CLM CONFIGURATION CHANGE
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] New Configuration:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] Members Left:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] Members Joined:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] CLM CONFIGURATION CHANGE
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] New Configuration:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] r(0) ip(10.233.227.20)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] Members Left:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] Members Joined:
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] r(0) ip(10.233.227.20)
Nov 20 16:35:53 cmsdb_2 openais[8004]: [SYNC ] This node is within the primary component and will provide service.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] entering OPERATIONAL state.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CMAN ] quorum regained, resuming activity
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] got nodejoin message 10.233.227.20
Nov 20 16:35:53 cmsdb_2 ccsd[7995]: Initial status:: Quorate
Nov 20 16:36:43 cmsdb_2 fenced[8025]: cmsdb_1 not a cluster member after 3 sec post_join_delay
Nov 20 16:36:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:04 cmsdb_2 dhclient: DHCPREQUEST on usb0 to 169.254.95.118 port 67
Nov 20 16:37:04 cmsdb_2 dhclient: DHCPACK from 169.254.95.118
Nov 20 16:37:04 cmsdb_2 dhclient: bound to 169.254.95.120 -- renewal in 293 seconds.
Nov 20 16:37:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:28 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:33 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:53 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:37:53 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:58 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:38:18 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:38:18 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:38:23 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:38:43 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:38:43 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:38:48 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:08 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:08 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:39:13 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:33 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:33 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:39:38 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:39:58 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:39:58 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:03 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:40:23 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:40:23 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:28 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:40:48 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:40:48 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:40:53 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:13 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:41:13 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:41:18 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:38 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:41:38 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:41:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:41:57 cmsdb_2 dhclient: DHCPREQUEST on usb0 to 169.254.95.118 port 67
Nov 20 16:41:57 cmsdb_2 dhclient: DHCPACK from 169.254.95.118
Nov 20 16:41:57 cmsdb_2 dhclient: bound to 169.254.95.120 -- renewal in 293 seconds.
Nov 20 16:42:03 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:42:27 cmsdb_2 kernel: dlm: Using TCP for communications
Nov 20 16:42:28 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:33 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:42:53 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:42:53 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:42:58 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:18 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:43:18 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:43:23 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:43 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:43:43 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:43:48 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:43:54 cmsdb_2 scim-bridge: Panel client has not yet been prepared
Nov 20 16:43:54 cmsdb_2 last message repeated 281 times
Nov 20 16:44:08 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:08 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:44:13 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:44:33 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:33 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:44:38 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:44:53 cmsdb_2 scim-bridge: The lockfile is destroied
Nov 20 16:44:53 cmsdb_2 scim-bridge: Cleanup, done. Exitting...
Nov 20 16:44:53 cmsdb_2 Cleanup, done. Exitting...
Nov 20 16:44:58 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:44:58 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:45:03 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:45:23 cmsdb_2 fenced[8025]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:10.233.227.21...ipmilan: Failed to connect after 20 seconds Failed
Nov 20 16:45:23 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:45:28 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
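The log above is long but mostly repetitive. The following Python sketch is one way to reduce it to the lines that matter. The regular expressions are assumptions written against the message texts visible in this post, not a documented log format, and the helper names are hypothetical. Applied to these lines, it reports the totem ring coming up on 10.233.227.20, which is cmsdb_2's eth2 service address rather than its eth3 heartbeat address (10.233.227.22). It also shows cmsdb_2 as the only member that joined. After that, fenced retries fence_ipmilan against cmsdb_1 roughly every 25 seconds (a 20-second connect timeout plus a 5-second pause), and every completed attempt fails.

```python
import re
import sys
from collections import Counter

# A few representative lines copied from the cmsdb_2 log in the post.
SAMPLE_LOG = """\
Nov 20 16:35:53 cmsdb_2 openais[8004]: [MAIN ] Using default multicast address of 239.192.53.2
Nov 20 16:35:53 cmsdb_2 openais[8004]: [TOTEM] The network interface [10.233.227.20] is now up.
Nov 20 16:35:53 cmsdb_2 openais[8004]: [CLM ] got nodejoin message 10.233.227.20
Nov 20 16:36:43 cmsdb_2 fenced[8025]: cmsdb_1 not a cluster member after 3 sec post_join_delay
Nov 20 16:36:43 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:03 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
Nov 20 16:37:08 cmsdb_2 fenced[8025]: fencing node "cmsdb_1"
Nov 20 16:37:28 cmsdb_2 fenced[8025]: fence "cmsdb_1" failed
"""

# Patterns written against the message texts seen in the post (assumed, not a spec).
MCAST = re.compile(r"Using default multicast address of (\S+)")
RING = re.compile(r"The network interface \[([\d.]+)\] is now up")
JOIN = re.compile(r"got nodejoin message ([\d.]+)")
FENCE_TRY = re.compile(r'fencing node "([^"]+)"')
FENCE_FAIL = re.compile(r'fence "([^"]+)" failed')


def summarize(log_text):
    """Collect totem/CLM facts and per-node fence attempt counts from syslog text.

    If cman was started several times, only the last multicast/ring address is kept.
    """
    info = {"multicast": None, "ring_iface": None, "joined": [],
            "fence_tries": Counter(), "fence_failures": Counter()}
    for line in log_text.splitlines():
        if m := MCAST.search(line):
            info["multicast"] = m.group(1)
        elif m := RING.search(line):
            info["ring_iface"] = m.group(1)
        elif m := JOIN.search(line):
            info["joined"].append(m.group(1))
        elif m := FENCE_TRY.search(line):
            info["fence_tries"][m.group(1)] += 1
        elif m := FENCE_FAIL.search(line):
            info["fence_failures"][m.group(1)] += 1
    return info


if __name__ == "__main__":
    # Scan the file given as the first argument, or fall back to the sample above.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            text = f.read()
    else:
        text = SAMPLE_LOG
    s = summarize(text)
    print("totem ring interface:", s["ring_iface"])
    print("multicast address:", s["multicast"])
    print("members joined:", s["joined"])
    for node, tries in s["fence_tries"].items():
        print(f"fence attempts on {node}: {tries}, failed: {s['fence_failures'][node]}")
```

Given a log path as its first argument, the script scans that file instead of the built-in sample. Running it on each node would show which address each side's ring came up on.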