cman fails to start

Post 1, posted 2008-10-24 16:56
cman fails to start, and no specific reason is reported. Could anyone tell me why?

The relevant information is below:

[root@cms2 ~]# service cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
[  OK  ]
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... failed

[FAILED]


The messages log shows:
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] heartbeat_failures_allowed (0)
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] max_network_delay (50 ms)
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] The network interface [192.168.201.2] is now up.
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] Created or loaded sequence id 0.192.168.201.2 for this ring.
Oct 24 16:24:57 cms2 openais[18831]: [TOTEM] entering GATHER state from 15.
Oct 24 16:24:57 cms2 openais[18831]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais event service B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais message service B.01.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais configuration service'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Oct 24 16:24:58 cms2 openais[18831]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Oct 24 16:24:58 cms2 openais[18831]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Oct 24 16:24:58 cms2 openais[18831]: [SYNC ] Not using a virtual synchrony filter.
Oct 24 16:24:58 cms2 openais[18831]: [IPC  ] ERROR: Could not bind AF_UNIX: Address already in use.
Oct 24 16:24:58 cms2 openais[18831]: [MAIN ] AIS Executive exiting (-7).
Oct 24 16:24:59 cms2 ccsd[18750]: Unable to connect to cluster infrastructure after 30 seconds.
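
Most of the log above is routine startup output; the lines that matter are the [IPC] error and the [MAIN] exit that follows it. As a rough sketch (the function name and the "ERROR"/"exiting" filter are my own assumptions, not part of openais), something like this could pull those lines out of a longer log:

```python
import re

# Matches the openais syslog layout seen in this thread:
#   "<timestamp> <host> openais[<pid>]: [<TAG>] <message>"
LINE_RE = re.compile(r"openais\[\d+\]: \[(?P<tag>[A-Z ]+?)\s*\] (?P<msg>.*)$")


def find_errors(log_text):
    """Return (tag, message) pairs for openais lines that report an error or an exit."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m and ("ERROR" in m.group("msg") or "exiting" in m.group("msg")):
            hits.append((m.group("tag"), m.group("msg")))
    return hits
```

Feeding it the text of the log file narrows more than twenty lines down to the two that explain the failure.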

Post 2, posted 2008-10-25 09:00
Originally posted by hmqq on 2008-10-25 01:50:
Oct 24 16:24:58 cms2 openais[18831]: [IPC  ] ERROR: Could not bind AF_UNIX: Address already in use.


Which address does this refer to?
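
On the question of which address: AF_UNIX is the Unix domain socket family, so the "address" in that message is a local socket endpoint on the machine, not an IP address or a UDP port. The thread does not show which endpoint openais binds to. A minimal sketch that reproduces the same kind of error with a throwaway socket path (the path and function name are hypothetical):

```python
import errno
import os
import socket
import tempfile


def bind_twice():
    """Bind two AF_UNIX sockets to the same path; return the errno of the second bind."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")  # hypothetical path
    first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        first.bind(path)
        try:
            second.bind(path)  # the endpoint is already taken by `first`
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()
        if os.path.exists(path):
            os.unlink(path)
```

On Linux the second bind fails with EADDRINUSE, which is the errno behind "Address already in use". That fits a situation where an earlier process still holds the endpoint.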

Post 3, posted 2008-10-25 20:12

Reply to post #4 by gl00ad

Thanks a lot!
I'm on RHEL 5.

[root@cms2 ~]# uname -an
Linux cms2 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:21 EST 2007 i686 i686 i386 GNU/Linux

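UDP 5404/5405 are already open in my iptables, but the error persists. Disabling iptables entirely makes no difference. Is there anything else I should check?

Configuration below: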

[root@cms2 ~]# more /etc/sysconfig/iptables
# Generated by iptables-save v1.3.5 on Sat Oct 25 18:48:52 2008
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [3208937289:8709118251955]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p esp -j ACCEPT
-A RH-Firewall-1-INPUT -p ah -j ACCEPT
-A RH-Firewall-1-INPUT -d 224.0.0.251 -p udp -m udp --dport 5353 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 21 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 5404 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 5405 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 16851 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 21064 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 41966 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 41967 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 41968 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 41969 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 50006 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 50008 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 50009 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 50007 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 161 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 8070 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 873 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Sat Oct 25 18:48:52 2008
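
For a ruleset this long it is easy to miss a port by eye. A small sketch that lists which destination ports the ACCEPT rules open for a given protocol (the function name is my own; it only looks at simple "-p ... --dport ... -j ACCEPT" rules and ignores rule order, so treat it as a quick check, not a firewall analysis):

```python
import re

# Matches simple rules such as:
#   -A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 5404 -j ACCEPT
RULE_RE = re.compile(r"-p (?P<proto>\w+) .*--dport (?P<port>\d+) .*-j ACCEPT")


def accepted_ports(rules_text, proto):
    """Return the set of destination ports that ACCEPT rules open for `proto`."""
    ports = set()
    for line in rules_text.splitlines():
        m = RULE_RE.search(line)
        if m and m.group("proto") == proto:
            ports.add(int(m.group("port")))
    return ports
```

Run against the file above, the UDP set includes both 5404 and 5405, which matches the poster's statement that the firewall is not the cause.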

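Post 4, posted 2008-10-26 09:53
I tested this earlier: as long as the machine is rebooted (with iptables enabled and its configuration unchanged), cman runs. But this is now a production environment, so rebooting is no longer an option.

The heartbeat link between the two nodes is currently a direct connection.

Judging by the unicast IPs, there should be no duplicate address. My feeling is that the multicast address RHCS uses for internal communication may be the problem. cms1 runs normally, but this node, cms2, does not work.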

[root@cms2 ~]# clustat
CMAN is not running.
[root@cms2 ~]# netstat -an|grep udp
udp        0      0 0.0.0.0:32769               0.0.0.0:*                              
udp        0      0 0.0.0.0:514                 0.0.0.0:*                              
udp        0      0 127.0.0.1:5405              0.0.0.0:*                              
udp        0      0 127.0.0.1:5149              0.0.0.0:*                              
udp        0      0 226.94.1.1:5405             0.0.0.0:*                              
udp        0      0 0.0.0.0:161                 0.0.0.0:*                              
udp        0      0 0.0.0.0:825                 0.0.0.0:*                              
udp        0      0 0.0.0.0:828                 0.0.0.0:*                              
udp        0      0 0.0.0.0:5353                0.0.0.0:*                              
udp        0      0 0.0.0.0:111                 0.0.0.0:*                              
udp        0      0 0.0.0.0:631                 0.0.0.0:*                              
udp        0      0 :::32770                    :::*                                    
udp        0      0 :::32771                    :::*                                    
udp        0      0 :::32772                    :::*                                    
udp        0      0 :::2463                     :::*                                    
udp        0      0 :::50007                    :::*                                    
udp        0      0 :::5353                     :::*                             
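
One thing stands out in the netstat output: clustat says CMAN is not running, yet 127.0.0.1:5405 and 226.94.1.1:5405 are still bound. A sketch for spotting that automatically (the function name is an assumption; it expects the `netstat -an` column layout shown above):

```python
def listeners_on_port(netstat_text, port):
    """Return local addresses from `netstat -an` UDP lines that are bound to `port`."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue
        addr, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append(addr)
    return found
```

A non-empty result for 5405 while the cluster is supposedly stopped suggests something from a previous run is still holding the sockets.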

[root@cms2 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="cms" config_version="4" name="cms">
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cms1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="cms1-fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="cms2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="cms2-fence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.201.11" login="a" name="cms1-fence" passwd="a"/>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.201.12" login="a" name="cms2-fence" passwd="a"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>



[root@cms2 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain   localhost       cms2.Guangdong
::1     localhost6.localdomain6 localhost       cms2.Guangdong6
192.168.201.1 cms1
192.168.201.2 cms2
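
In this hosts file the name cms2.Guangdong sits on the 127.0.0.1 line, while the cluster node name cms2 maps to 192.168.201.2. The thread does not say whether that matters here, but it is cheap to check which names resolve to loopback. A sketch under that assumption (the function name is hypothetical):

```python
def loopback_aliases(hosts_text):
    """Return the names that an /etc/hosts file maps to a loopback address."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *aliases = line.split()
        if addr.startswith("127.") or addr == "::1":
            names.extend(aliases)
    return names
```

If a node name from cluster.conf ever shows up in that list, the cluster stack may end up using the loopback address for it, so it is worth confirming that only the intended names appear there.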




[root@cms2 ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:1E:4F:39:91:94  
          inet addr:IPA  Bcast:IPB  Mask:255.255.255.240
          inet6 addr: fe80::21e:4fff:fe39:9194/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4177457574 errors:0 dropped:0 overruns:0 frame:0
          TX packets:608846683 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2005133999 (1.8 GiB)  TX bytes:1653861240 (1.5 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1E:4F:39:91:92  
          inet addr:192.168.201.2  Bcast:192.168.201.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:4fff:fe39:9192/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34221328 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24361128 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1013490086 (966.5 MiB)  TX bytes:3409997433 (3.1 GiB)
          Interrupt:169 Memory:da000000-da012100

eth1      Link encap:Ethernet  HWaddr 00:1E:4F:39:91:94  
          inet6 addr: fe80::21e:4fff:fe39:9194/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:4177451384 errors:0 dropped:0 overruns:0 frame:0
          TX packets:608846666 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2003644680 (1.8 GiB)  TX bytes:1653856872 (1.5 GiB)
          Interrupt:169 Memory:d6000000-d6012100

eth2      Link encap:Ethernet  HWaddr 00:1E:4F:39:91:94  
          inet6 addr: fe80::21e:4fff:fe39:9194/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:6190 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1489319 (1.4 MiB)  TX bytes:4368 (4.2 KiB)
          Base address:0xdce0 Memory:d5ee0000-d5f00000

[Last edited by oioilu on 2008-10-26 09:56]

Post 5, posted 2008-10-26 18:55

Reply to post #9 by gl00ad

Here is the output



[root@cms2 ~]# lsof -i UDP:5405
COMMAND  PID USER   FD   TYPE   DEVICE SIZE NODE NAME
aisexec 9884 root    3u  IPv4 37348117       UDP 226.94.1.1:netsupport
aisexec 9884 root    5u  IPv4 37348119       UDP localhost.localdomain:netsupport


Strange. Doesn't aisexec get restarted along with service cman restart?


[root@cms2 ~]# ps -ef | grep aisexec
root      9884     1  5 Jul31 ?        4-13:17:07 aisexec
root     13708 13649  0 18:44 pts/1    00:00:00 grep aisexec
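
The STIME column answers the question: aisexec (PID 9884) was started on Jul31, so it has survived every service cman restart since then. A sketch for pulling the PID and start time out of `ps -ef` output (the function name is my own; it assumes the standard eight-column `ps -ef` layout):

```python
def find_process(ps_text, command):
    """Return (pid, stime) for `ps -ef` rows whose CMD starts with `command`."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and fields[7].split()[0] == command:
            rows.append((int(fields[1]), fields[4]))
    return rows
```

Matching on the first word of CMD also skips the `grep aisexec` row that would otherwise show up as a false hit.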


I killed process 9884, and now fencing fails instead.


[root@cms2 ~]# service cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
[  OK  ]
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed

[FAILED]



The related syslog output has changed.
Here is the log from node 2:

Oct 26 18:45:05 cms2 ccsd[13898]: Starting ccsd 2.0.60:
Oct 26 18:45:05 cms2 ccsd[13898]:  Built: Jan 23 2007 12:42:25
Oct 26 18:45:05 cms2 ccsd[13898]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Oct 26 18:45:05 cms2 ccsd[13898]: cluster.conf (cluster name = cms, version = 4) found.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] AIS Executive Service: started and ready to provide service.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Using default multicast address of 239.192.2.219
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_cpg loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_cfg loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais configuration service'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_msg loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais message service B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_lck loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_evt loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais event service B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_ckpt loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_amf loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_clm loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_evs loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] openais component openais_cman loaded.
Oct 26 18:45:06 cms2 openais[13904]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] send threads (0 threads)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] RRP token expired timeout (495 ms)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] RRP token problem counter (2000 ms)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] RRP threshold (10 problem count)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] RRP mode set to none.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] heartbeat_failures_allowed (0)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] max_network_delay (50 ms)
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] The network interface [192.168.201.2] is now up.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Created or loaded sequence id 0.192.168.201.2 for this ring.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering GATHER state from 15.
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Oct 26 18:45:07 cms2 ccsd[13898]: Initial status:: Quorate
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais event service B.01.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais message service B.01.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais configuration service'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Oct 26 18:45:07 cms2 openais[13904]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Oct 26 18:45:07 cms2 openais[13904]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Oct 26 18:45:07 cms2 openais[13904]: [SYNC ] Not using a virtual synchrony filter.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Creating commit token because I am the rep.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Saving state aru 0 high seq received 0
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering COMMIT state.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering RECOVERY state.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] position [0] member 192.168.201.2:
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] previous ring seq 0 rep 192.168.201.2
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] aru 0 high delivered 0 received flag 0
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Did not need to originate any messages in recovery.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Storing new sequence id for ring 4
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Sending initial ORF token
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] New Configuration:
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] Members Left:
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] Members Joined:
Oct 26 18:45:07 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] New Configuration:
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.2)
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] Members Left:
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] Members Joined:
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.2)
Oct 26 18:45:07 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state.
Oct 26 18:45:07 cms2 openais[13904]: [CMAN ] quorum regained, resuming activity
Oct 26 18:45:07 cms2 openais[13904]: [CLM  ] got nodejoin message 192.168.201.2
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering GATHER state from 11.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] Saving state aru 9 high seq received 9
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering COMMIT state.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] entering RECOVERY state.
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] position [0] member 192.168.201.1:
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] previous ring seq 116 rep 192.168.201.1
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] aru c high delivered c received flag 0
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] position [1] member 192.168.201.2:
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] previous ring seq 4 rep 192.168.201.2
Oct 26 18:45:07 cms2 openais[13904]: [TOTEM] aru 9 high delivered 9 received flag 0
Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] Did not need to originate any messages in recovery.
Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] Storing new sequence id for ring 78
Oct 26 18:45:08 cms2 groupd[13912]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:45:08 cms2 groupd[13912]: found uncontrolled kernel object clvmd in /sys/kernel/dlm
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] New Configuration:
Oct 26 18:45:08 cms2 groupd[13912]: local node must be reset to clear 2 uncontrolled instances of gfs and/or dlm
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.2)
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] Members Left:
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] Members Joined:
Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] New Configuration:
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.1)
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.2)
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] Members Left:
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] Members Joined:
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ]    r(0) ip(192.168.201.1)
Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state.
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] got nodejoin message 192.168.201.1
Oct 26 18:45:08 cms2 openais[13904]: [CLM  ] got nodejoin message 192.168.201.2
Oct 26 18:45:08 cms2 openais[13904]: [CPG  ] got joinlist message from node 1
Oct 26 18:45:08 cms2 openais[13904]: [CMAN ] cman killed by node 2 for reason 2
Oct 26 18:45:08 cms2 dlm_controld[13924]: cluster is down, exiting
Oct 26 18:45:08 cms2 gfs_controld[13930]: cluster is down, exiting
Oct 26 18:45:08 cms2 fenced[13918]: cluster is down, exiting
Oct 26 18:45:08 cms2 kernel: dlm: closing connection to node 2
Oct 26 18:45:08 cms2 kernel: dlm: closing connection to node 1
Oct 26 18:45:35 cms2 ccsd[13898]: Unable to connect to cluster infrastructure after 30 seconds.
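
Besides the "cman killed" line, the node 2 log contains three groupd messages that are easy to overlook: two leftover DLM objects (rgmanager and clvmd) under /sys/kernel/dlm, and a statement that the local node must be reset to clear them. Whether that is what leads to the kill is not confirmed in this thread. A sketch for listing such leftovers from a log (the function name is my own):

```python
import re

# Matches groupd lines such as:
#   groupd[13912]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
OBJ_RE = re.compile(r"groupd\[\d+\]: found uncontrolled kernel object (\S+) in (\S+)")


def uncontrolled_objects(log_text):
    """Return (name, location) pairs for groupd 'uncontrolled kernel object' lines."""
    return OBJ_RE.findall(log_text)
```

An empty result after a restart would at least rule this out as a factor.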


Here is the log from node 1:


Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] entering GATHER state from 11.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] Creating commit token because I am the rep.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] Saving state aru c high seq received c
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] entering COMMIT state.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] entering RECOVERY state.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] position [0] member 192.168.201.1:
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] previous ring seq 124 rep 192.168.201.1
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] aru c high delivered c received flag 0
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] position [1] member 192.168.201.2:
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] previous ring seq 4 rep 192.168.201.2
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] aru 9 high delivered 9 received flag 0
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] Did not need to originate any messages in recovery.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] Storing new sequence id for ring 80
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] Sending initial ORF token
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] New Configuration:
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.1)
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] Members Left:
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] Members Joined:
Oct 26 18:56:03 cms1 openais[3369]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] New Configuration:
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.1)
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.2)
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] Members Left:
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] Members Joined:
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.2)
Oct 26 18:56:03 cms1 openais[3369]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:56:03 cms1 openais[3369]: [TOTEM] entering OPERATIONAL state.
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] got nodejoin message 192.168.201.1
Oct 26 18:56:03 cms1 openais[3369]: [CLM  ] got nodejoin message 192.168.201.2
Oct 26 18:56:03 cms1 openais[3369]: [CPG  ] got joinlist message from node 1
Oct 26 18:56:14 cms1 openais[3369]: [TOTEM] The token was lost in the OPERATIONAL state.
Oct 26 18:56:14 cms1 openais[3369]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Oct 26 18:56:14 cms1 openais[3369]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Oct 26 18:56:14 cms1 openais[3369]: [TOTEM] entering GATHER state from 2.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] entering GATHER state from 0.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] Creating commit token because I am the rep.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] Saving state aru 17 high seq received 17
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] entering COMMIT state.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] entering RECOVERY state.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] position [0] member 192.168.201.1:
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] previous ring seq 128 rep 192.168.201.1
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] aru 17 high delivered 17 received flag 0
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] Did not need to originate any messages in recovery.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] Storing new sequence id for ring 84
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] Sending initial ORF token
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] New Configuration:
Oct 26 18:56:19 cms1 kernel: dlm: closing connection to node 2
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.1)
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] Members Left:
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.2)
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] Members Joined:
Oct 26 18:56:19 cms1 openais[3369]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] CLM CONFIGURATION CHANGE
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] New Configuration:
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ]     r(0) ip(192.168.201.1)
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] Members Left:
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] Members Joined:
Oct 26 18:56:19 cms1 openais[3369]: [SYNC ] This node is within the primary component and will provide service.
Oct 26 18:56:19 cms1 openais[3369]: [TOTEM] entering OPERATIONAL state.
Oct 26 18:56:19 cms1 openais[3369]: [CLM  ] got nodejoin message 192.168.201.1
Oct 26 18:56:19 cms1 openais[3369]: [CPG  ] got joinlist message from node 1

[Last edited by oioilu on 2008-10-26 19:09]

Post 6, posted 2008-10-27 10:24
After I changed the hosts file, restarting cman still fails at the fencing step. The log still shows "[CMAN ] cman killed by node 2 for reason 2".

lsof -i UDP:5405 now shows nothing, so the port is free.

Here is the ifcfg configuration.


[root@cms2 network-scripts]# more ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
IPADDR=61.8.165.36
NETMASK=255.255.255.240
ONBOOT=yes
TYPE=Ethernet
GATEWAY=61.8.165.33
USERCTL=no
[root@cms2 network-scripts]# more ifcfg-eth0
BOOTPROTO=none
DEVICE=eth0
ONBOOT=yes
IPADDR=192.168.201.2
NETMASK=255.255.255.0
HWADDR=00:1e:4f:39:91:92
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
[root@cms2 network-scripts]# more ifcfg-eth1
BOOTPROTO=none
DEVICE=eth1
ONBOOT=yes
MASTER=bond0
SLAVE=yes
TYPE=Ethernet
[root@cms2 network-scripts]# more ifcfg-eth2
BOOTPROTO=none
DEVICE=eth2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
TYPE=Ethernet

[Last edited by oioilu on 2008-10-27 11:15]

Post 7, posted 2008-10-27 17:58
I got an official answer from a Red Hat engineer:

It means you have a very old version of cman that needs updating.
That message is from 5.0, and lots of things have been fixed (including
that error) since then.

Chrissie

Reason "2" is that someone issued a cman_tool kill command on another
node. So it's nothing wrong with the cluster that has caused that message.

Chrissie
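
To spot this message in a long log and attach the explanation quoted above, a sketch like the following may help. Only reason 2 is explained in this thread, so the lookup table deliberately contains nothing else; the function name and the fallback text are my own.

```python
import re

KILL_RE = re.compile(r"\[CMAN \] cman killed by node (\d+) for reason (\d+)")

# Only reason 2 is explained in this thread; other codes are left unlabeled.
REASONS = {2: "cman_tool kill issued on another node"}


def parse_kill(line):
    """Return (node, reason, description) for a 'cman killed' log line, else None."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    node, reason = int(m.group(1)), int(m.group(2))
    return node, reason, REASONS.get(reason, "not explained in this thread")
```

Any other reason code falls through to the fallback text, so the sketch does not guess at meanings the thread never gives.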

[Last edited by oioilu on 2008-10-28 17:51]

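Post 8, posted 2008-10-27 23:45
Originally posted by jerrywjl on 2008-10-27 22:17:



Who said that?! How come I've never heard of such a problem on RHEL5?

First, if your system and the cluster software you are using come from the same installation disc, there will be no version mismatch. Even if the GA release of RHEL5 has its share of flaws, it would not have a problem this serious ...


Mine is already a production environment, so I can't reboot just for this; after all, it isn't a research setup.

Also, the Red Hat contact is: Christine Caulfield <ccaulfie@redhat.com>  (Red Hat)

[Last edited by oioilu on 2008-10-27 23:47]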

Post 9, posted 2008-10-28 17:50
Update from the Red Hat folks:

Reason "2" is that someone issued a cman_tool kill command on another
node. So it's nothing wrong with the cluster that has caused that message.

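Post 10, posted 2008-10-28 22:38
Originally posted by jerrywjl on 2008-10-28 20:43:



OK, she's a real expert. If even she can't sort it out, I doubt more than a handful of people on Earth could. Looks like there's no need for me to keep splitting hairs over this.



Yes, She//

BTW, do you know how to update the cluster packages? I only want to update cluster and not the other components. Do I need to download the src and run make?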