Newbie trying to set up an HA cluster with cman + rgmanager
I installed cman and rgmanager and configured the cluster with ccs_tool.
The cluster.conf configuration file is as follows:
<?xml version="1.0"?>
<cluster name="lulucluster" config_version="5">
<clusternodes>
<clusternode name="node1" votes="1" nodeid="1">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" votes="1" nodeid="2">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
<clusternode name="node3" votes="1" nodeid="3">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="meatware" agent="fence_manual"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>
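Before starting cman, the config above can at least be checked for well-formed XML (cman 3.x also ships `ccs_config_validate` for a schema-level check on a real node). A minimal sketch, writing a trimmed copy to /tmp only so the parse runs without root; on an actual node the file lives at /etc/cluster/cluster.conf:

```shell
# Write a local copy of the cluster.conf shown above (fence blocks
# trimmed for brevity) and parse it to catch unbalanced tags.
cat > /tmp/cluster.conf <<'EOF'
<?xml version="1.0"?>
<cluster name="lulucluster" config_version="5">
<clusternodes>
<clusternode name="node1" votes="1" nodeid="1"/>
<clusternode name="node2" votes="1" nodeid="2"/>
<clusternode name="node3" votes="1" nodeid="3"/>
</clusternodes>
</cluster>
EOF
# A plain XML parse; prints the cluster name and config version.
python3 -c 'import xml.etree.ElementTree as ET; r = ET.parse("/tmp/cluster.conf").getroot(); print(r.get("name"), r.get("config_version"))'
```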
The node list is:
Cluster name: lulucluster, config_version: 5
Nodename Votes Nodeid Fencetype
node1 1 1 meatware
node2 1 2 meatware
node3 1 3 meatware
The hosts file is configured as follows:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.225.1 node1
192.168.225.2 node2
192.168.225.3 node3
192.168.225.100 jump
Testing hosts resolution:
[root@node1 ~]# ping node2
PING node2 (192.168.225.2) 56(84) bytes of data.
64 bytes from node2 (192.168.225.2): icmp_seq=1 ttl=64 time=0.885 ms
64 bytes from node2 (192.168.225.2): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from node2 (192.168.225.2): icmp_seq=3 ttl=64 time=0.311 ms
[root@node1 ~]# ping node3
PING node3 (192.168.225.3) 56(84) bytes of data.
64 bytes from node3 (192.168.225.3): icmp_seq=1 ttl=64 time=22.3 ms
64 bytes from node3 (192.168.225.3): icmp_seq=2 ttl=64 time=0.299 ms
64 bytes from node3 (192.168.225.3): icmp_seq=3 ttl=64 time=0.368 ms
Testing passwordless ssh trust between the nodes:
[root@node1 ~]# ssh node2
Last login: Mon Dec 22 23:58:54 2014 from node1
[root@node2 ~]#
[root@node1 ~]# ssh node3
Last login: Mon Dec 22 22:29:13 2014 from 192.168.225.200
[root@node3 ~]#
Starting cman prints the following:
[root@node1 ~]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Starting gfs_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
The logs show:
Dec 23 00:46:57 node1 kernel: hrtimer: interrupt took 4900561 ns
Dec 23 00:48:12 node1 kernel: DLM (built Feb 22 2013 00:32:41) installed
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Successfully parsed cman config
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] The network interface [192.168.225.1] is now up.
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Using quorum provider quorum_cman
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 23 00:48:13 node1 corosync[2884]: [CMAN ] CMAN 3.0.12.1 (built Oct 15 2014 11:44:36) started
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync configuration service
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync profile loading service
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Using quorum provider quorum_cman
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Members[1]: 1
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Members[1]: 1
Dec 23 00:48:13 node1 corosync[2884]: [CPG ] chosen downlist: sender r(0) ip(192.168.225.1) ; members(old:0 left:0)
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 23 00:48:17 node1 fenced[2937]: fenced 3.0.12.1 started
Dec 23 00:48:17 node1 dlm_controld[2958]: dlm_controld 3.0.12.1 started
Dec 23 00:48:17 node1 gfs_controld[3010]: gfs_controld 3.0.12.1 started
Now here is the problem: after startup there is no ccsd process:
[root@node1 ~]# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1733/rpcbind
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1910/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 1806/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2064/master
tcp 0 0 0.0.0.0:55033 0.0.0.0:* LISTEN 1779/rpc.statd
tcp 0 0 :::111 :::* LISTEN 1733/rpcbind
tcp 0 0 :::22 :::* LISTEN 1910/sshd
tcp 0 0 ::1:631 :::* LISTEN 1806/cupsd
tcp 0 0 :::49603 :::* LISTEN 1779/rpc.statd
The cluster summary is:
[root@node1 ~]# cman_tool status
Version: 6.2.0
Config Version: 5
Cluster Name: lulucluster
Cluster Id: 57140
Cluster Member: Yes
Cluster Generation: 40
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node1
Node ID: 1
Multicast addresses: 239.192.223.20
Node addresses: 192.168.225.1
[root@node1 ~]# clustat
Cluster Status for lulucluster @ Tue Dec 23 00:49:57 2014
Member Status: Inquorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, Local
node2 2 Offline
node3 3 Offline
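The Inquorate state is consistent with the vote arithmetic in the `cman_tool status` output above: with 3 expected votes the quorum threshold is floor(expected/2) + 1 = 2, and only node1's single vote is present, so activity stays blocked. A quick sketch of that calculation:

```shell
# Quorum math from the cman_tool status fields shown above.
expected=3   # Expected votes
total=1      # Total votes (only node1 is up)
quorum=$(( expected / 2 + 1 ))
echo "quorum threshold: $quorum"
if [ "$total" -lt "$quorum" ]; then
    echo "Inquorate: activity blocked"
fi
```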
Starting cman on the other nodes gives:
[root@node2 cluster]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... xmlconfig cannot find /etc/cluster/cluster.conf
[FAILED]
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
To sum up: cman on node1 does not start cleanly, ccsd fails to come up, so the cluster configuration cannot be synchronized to the other nodes, and as a result the other nodes cannot start either. Could any of the experts here tell me what causes this and where the problem is? Many thanks for any pointers.
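For reference, the error on node2 above literally says `xmlconfig cannot find /etc/cluster/cluster.conf`, i.e. the file is not present on that node. A minimal sketch of copying node1's config to the other nodes over the passwordless ssh shown earlier (the scp commands are only printed here as a sketch, since node2/node3 exist only in the original environment):

```shell
# Sketch: cluster.conf must exist locally on every node before cman
# can start there. Echo the copy commands rather than running them.
CONF=/etc/cluster/cluster.conf
for node in node2 node3; do
    echo scp "$CONF" "root@${node}:${CONF}"
done
```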