Newbie trying to set up an HA cluster with cman + rgmanager
I installed cman and rgmanager and configured the cluster with ccs_tool.
The cluster.conf configuration file is as follows:
<?xml version="1.0"?>
<cluster name="lulucluster" config_version="5">
<clusternodes>
<clusternode name="node1" votes="1" nodeid="1">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" votes="1" nodeid="2">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
<clusternode name="node3" votes="1" nodeid="3">
<fence>
<method name="single">
<device name="meatware"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="meatware" agent="fence_manual"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources/>
</rm>
</cluster>
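Before starting cman, the config above can at least be checked for well-formed XML (cman 3.x also ships `ccs_config_validate` for a schema-level check on a real node). A minimal sketch, writing a trimmed copy to /tmp only so the parse runs without root; on an actual node the file lives at /etc/cluster/cluster.conf:

```shell
# Write a local copy of the cluster.conf shown above (fence blocks
# trimmed for brevity) and parse it to catch unbalanced tags.
cat > /tmp/cluster.conf <<'EOF'
<?xml version="1.0"?>
<cluster name="lulucluster" config_version="5">
<clusternodes>
<clusternode name="node1" votes="1" nodeid="1"/>
<clusternode name="node2" votes="1" nodeid="2"/>
<clusternode name="node3" votes="1" nodeid="3"/>
</clusternodes>
</cluster>
EOF
# A plain XML parse; prints the cluster name and config version.
python3 -c 'import xml.etree.ElementTree as ET; r = ET.parse("/tmp/cluster.conf").getroot(); print(r.get("name"), r.get("config_version"))'
```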
The node list is:
Cluster name: lulucluster, config_version: 5
Nodename Votes Nodeid Fencetype
node1 1 1 meatware
node2 1 2 meatware
node3 1 3 meatware
The hosts file is configured as follows:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.225.1 node1
192.168.225.2 node2
192.168.225.3 node3
192.168.225.100 jump
Testing hosts resolution:
[root@node1 ~]# ping node2
PING node2 (192.168.225.2) 56(84) bytes of data.
64 bytes from node2 (192.168.225.2): icmp_seq=1 ttl=64 time=0.885 ms
64 bytes from node2 (192.168.225.2): icmp_seq=2 ttl=64 time=0.369 ms
64 bytes from node2 (192.168.225.2): icmp_seq=3 ttl=64 time=0.311 ms
[root@node1 ~]# ping node3
PING node3 (192.168.225.3) 56(84) bytes of data.
64 bytes from node3 (192.168.225.3): icmp_seq=1 ttl=64 time=22.3 ms
64 bytes from node3 (192.168.225.3): icmp_seq=2 ttl=64 time=0.299 ms
64 bytes from node3 (192.168.225.3): icmp_seq=3 ttl=64 time=0.368 ms
Testing passwordless ssh trust between the nodes:
[root@node1 ~]# ssh node2
Last login: Mon Dec 22 23:58:54 2014 from node1
[root@node2 ~]#
[root@node1 ~]# ssh node3
Last login: Mon Dec 22 22:29:13 2014 from 192.168.225.200
[root@node3 ~]#
Starting cman prints the following:
[root@node1 ~]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Starting gfs_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
The logs show:
Dec 23 00:46:57 node1 kernel: hrtimer: interrupt took 4900561 ns
Dec 23 00:48:12 node1 kernel: DLM (built Feb 22 2013 00:32:41) installed
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Successfully parsed cman config
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] The network interface [192.168.225.1] is now up.
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Using quorum provider quorum_cman
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 23 00:48:13 node1 corosync[2884]: [CMAN ] CMAN 3.0.12.1 (built Oct 15 2014 11:44:36) started
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync configuration service
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync profile loading service
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Using quorum provider quorum_cman
Dec 23 00:48:13 node1 corosync[2884]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Dec 23 00:48:13 node1 corosync[2884]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Members[1]: 1
Dec 23 00:48:13 node1 corosync[2884]: [QUORUM] Members[1]: 1
Dec 23 00:48:13 node1 corosync[2884]: [CPG ] chosen downlist: sender r(0) ip(192.168.225.1) ; members(old:0 left:0)
Dec 23 00:48:13 node1 corosync[2884]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 23 00:48:17 node1 fenced[2937]: fenced 3.0.12.1 started
Dec 23 00:48:17 node1 dlm_controld[2958]: dlm_controld 3.0.12.1 started
Dec 23 00:48:17 node1 gfs_controld[3010]: gfs_controld 3.0.12.1 started
Now here is the problem: after startup there is no ccsd process:
[root@node1 ~]# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1733/rpcbind
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1910/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 1806/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2064/master
tcp 0 0 0.0.0.0:55033 0.0.0.0:* LISTEN 1779/rpc.statd
tcp 0 0 :::111 :::* LISTEN 1733/rpcbind
tcp 0 0 :::22 :::* LISTEN 1910/sshd
tcp 0 0 ::1:631 :::* LISTEN 1806/cupsd
tcp 0 0 :::49603 :::* LISTEN 1779/rpc.statd
The cluster summary is:
[root@node1 ~]# cman_tool status
Version: 6.2.0
Config Version: 5
Cluster Name: lulucluster
Cluster Id: 57140
Cluster Member: Yes
Cluster Generation: 40
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node1
Node ID: 1
Multicast addresses: 239.192.223.20
Node addresses: 192.168.225.1
[root@node1 ~]# clustat
Cluster Status for lulucluster @ Tue Dec 23 00:49:57 2014
Member Status: Inquorate
Member Name ID Status
------ ---- ---- ------
node1 1 Online, Local
node2 2 Offline
node3 3 Offline
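The Inquorate state is consistent with the vote arithmetic in the `cman_tool status` output above: with 3 expected votes the quorum threshold is floor(expected/2) + 1 = 2, and only node1's single vote is present, so activity stays blocked. A quick sketch of that calculation:

```shell
# Quorum math from the cman_tool status fields shown above.
expected=3   # Expected votes
total=1      # Total votes (only node1 is up)
quorum=$(( expected / 2 + 1 ))
echo "quorum threshold: $quorum"
if [ "$total" -lt "$quorum" ]; then
    echo "Inquorate: activity blocked"
fi
```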
Starting cman on the other nodes gives:
[root@node2 cluster]# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... xmlconfig cannot find /etc/cluster/cluster.conf
[FAILED]
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
To sum up: cman on node1 does not start cleanly, ccsd fails to come up, so the cluster configuration cannot be synchronized to the other nodes, and as a result the other nodes cannot start either. Could any of the experts here tell me what causes this and where the problem is? Many thanks for any pointers.
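For reference, the error on node2 above literally says `xmlconfig cannot find /etc/cluster/cluster.conf`, i.e. the file is not present on that node. A minimal sketch of copying node1's config to the other nodes over the passwordless ssh shown earlier (the scp commands are only printed here as a sketch, since node2/node3 exist only in the original environment):

```shell
# Sketch: cluster.conf must exist locally on every node before cman
# can start there. Echo the copy commands rather than running them.
CONF=/etc/cluster/cluster.conf
for node in node2 node3; do
    echo scp "$CONF" "root@${node}:${CONF}"
done
```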