RHCS fencing device (DRAC) problem (thread started by jerryjzm)

#31 | jerryjzm | posted 2009-07-24 10:39
Thanks, Aramis.
After setting up IPMI via Ctrl+E at boot and installing the ipmitool package, I can now successfully reset the machine with ipmitool.

The fence configuration is done now.
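For reference, the kind of ipmitool call this amounts to, as a minimal sketch (the BMC address and password below are placeholders, not the values from this cluster):

# query the power state over the BMC's LAN interface
ipmitool -I lan -H <bmc-ip> -U root -P <password> chassis power status
# hard-reset the box, which is roughly what the fence agent does
ipmitool -I lan -H <bmc-ip> -U root -P <password> chassis power reset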

However, I have run into a hostname-resolution problem while configuring RHCS.

Below is the /etc/hosts configuration on both machines:
-----------------------------------------------------------------------------------------------------------------------------------
127.0.0.1       ecp-app1        localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6

#192.168.10.1    ecp_app_cluster_node1
#192.168.10.2    ecp_app_cluster_node2
#134.*.*.77   ecp_app_cluster_service

134.*.*.76    ecp_app_cluster_node1
134.*.*.80    ecp_app_cluster_node2
134.*.*.77    ecp_app_cluster_service

Here 134.*.*.* is the public network, and 192.168.*.* is the private network used for internal (heartbeat) communication.

134.*.*.76 is configured on eth0.
The 192.168.*.* address is on bond0, which bonds eth2 and eth3.

Starting the cman service with this setup:
[root@ecp-app1 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
[  OK  ]
Everything works fine.

---------------------------------------------------------------------------------------------------------------------------
If /etc/hosts instead maps the node names to the private (heartbeat) addresses:
192.168.10.1    ecp_app_cluster_node1
192.168.10.2    ecp_app_cluster_node2
134.*.*.77   ecp_app_cluster_service


#134.*.*.76    ecp_app_cluster_node1
#134.*.*.80    ecp_app_cluster_node2
#134.*.*.77    ecp_app_cluster_service


then starting the cman service fails:
[root@ecp-app1 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... failed
cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start
[FAILED]

PS: name resolution over the heartbeat network itself works:
[root@ecp-app1 ~]# ping ecp_app_cluster_node1
PING ecp_app_cluster_node1 (192.168.10.1) 56(84) bytes of data.
64 bytes from ecp_app_cluster_node1 (192.168.10.1): icmp_seq=1 ttl=64 time=0.073 ms
64 bytes from ecp_app_cluster_node1 (192.168.10.1): icmp_seq=2 ttl=64 time=0.015 ms
64 bytes from ecp_app_cluster_node1 (192.168.10.1): icmp_seq=3 ttl=64 time=0.017 ms

--- ecp_app_cluster_node1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.015/0.035/0.073/0.026 ms
[root@ecp-app1 ~]# ping ecp_app_cluster_node2
PING ecp_app_cluster_node2 (192.168.10.2) 56(84) bytes of data.
64 bytes from ecp_app_cluster_node2 (192.168.10.2): icmp_seq=1 ttl=64 time=0.174 ms
64 bytes from ecp_app_cluster_node2 (192.168.10.2): icmp_seq=2 ttl=64 time=0.149 ms
64 bytes from ecp_app_cluster_node2 (192.168.10.2): icmp_seq=3 ttl=64 time=0.146 ms

So why does it work when the names resolve to the public addresses, but fail when they resolve to the internal (heartbeat) addresses? Is there something else I need to configure? Thanks!

#32 | jerrywjl | posted 2009-07-24 10:58
Internal or external network, remember one principle:

The heartbeat network is what must be mapped in /etc/hosts, and those same names must be used as the clusternode name entries in /etc/cluster/cluster.conf.
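Put concretely, a minimal sketch using the names that appear later in this thread (an illustration of the principle, not a prescription): the heartbeat address and the clusternode name must refer to the same string.

/etc/hosts on both nodes:
192.168.10.1    ecp-app1.cluster.com
192.168.10.2    ecp-app2.cluster.com

and the matching entries in /etc/cluster/cluster.conf:
<clusternode name="ecp-app1.cluster.com" nodeid="2" votes="1"> ... </clusternode>
<clusternode name="ecp-app2.cluster.com" nodeid="1" votes="1"> ... </clusternode>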

#33 | jerryjzm | posted 2009-07-24 11:55
Quoting jerrywjl (posted 2009-07-24 10:58):
Internal or external network, remember one principle: the heartbeat network is what must be mapped in /etc/hosts, and those same names must be used as the clusternode name entries in /etc/cluster/cluster.conf.

That is exactly the principle that is giving me trouble. As soon as I map the names to the heartbeat network 192.168.10.*, cman cannot resolve the local node name and fails to start with the error shown above.

If I do not use the heartbeat network and use the public 134.175.*.* addresses instead, there is no problem and cman starts fine.

#34 | posted 2009-07-24 14:34

Reply to #33 (jerryjzm)

Have a look at the cman documentation. Apparently something like this can be configured under <clusternodes>; I have not tested what the docs describe, so take it as a reference only:
<altname name="mynode2" mcast="239.1.1.2" port="6810"/>

Try setting name to the hostname used on the heartbeat network.
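Taken literally, a sketch of where that might sit in the cluster.conf posted later in this thread. This is untested and based only on the documentation line quoted above; whether the cman version used here honors altname at all is unverified, the mcast and port values are just the placeholders from that line, and ecp-app1-hb is a hypothetical name that would have to resolve to the node's 192.168.10.* heartbeat address:

<clusternode name="ecp-app1.cluster.com" nodeid="2" votes="1">
        <altname name="ecp-app1-hb" mcast="239.1.1.2" port="6810"/>
        <fence>
        ...
        </fence>
</clusternode>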

#35 | posted 2009-07-24 15:52

Reply to #7 (jerryjzm)

(Quoting the question from #7:) I still have a doubt: since I have never been able to reach the DRAC configuration page at 134.*.*.78:443, does the line "Starting fencing... done" mean that the fencing-device part of RHCS is already OK?

No, it does not. Whether the fence service starts successfully says nothing about whether an actual fence operation will succeed.
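What does verify fencing is driving the agent directly. A hedged sketch with the RHEL5 fence tools (the password is a placeholder):

# ask the fence agent for the power status through the DRAC/BMC
fence_ipmilan -a 134.*.*.78 -l root -p <password> -o status
# or let the cluster stack fence a node using whatever is configured in cluster.conf
fence_node <clusternode-name>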

#36 | jerryjzm | posted 2009-07-24 16:40
Thanks, everyone, for the replies.
Whether I configure with system-config-cluster or with Conga, as soon as I use the bond0 interface address the cman service fails.

Conga reports:
Unable to retrieve batch 1699530578 status from ecp-app-cluster-node2.ecp:11111: ccs_tool failed to propagate conf: Unable to connect to the CCS daemon: Connection refused Failed to update config file. Updating cluster configuration -- You will be redirected in 5 seconds.

system-config-cluster gives:
[root@ecp-app1 cluster]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... failed
cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start
[FAILED]

Even changing the bond0 address to one in the same subnet as eth0 does not help.

#37 | jerryjzm | posted 2009-07-24 18:17
Does the node name in /etc/cluster/cluster.conf have to match the machine's hostname?
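As far as I understand RHEL5 cman, essentially yes: it derives the local node name from the system hostname and then looks for a matching <clusternode name=...>, so that name also has to resolve to the interface the heartbeat should use. A quick hedged check:

uname -n                                  # the name cman starts from
getent hosts $(uname -n)                  # which address that name resolves to; should be the heartbeat address
cman_tool status | grep -i "node name"    # once cman is running, the name it actually settled on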

#38 | jerrywjl | posted 2009-07-25 11:27
If you are running the two-node cluster on RHEL5, or on RHEL4U5 or later, run the sosreport command on the machines and post the collected .bz2 archive; I can take a look when I have time.
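For reference, a minimal sketch (prompts and the exact output path vary by release):

sosreport    # interactive; writes an archive such as /tmp/sosreport-<hostname>-<date>.tar.bz2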

#39 | jerryjzm | posted 2009-07-27 11:58
Here are the /etc/hosts files from my node1 and node2:
node1
[root@ecp-app1 ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1  localhost.localdomain        localhost
::1     localhost6.localdomain6 localhost6

#134.175.13.76   ecp-app-cluster-node1.ecp
#134.175.13.80   ecp-app-cluster-node2.ecp
#134.175.13.77   ecp_app_cluster_service

134.175.13.76    ecp-app1
134.175.13.80    ecp-app2
192.168.10.1     ecp-app1.cluster.com         ecp-app1
192.168.10.2     ecp-app2.cluster.com         ecp-app2


#134.175.13.76     ecp-app1.cluster         ecp-app1
#134.175.13.80     ecp-app2.cluster         ecp-app2
#134.175.13.77    ecp_app_cluster_service


node2
[root@ecp-app2 etc]# more hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   localhost.localdomain       localhost
::1     localhost6.localdomain6 localhost6

#134.175.13.76   ecp-app-cluster-node1.ecp
#134.175.13.80   ecp-app-cluster-node2.ecp
#134.175.13.77   ecp_app_cluster_service

134.175.13.76    ecp-app1
134.175.13.80    ecp-app2
192.168.10.1     ecp-app1.cluster.com         ecp-app1
192.168.10.2     ecp-app2.cluster.com         ecp-app2


#134.175.13.76     ecp-app1.cluster         ecp-app1
#134.175.13.80     ecp-app2.cluster         ecp-app2
#134.175.13.77    ecp_app_cluster_service


[root@ecp-app2 etc]# more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ecp-app-ha" config_version="12" name="ecp-app-ha">
        <fence_daemon clean_start="0" post_fail_delay="15" post_join_delay="25"/>
        <clusternodes>
                <clusternode name="ecp-app2.cluster.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ecp_app_fence2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="ecp-app1.cluster.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ecp_app_fence1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="134.175.13.78" login="root" name="ecp_app_fence1" passwd="science"/>
                <fencedevice agent="fence_ipmilan" ipaddr="134.175.13.82" login="root" name="ecp_app_fence2" passwd="science"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="ecp_failover_domain" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="ecp-app2.cluster.com" priority="1"/>
                                <failoverdomainnode name="ecp-app1.cluster.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="134.175.13.77" monitor_link="1"/>
                        <clusterfs device="/dev/mapper/appvg-lv_ecp_98g" force_unmount="1" fsid="49425" fstype="gfs" mountpoint="/ecp" name="ecp_app_storage" self_fence="1"/>
                </resources>
                <service autostart="1" domain="ecp_failover_domain" exclusive="0" name="ecp_app_service_ip" recovery="relocate">
                        <ip ref="134.175.13.77"/>
                </service>
                <service autostart="1" domain="ecp_failover_domain" exclusive="0" name="ecp_app_service_storage" recovery="relocate">
                        <clusterfs ref="ecp_app_storage"/>
                </service>
        </rm>
</cluster>


With this configuration I have split the GFS filesystem and the IP address into separate services. The IP service now starts, but the GFS filesystem still never mounts.


more /var/log/messages
Jul 27 10:49:51 ecp-app2 openais[12618]: [CMAN ] CMAN 2.0.84 (built Apr 15 2008 16:19:14) started
Jul 27 10:49:51 ecp-app2 openais[12618]: [SYNC ] Not using a virtual synchrony filter.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] Creating commit token because I am the rep.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] Saving state aru 0 high seq received 0
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] Storing new sequence id for ring 54
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] entering COMMIT state.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] entering RECOVERY state.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] position [0] member 192.168.10.2:
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] previous ring seq 80 rep 192.168.10.2
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] aru 0 high delivered 0 received flag 1
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] Did not need to originate any messages in recovery.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] Sending initial ORF token
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] New Configuration:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] Members Left:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] Members Joined:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] New Configuration:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.2)  
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] Members Left:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] Members Joined:
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.2)  
Jul 27 10:49:51 ecp-app2 openais[12618]: [SYNC ] This node is within the primary component and will provide service.
Jul 27 10:49:51 ecp-app2 openais[12618]: [TOTEM] entering OPERATIONAL state.
Jul 27 10:49:51 ecp-app2 openais[12618]: [CMAN ] quorum regained, resuming activity
Jul 27 10:49:51 ecp-app2 openais[12618]: [CLM  ] got nodejoin message 192.168.10.2
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] entering GATHER state from 11.
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] Saving state aru 9 high seq received 9
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] Storing new sequence id for ring 5c
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] entering COMMIT state.
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] entering RECOVERY state.
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] position [0] member 192.168.10.1:
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] previous ring seq 88 rep 192.168.10.1
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] aru 9 high delivered 9 received flag 1
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] position [1] member 192.168.10.2:
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] previous ring seq 84 rep 192.168.10.2
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] aru 9 high delivered 9 received flag 1
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] Did not need to originate any messages in recovery.
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] New Configuration:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.2)  
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] Members Left:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] Members Joined:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] CLM CONFIGURATION CHANGE
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] New Configuration:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.1)  
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.2)  
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] Members Left:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] Members Joined:
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ]        r(0) ip(192.168.10.1)  
Jul 27 10:49:52 ecp-app2 openais[12618]: [SYNC ] This node is within the primary component and will provide service.
Jul 27 10:49:52 ecp-app2 openais[12618]: [TOTEM] entering OPERATIONAL state.
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] got nodejoin message 192.168.10.1
Jul 27 10:49:52 ecp-app2 openais[12618]: [CLM  ] got nodejoin message 192.168.10.2
Jul 27 10:49:52 ecp-app2 ccsd[12612]: Initial status:: Quorate
Jul 27 10:49:56 ecp-app2 kernel: dlm: Using TCP for communications
Jul 27 10:49:56 ecp-app2 kernel: dlm: connecting to 2
Jul 27 10:49:57 ecp-app2 clvmd: Cluster LVM daemon started - connected to CMAN
Jul 27 10:49:59 ecp-app2 clurgmgrd[12755]: <notice> Resource Group Manager Starting
Jul 27 10:50:58 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 1 -> 2).
Jul 27 10:51:02 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 2 -> 3).
Jul 27 10:51:09 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:05:42 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 3 -> 4).
Jul 27 11:05:49 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:06:47 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 4 -> 5).
Jul 27 11:06:59 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:08:15 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 5 -> 6).
Jul 27 11:08:30 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:09:21 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 6 -> 7).
Jul 27 11:09:30 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:10:35 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 7 -> 8).
Jul 27 11:10:40 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:12:29 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 8 -> 9).
Jul 27 11:12:40 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:12:41 ecp-app2 clurgmgrd[12755]: <notice> Initializing service:ecp_app_service
Jul 27 11:12:49 ecp-app2 clurgmgrd[12755]: <notice> Recovering failed service service:ecp_app_service
Jul 27 11:12:49 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:12:49 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:12:52 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:12:52 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:12:55 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:12:55 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:12:58 ecp-app2 clurgmgrd[12755]: <notice> start on clusterfs "ecp_app_storage" returned 2 (invalid argument(s))
Jul 27 11:12:58 ecp-app2 clurgmgrd[12755]: <warning> #68: Failed to start service:ecp_app_service; return value: 1
Jul 27 11:12:58 ecp-app2 clurgmgrd[12755]: <notice> Stopping service service:ecp_app_service
Jul 27 11:13:00 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service is recovering
Jul 27 11:14:38 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 9 -> 10).
Jul 27 11:14:46 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:15:33 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 10 -> 11).
Jul 27 11:15:33 ecp-app2 ccsd[12612]: Unable to parse updated config file.
Jul 27 11:15:46 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:15:46 ecp-app2 clurgmgrd[12755]: <notice> Initializing service:ecp_app_service_ip
Jul 27 11:17:12 ecp-app2 ccsd[12612]: Update of cluster.conf complete (version 11 -> 12).
Jul 27 11:17:20 ecp-app2 clurgmgrd[12755]: <notice> Reconfiguring
Jul 27 11:17:20 ecp-app2 clurgmgrd[12755]: <notice> Initializing service:ecp_app_service_storage
Jul 27 11:17:34 ecp-app2 clurgmgrd[12755]: <notice> Recovering failed service service:ecp_app_service_storage
Jul 27 11:17:34 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:17:34 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:17:37 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:17:37 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:17:40 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:17:40 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:17:43 ecp-app2 clurgmgrd[12755]: <notice> start on clusterfs "ecp_app_storage" returned 2 (invalid argument(s))
Jul 27 11:17:43 ecp-app2 clurgmgrd[12755]: <warning> #68: Failed to start service:ecp_app_service_storage; return value: 1
Jul 27 11:17:43 ecp-app2 clurgmgrd[12755]: <notice> Stopping service service:ecp_app_service_storage
Jul 27 11:17:46 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_storage is recovering
Jul 27 11:22:13 ecp-app2 luci[7278]: Unable to retrieve batch 273858179 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:22:52 ecp-app2 clurgmgrd[12755]: <notice> Starting disabled service service:ecp_app_service_ip
Jul 27 11:22:54 ecp-app2 in.rdiscd[19896]: setsockopt (IP_ADD_MEMBERSHIP): Address already in use
Jul 27 11:22:54 ecp-app2 in.rdiscd[19896]: Failed joining addresses
Jul 27 11:22:54 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_ip started
Jul 27 11:22:55 ecp-app2 clurgmgrd[12755]: <notice> Stopping service service:ecp_app_service_ip
Jul 27 11:22:57 ecp-app2 luci[7278]: Unable to retrieve batch 1293856098 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:23:05 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_ip is stopped
Jul 27 11:23:06 ecp-app2 luci[7278]: Unable to retrieve batch 1293856098 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:23:06 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_ip is now running on member 2
Jul 27 11:24:50 ecp-app2 clurgmgrd[12755]: <notice> Starting stopped service service:ecp_app_service_ip
Jul 27 11:24:51 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_ip started
Jul 27 11:25:53 ecp-app2 clurgmgrd[12755]: <notice> Starting stopped service service:ecp_app_service_storage
Jul 27 11:25:53 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:25:53 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:25:55 ecp-app2 luci[7278]: Unable to retrieve batch 1917832722 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:25:56 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:25:56 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:25:59 ecp-app2 gfs_controld[12654]: mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha"
Jul 27 11:25:59 ecp-app2 clurgmgrd: [12755]: <err> 'mount -t gfs  /dev/mapper/appvg-lv_ecp_98g /ecp' failed, error=1
Jul 27 11:26:02 ecp-app2 clurgmgrd[12755]: <notice> start on clusterfs "ecp_app_storage" returned 2 (invalid argument(s))
Jul 27 11:26:02 ecp-app2 clurgmgrd[12755]: <warning> #68: Failed to start service:ecp_app_service_storage; return value: 1
Jul 27 11:26:02 ecp-app2 clurgmgrd[12755]: <notice> Stopping service service:ecp_app_service_storage
Jul 27 11:26:04 ecp-app2 luci[7278]: Unable to retrieve batch 1917832722 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:26:04 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_storage is recovering
Jul 27 11:26:04 ecp-app2 clurgmgrd[12755]: <warning> #71: Relocating failed service service:ecp_app_service_storage
Jul 27 11:26:13 ecp-app2 luci[7278]: Unable to retrieve batch 1917832722 status from ecp-app2.cluster.com:11111: module scheduled for execution
Jul 27 11:26:16 ecp-app2 clurgmgrd[12755]: <notice> Service service:ecp_app_service_storage is stopped
Jul 27 11:26:22 ecp-app2 luci[7278]: Unable to retrieve batch 1917832722 status from ecp-app2.cluster.com:11111: clusvcadm start failed to start ecp_app_service_storage:
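One thing the log above makes explicit: the repeated gfs_controld message, mount: fs requires cluster="ecp_app_cluster" current="ecp-app-ha", means the GFS superblock records a lock table beginning with ecp_app_cluster, while the running cluster is named ecp-app-ha, so the mount is refused no matter how the service is arranged. A hedged sketch of how that could be inspected and relabelled with the RHEL5 GFS tools (run only with the filesystem unmounted on every node; the fsname part of the table below is a stand-in for whatever the filesystem was created with):

# show the lock table currently recorded in the superblock
gfs_tool sb /dev/mapper/appvg-lv_ecp_98g table
# rewrite it so the cluster part matches the running cluster name
gfs_tool sb /dev/mapper/appvg-lv_ecp_98g table ecp-app-ha:ecp_app_storage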

#40 | jerryjzm | posted 2009-07-27 12:01
Quoting jerrywjl (posted 2009-07-25 11:27):
If you are running the two-node cluster on RHEL5, or on RHEL4U5 or later, run the sosreport command on the machines and post the collected .bz2 archive; I can take a look when I have time.



I have posted that whole pile of output above; is it enough to troubleshoot from?

If I use the 134.175.*.* addresses in /etc/hosts to resolve ecp-app1.cluster.com and ecp-app2.cluster.com, everything works, but then the heartbeat no longer runs over the internal network.
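Tying this together, one arrangement that would satisfy the principle from #32, as a hedged sketch (it assumes the node hostnames can be set to the heartbeat names; the *-pub aliases are hypothetical):

# hostname on each node set to the heartbeat name, e.g. on node1
hostname ecp-app1.cluster.com        # and persist it via HOSTNAME= in /etc/sysconfig/network

# /etc/hosts on both nodes: the cluster node names resolve only to the 192.168.10.* addresses
192.168.10.1    ecp-app1.cluster.com    ecp-app1
192.168.10.2    ecp-app2.cluster.com    ecp-app2
134.175.13.76   ecp-app1-pub
134.175.13.80   ecp-app2-pub

# cluster.conf keeps <clusternode name="ecp-app1.cluster.com" .../> and ecp-app2.cluster.com unchanged

It is also worth avoiding the pattern from post #31 where ecp-app1 appeared as an alias on the 127.0.0.1 line, since the node name can then resolve to loopback.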