ChinaUnix forum · Views: 3185 | Replies: 4
My project is stuck on RHCS... asking jerrywjl for guidance.

#1 · Posted 2009-11-30 09:14
My project is stuck on RHCS... hoping jerrywjl can point me in the right direction. The cluster nodes come Online, but clustat shows no resources and the resources will not start.
The details:
[root@udbapp1 ~]# uname -a
Linux udbapp1 2.6.18-53.el5xen #1 SMP Wed Oct 10 16:48:44 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@udbapp1 ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6
119.87.244.70   udbapp1.local udbapp1
119.87.244.69   udbapp2.local udbapp2
#119.87.244.70    udbapp1
#119.87.244.69    udbapp2
[root@udbapp1 ~]#

-------------------------------------------
[root@udbapp2 ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain   localhost
::1     localhost6.localdomain6 localhost6
119.87.244.70   udbapp1.local udbapp1
119.87.244.69   udbapp2.local udbapp2
#119.87.244.70    udbapp1
#119.87.244.69    udbapp2

Above are the /etc/hosts files. Fencing uses HP iLO. The hardware is wired as follows:
each HP server's two NICs are bonded into bond0, with IPs 119.87.244.70 and .69; the iLO addresses are .71 and .72. The NICs and the iLO ports all connect to the switch and can all ping each other.
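For reference, the bond0 setup described above, in RHEL 5 terms, typically looks roughly like this (IPs taken from the thread; the bonding mode and file contents are illustrative assumptions, not the poster's actual files):

```ini
# /etc/modprobe.conf -- load the bonding driver for bond0
# mode=1 (active-backup) and miimon=100 are common choices, assumed here
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (udbapp1)
DEVICE=bond0
IPADDR=119.87.244.70
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (eth1 is configured the same way)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```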
[root@udbapp2 ~]# fence_ilo -a 119.87.244.71 -l redhat -p redhat123456 -o status
power is ON
success
-------------------
[root@udbapp1 ~]# fence_ilo -a 119.87.244.72 -l redhat -p redhat123456 -o status
power is ON
success

The /etc/cluster/cluster.conf file:
<?xml version="1.0" ?>
<cluster config_version="1" name="cluster_2">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="udbapp1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence_1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="udbapp2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence_2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
<cman expected_votes="1" two_node="1">
<multicast addr="224.0.0.1"/>
</cman>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="119.87.244.71" login="redhat" name="fence_1" passwd="redhat123456"/>
                <fencedevice agent="fence_ilo" hostname="119.87.244.72" login="redhat" name="fence_2" passwd="redhat123456"/>
        </fencedevices>
<rm>
                <failoverdomains>
                        <failoverdomain name="udbapp" ordered="1" restricted="0"/>
                        <failoverdomainnode name="udbapp1" priority="1"/>
                </failoverdomains>
                <resources>
                <ip address="119.87.244.73" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="udbapp" name="apache" recovery="relocate">
                                <ip ref="119.87.244.73"/>
                                <script file="/etc/init.d/httpd" name="httpd"/>
                </service>
        </rm>
</cluster>

Running service cman start on both nodes at the same time:
[root@udbapp1 ~]# service cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
[OK]

clustat output:
[root@udbapp1 ~]# clustat
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  udbapp1                               1 Online, Local
  udbapp2                               2 Online

It shows that the resources have not started.
[root@udbapp1 ~]# clusvcadm  -e apache -m udbapp1
Member udbapp1 trying to enable service:apache...Success
service:apache is now running on udbapp1
[root@udbapp1 ~]# ps -ef|grep httpd
root     32690 30413  0 09:08 pts/1    00:00:00 grep httpd
The resource still has not started....
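As an aside, the ps -ef | grep httpd output above shows only the grep process itself, which makes "not running" easy to misread. A small helper (a sketch; the function name is mine) makes the check unambiguous:

```shell
# Check whether a daemon with the given process name is running,
# without the classic pitfall of "grep httpd" matching the grep itself.
is_running() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1 is running"
        return 0
    else
        echo "$1 is NOT running"
        return 1
    fi
}

# e.g.: is_running httpd
```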

[root@udbapp1 ~]# service rgmanager start
Starting Cluster Service Manager: [OK]
[root@udbapp1 ~]# service rgmanager status
clurgmgrd is dead, but the pid file still exists
Starting rgmanager, only to find the process is already dead...
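The "dead, but the pid file still exists" state usually means the daemon crashed and left its pid file behind. A minimal cleanup sketch (the pid-file path is my assumption for illustration; it is not taken from this thread):

```shell
# Remove a daemon's pid file only if the recorded process is really gone.
stale_pid_cleanup() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "no pid file: $pidfile"; return 0; }
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is alive; leaving $pidfile"
    else
        echo "stale pid file ($pid gone); removing $pidfile"
        rm -f "$pidfile"
    fi
}

# e.g.: stale_pid_cleanup /var/run/clurgmgrd.pid   # then: service rgmanager start
```

After the stale file is gone, check /var/log/messages for why clurgmgrd died before restarting it.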

The messages log during startup:
[root@udbapp2 ~]# tail -f /var/log/messages
Nov 30 09:11:36 udbapp2 ccsd[20433]: Starting ccsd 2.0.60:
Nov 30 09:11:36 udbapp2 ccsd[20433]:  Built: Jan 23 2007 12:42:13
Nov 30 09:11:36 udbapp2 ccsd[20433]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Nov 30 09:11:36 udbapp2 ccsd[20433]: cluster.conf (cluster name = cluster_2, version = 1) found.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] AIS Executive Service: started and ready to provide service.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_cpg loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_cfg loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais configuration service'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_msg loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais message service B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_lck loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_evt loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais event service B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_ckpt loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_amf loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_clm loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_evs loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] openais component openais_cman loaded.
Nov 30 09:11:39 udbapp2 openais[20439]: [MAIN ] Registering service handler 'openais CMAN membership service 2.01'
Nov 30 09:11:39 udbapp2 openais[20439]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] send threads (0 threads)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] RRP token expired timeout (495 ms)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] RRP token problem counter (2000 ms)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] RRP threshold (10 problem count)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] RRP mode set to none.
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] heartbeat_failures_allowed (0)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] max_network_delay (50 ms)
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] The network interface [119.87.244.69] is now up.
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] Created or loaded sequence id 0.119.87.244.69 for this ring.
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] entering GATHER state from 15.
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais event service B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais message service B.01.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais configuration service'
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Nov 30 09:11:40 udbapp2 ccsd[20433]: Initial status:: Quorate
Nov 30 09:11:40 udbapp2 openais[20439]: [SERV ] Initialising service handler 'openais CMAN membership service 2.01'
Nov 30 09:11:40 udbapp2 openais[20439]: [CMAN ] CMAN 2.0.60 (built Jan 23 2007 12:42:16) started
Nov 30 09:11:40 udbapp2 openais[20439]: [SYNC ] Not using a virtual synchrony filter.
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] Creating commit token because I am the rep.
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] Saving state aru 0 high seq received 0
Nov 30 09:11:40 udbapp2 openais[20439]: [TOTEM] entering COMMIT state.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] entering RECOVERY state.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] position [0] member 119.87.244.69:
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] previous ring seq 0 rep 119.87.244.69
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] aru 0 high delivered 0 received flag 0
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] Did not need to originate any messages in recovery.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] Storing new sequence id for ring 4
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] Sending initial ORF token
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] New Configuration:
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] Members Left:
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] Members Joined:
Nov 30 09:11:41 udbapp2 openais[20439]: [SYNC ] This node is within the primary component and will provide service.
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] New Configuration:
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.69)  
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] Members Left:
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] Members Joined:
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.69)  
Nov 30 09:11:41 udbapp2 openais[20439]: [SYNC ] This node is within the primary component and will provide service.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] entering OPERATIONAL state.
Nov 30 09:11:41 udbapp2 openais[20439]: [CMAN ] quorum regained, resuming activity
Nov 30 09:11:41 udbapp2 openais[20439]: [CLM  ] got nodejoin message 119.87.244.69
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] entering GATHER state from 11.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] Creating commit token because I am the rep.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] Saving state aru 9 high seq received 9
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] entering COMMIT state.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] entering RECOVERY state.
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] position [0] member 119.87.244.69:
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] previous ring seq 4 rep 119.87.244.69
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] aru 9 high delivered 9 received flag 0
Nov 30 09:11:41 udbapp2 openais[20439]: [TOTEM] position [1] member 119.87.244.70:
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] previous ring seq 4 rep 119.87.244.70
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] aru 9 high delivered 9 received flag 0
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] Did not need to originate any messages in recovery.
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] Storing new sequence id for ring 8
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] Sending initial ORF token
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] New Configuration:
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.69)  
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] Members Left:
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] Members Joined:
Nov 30 09:11:42 udbapp2 openais[20439]: [SYNC ] This node is within the primary component and will provide service.
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] CLM CONFIGURATION CHANGE
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] New Configuration:
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.69)  
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.70)  
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] Members Left:
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] Members Joined:
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ]         r(0) ip(119.87.244.70)  
Nov 30 09:11:42 udbapp2 openais[20439]: [SYNC ] This node is within the primary component and will provide service.
Nov 30 09:11:42 udbapp2 openais[20439]: [TOTEM] entering OPERATIONAL state.
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] got nodejoin message 119.87.244.69
Nov 30 09:11:42 udbapp2 openais[20439]: [CLM  ] got nodejoin message 119.87.244.70


Could someone take a look at the messages log above and tell me whether anything is wrong? Others get this up and running in no time; mine just refuses to work.
1. The docs I have read do not explicitly require a dedicated heartbeat network, and the heartbeat can go straight over the data interfaces. But which config file decides which network the heartbeat uses?
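On the heartbeat question: as I understand it, cman/openais has no separate heartbeat setting in a config file; it binds to whichever interface the clusternode name resolves to. So a dedicated heartbeat network is selected purely through name resolution, e.g. (the 10.0.0.x addresses below are illustrative, not from this thread):

```ini
# /etc/hosts -- point the cluster node names at the heartbeat subnet
10.0.0.1        udbapp1
10.0.0.2        udbapp2
# keep the public addresses reachable under different names
119.87.244.70   udbapp1-pub
119.87.244.69   udbapp2-pub
```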

#2 · Posted 2009-11-30 09:34
First, the config file is basically fine, but why is there only one node in the failoverdomain?
<failoverdomains>
                        <failoverdomain name="udbapp" ordered="1" restricted="0"/>
                        <failoverdomainnode name="udbapp1" priority="1"/>
                </failoverdomains>
Second, rgmanager has to be running before you use clusvcadm, so set rgmanager to start at boot and reboot both machines, or start rgmanager by hand and let the cluster decide which node to start apache on.
Third, avoid the xen kernel whenever you can.
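To illustrate the first point: in the posted cluster.conf the &lt;failoverdomain&gt; element is self-closing, so the &lt;failoverdomainnode&gt; below it is not actually inside any domain. A corrected sketch of that section (adding udbapp2 as a second member is my assumption; remember to bump config_version when changing the file):

```xml
<failoverdomains>
        <failoverdomain name="udbapp" ordered="1" restricted="0">
                <failoverdomainnode name="udbapp1" priority="1"/>
                <failoverdomainnode name="udbapp2" priority="2"/>
        </failoverdomain>
</failoverdomains>
```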

#3 · Posted 2009-11-30 13:36
Note: the author has been banned or deleted; the content was automatically hidden.

#4 · Posted 2009-12-15 13:15
If you are in Shanghai, feel free to contact me; I can put you in touch with an RHCA who can solve this for you.

#5 · Posted 2009-12-15 14:31
Configure the resources.