RHCS HA on Red Hat AS 4: service does not fail over when the first machine's network cable is unplugged!

#1  Posted 2006-12-26 17:36
Both machines run Red Hat AS 4 U4.
Cluster software: RHCS
The relevant configuration on the two machines is as follows:
[root@vm002 ~]# more /etc/hosts   (identical on both machines)
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost
192.168.0.201   vm001
192.168.0.202   vm002

After both machines boot normally:
[root@vm002 ~]#clustat -i 3
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  vm001                                    Online, rgmanager
  vm002                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State         
  ------- ----         ----- ------                   -----         
  ftpservice           vm001                          started   

But after I unplugged the first machine's network cable and waited one minute, I got:
[root@vm002 ~]#clustat -i 3
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  vm001                                    Offline
  vm002                                    Online, Local, rgmanager

  Service Name         Owner (Last)                   State         
  ------- ----         ----- ------                   -----         
  ftpservice           unknown                        started   


My cluster configuration file is:
[root@vm002 ~]# more /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="zcbcluster" config_version="33" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="vm001" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clusterfence" nodename="vm001"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="vm002" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="clusterfence" nodename="vm002"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="clusterfence"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="ftp-domain" ordered="1" restricted="1">
                                <failoverdomainnode name="vm001" priority="1"/>
                                <failoverdomainnode name="vm002" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.168.0.203" monitor_link="1"/>
                        <script file="/etc/rc.d/init.d/vsftpdHA.sh" name="ftpHA"/>
                        <fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="61663" fstype="ext3" mountpoint="/ftp" name="ftpcontent" options="rw" self_fence="0"/>
                </resources>
                <service autostart="1" domain="ftp-domain" name="ftpservice" recovery="relocate">
                        <ip ref="192.168.0.203">
                                <fs ref="ftpcontent"/>
                                <script ref="ftpHA"/>
                        </ip>
                </service>
        </rm>
</cluster>

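Is there any way to get the service started on the standby node when the network cable is unplugged? (To be clear, the service can already be relocated back and forth between the two machines.)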
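
One thing that may be worth checking in the configuration above is the fence agent. Both nodes point at the clusterfence device, whose agent is fence_manual. As far as I know, fence_manual does not cut off the failed node by itself; it waits for an operator to confirm with fence_ack_manual -n <node>, and rgmanager holds back service recovery until fencing completes, which would fit the "unknown" owner shown by clustat. Below is a minimal Python sketch, under that assumption, that lists the fence agent configured for each node. CLUSTER_CONF is a trimmed copy of the config from the post, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from the post (fencing-related parts only).
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster alias="zcbcluster" config_version="33" name="alpha_cluster">
  <clusternodes>
    <clusternode name="vm001" votes="1">
      <fence>
        <method name="1">
          <device name="clusterfence" nodename="vm001"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vm002" votes="1">
      <fence>
        <method name="1">
          <device name="clusterfence" nodename="vm002"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="clusterfence"/>
  </fencedevices>
</cluster>
"""


def fence_agents_by_node(conf_xml):
    """Map each cluster node name to the fence agents configured for it."""
    root = ET.fromstring(conf_xml)
    # Resolve fence device names (e.g. "clusterfence") to their agent programs.
    agent_of = {d.get("name"): d.get("agent") for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        devices = node.findall("./fence/method/device")
        result[node.get("name")] = [agent_of.get(d.get("name")) for d in devices]
    return result


def nodes_needing_manual_ack(conf_xml):
    """Hypothetical helper: nodes whose fencing relies on fence_manual."""
    agents = fence_agents_by_node(conf_xml)
    return sorted(name for name, used in agents.items() if "fence_manual" in used)


if __name__ == "__main__":
    for name, used in fence_agents_by_node(CLUSTER_CONF).items():
        print(name, used)
    print("manual acknowledgement needed for:", nodes_needing_manual_ack(CLUSTER_CONF))
```

If the failed node shows up in that list, running fence_ack_manual on the surviving node, after making sure the failed node is really down, is presumably what lets the failover proceed.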

#33  Posted 2006-12-31 10:08
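Everyone, please calm down. Moderator NNTP did not mean that questions may not be asked.
Originally posted by fuumax on 2006-12-30 16:59:

This kind of parameter really has nothing to do with technology, or at least nothing to do with Linux technology.

Only three kinds of people can give you a clear answer:

People who have configured the same model of device
The vendor or reseller of this brand of fence device
Red Hat developers

Even if you call Red Hat's 800 line, it is hard to get a ...


The point is rather, as this fellow says, that you should look up the hardware manual yourself and try to solve the problem on your own. A forum is not some company's paid technical support, so people are free to answer or not to answer. Someone who sells a paid service, on the other hand, must be able to solve the customer's problem. Coming back to the forum: if you want help from others, ask humbly, and be willing to help others too. Helping others is helping yourself.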

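#32  Posted 2006-12-30 18:17
The thing is that we did not purchase RHCS; we built the cluster from software downloaded from the Internet, and now we have run into some difficulties!
We hope everyone can lend a hand!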

#31  Posted 2006-12-30 17:15
RH800 (China) can handle this. Call them if you have purchased an RHCS subscription.

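#30  Posted 2006-12-30 16:59
Originally posted by SUNfan on 2006-12-30 14:42:
Waiting online for an answer about fencing!


This kind of parameter really has nothing to do with technology, or at least nothing to do with Linux technology.

Only three kinds of people can give you a clear answer:

People who have configured the same model of device
The vendor or reseller of this brand of fence device
Red Hat developers

Even if you call Red Hat's 800 line, it is hard to get a satisfactory answer. The reason is simple: Red Hat's 800 engineers cannot possibly have configured every fence device either.

You can search the Red Hat mailing lists for this device's keywords, or rely on support from a dependable hardware vendor. Google will rarely turn up this kind of answer.

Of course, if this were my project, I think reading the hardware manual and then experimenting myself would be the fastest route.

[Last edited by fuumax on 2006-12-30 17:02]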

#29  Posted 2006-12-30 14:42
Waiting online for an answer about fencing!

#28  Posted 2006-12-30 09:30
When adding a fence device for the SW200e, which port should be selected?

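#27  Posted 2006-12-29 19:18
After the network is disconnected, the sybase service is supposed to fail over, but the failover never completes. The log on the second machine shows the following: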
Dec 29 17:30:21 web kernel: bnx2: eth1: using MSI
Dec 29 17:30:21 web kernel: bonding: bond0: enslaving eth1 as a backup interface with a down link.
Dec 29 17:30:21 web kernel: ip_tables: (C) 2000-2002 Netfilter core team
Dec 29 17:30:21 web kernel: bnx2: eth0: using MSI
Dec 29 17:30:21 web kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Dec 29 17:30:21 web kernel: bonding: bond0: link status definitely up for interface eth1.
Dec 29 17:30:21 web kernel: bonding: bond0: making interface eth1 the new active one.
Dec 29 17:30:21 web kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Dec 29 17:30:21 web kernel: ip_tables: (C) 2000-2002 Netfilter core team
Dec 29 17:30:21 web kernel: NET: Registered protocol family 10
Dec 29 17:30:21 web kernel: Disabled Privacy Extensions on device c0344160(lo)
Dec 29 17:30:21 web kernel: IPv6 over IPv4 tunneling driver
Dec 29 17:30:21 web kernel: CMAN 2.6.9-45.2 (built Jul 13 2006 11:42:36) installed
Dec 29 17:30:22 web kernel: NET: Registered protocol family 30
Dec 29 17:30:22 web kernel: DLM 2.6.9-42.10 (built Jul 13 2006 11:48:04) installed
Dec 29 17:30:22 web kernel: CMAN: Waiting to join or form a Linux-cluster
Dec 29 17:30:22 web kernel: CMAN: sending membership request
Dec 29 17:30:22 web kernel: CMAN: sending membership request
Dec 29 17:30:22 web kernel: CMAN: got node sybase
Dec 29 17:30:22 web kernel: CMAN: quorum regained, resuming activity
Dec 29 17:31:43 web fenced: startup succeeded
Dec 29 17:31:43 web kernel: Attached scsi generic sg0 at scsi0, channel 0, id 8, lun 0,  type 13
Dec 29 17:31:43 web kernel: Attached scsi generic sg1 at scsi0, channel 2, id 0, lun 0,  type 0
Dec 29 17:31:43 web kernel: Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0,  type 0
Dec 29 17:31:43 web kernel: Attached scsi generic sg3 at scsi1, channel 0, id 0, lun 1,  type 0
Dec 29 17:31:43 web kernel: Attached scsi generic sg4 at scsi1, channel 0, id 0, lun 2,  type 0
Dec 29 17:31:46 web Navisphere Agent[4243]: Agent initializing with pid 4243
Dec 29 17:31:46 web EV_AGENT[4254]: Agent daemon process created, pid 4254
Dec 29 17:31:46 web EV_AGENT[4254]: Agent has started up.
Dec 29 17:31:46 web naviagent: naviagent startup succeeded
Dec 29 17:31:46 web netfs: Mounting other filesystems:  succeeded
Dec 29 17:31:46 web kernel: i2c /dev entries driver
Dec 29 17:31:46 web rc: Starting lm_sensors:  succeeded
Dec 29 17:31:46 web autofs: automount startup succeeded
Dec 29 17:31:46 web smartd[4335]: smartd version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Dec 29 17:31:46 web smartd[4335]: Home page is http://smartmontools.sourceforge.net/
Dec 29 17:31:46 web smartd[4335]: Opened configuration file /etc/smartd.conf
Dec 29 17:31:46 web smartd[4335]: Configuration file /etc/smartd.conf parsed.
Dec 29 17:31:46 web smartd[4335]: Device: /dev/sda, opened
Dec 29 17:31:46 web smartd[4335]: Device: /dev/sda, Bad IEC (SMART) mode page, err=-5, skip device
Dec 29 17:31:46 web smartd[4335]: Unable to register SCSI device /dev/sda at line 30 of file /etc/smartd.conf
Dec 29 17:31:46 web smartd[4335]: Unable to register device /dev/sda (no Directive -d removable). Exiting.
Dec 29 17:31:46 web smartd: smartd startup failed
Dec 29 17:31:46 web acpid: acpid startup succeeded
Dec 29 17:31:47 web kernel: lp: driver loaded but no devices found
Dec 29 17:31:48 web cups: cupsd startup succeeded
Dec 29 17:31:48 web sshd:  succeeded
Dec 29 17:31:48 web xinetd: xinetd startup succeeded
Dec 29 17:31:48 web gpm[4420]: *** info [startup.c(95)]:
Dec 29 17:31:48 web gpm[4420]: Started gpm successfully. Entered daemon mode.
Dec 29 17:31:48 web xinetd[4410]: xinetd Version 2.3.13 started with libwrap loadavg options compiled in.
Dec 29 17:31:48 web xinetd[4410]: Started working: 0 available services
Dec 29 17:31:48 web gpm[4420]: *** info [mice.c(1766)]:
Dec 29 17:31:48 web gpm[4420]: imps2: Auto-detected intellimouse PS/2
Dec 29 17:31:48 web gpm: gpm startup succeeded
Dec 29 17:31:49 web iiim: htt startup succeeded
Dec 29 17:31:49 web crond: crond startup succeeded
Dec 29 17:31:49 web htt_server[4452]: started.
Dec 29 17:31:50 web xfs: xfs startup succeeded
Dec 29 17:31:50 web anacron: anacron startup succeeded
Dec 29 17:31:50 web atd: atd startup succeeded
Dec 29 17:31:50 web messagebus: messagebus startup succeeded
Dec 29 17:31:51 web cups-config-daemon: cups-config-daemon startup succeeded
Dec 29 17:31:51 web haldaemon: haldaemon startup succeeded
Dec 29 17:31:51 web clurgmgrd[4556]: <notice> Resource Group Manager Starting
Dec 29 17:31:51 web clurgmgrd[4556]: <info> Loading Service Data
Dec 29 17:31:51 web rgmanager: clurgmgrd startup succeeded
Dec 29 17:31:51 web fstab-sync[5131]: removed all generated mount points
Dec 29 17:31:51 web fstab-sync[5231]: added mount point /media/cdrom for /dev/hda
Dec 29 17:31:51 web clurgmgrd[4556]: <info> Initializing Services
Dec 29 17:31:51 web clurgmgrd: [4556]: <info> /dev/sdb1 is not mounted
Dec 29 17:31:51 web clurgmgrd: [4556]: <info> /dev/sdc1 is not mounted
Dec 29 17:31:51 web clurgmgrd: [4556]: <info> /dev/sdd1 is not mounted
Dec 29 17:31:52 web fstab-sync[5659]: added mount point /media/floppy for /dev/fd0
Dec 29 17:31:56 web clurgmgrd: [4556]: <info> Executing /etc/rc.d/init.d/sybaseHA.sh stop
Dec 29 17:31:56 web clurgmgrd: [4556]: <info> Executing /etc/rc.d/init.d/webHA.sh stop
Dec 29 17:31:56 web sybaseHA.sh: dataserver shutdown failed
Dec 29 17:31:56 web clurgmgrd[4556]: <notice> stop on script "cms-content" returned 5 (program not installed)
Dec 29 17:31:57 web clurgmgrd[4556]: <info> Services Initialized
Dec 29 17:31:58 web clurgmgrd[4556]: <info> Logged in SG "usrm::manager"
Dec 29 17:31:58 web clurgmgrd[4556]: <info> Magma Event: Membership Change
Dec 29 17:31:58 web clurgmgrd[4556]: <info> State change: Local UP
Dec 29 17:31:58 web clurgmgrd[4556]: <info> State change: sybase UP
Dec 29 17:31:59 web clurgmgrd[4556]: <info> Magma Event: Membership Change
Dec 29 17:31:59 web clurgmgrd[4556]: <info> State change: cms UP
Dec 29 17:32:01 web clurgmgrd[4556]: <notice> Starting stopped service webservice
Dec 29 17:32:01 web clurgmgrd: [4556]: <info> Adding IPv4 address 61.160.65.10 to eth0
Dec 29 17:32:02 web clurgmgrd: [4556]: <info> mounting /dev/sdc1 on /export/home/web
Dec 29 17:32:03 web kernel: kjournald starting.  Commit interval 5 seconds
Dec 29 17:32:03 web kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Dec 29 17:32:03 web kernel: EXT3 FS on sdc1, internal journal
Dec 29 17:32:03 web kernel: EXT3-fs: recovery complete.
Dec 29 17:32:03 web kernel: EXT3-fs: mounted filesystem with ordered data mode.
Dec 29 17:32:03 web clurgmgrd: [4556]: <info> Executing /etc/rc.d/init.d/webHA.sh start
Dec 29 17:33:20 web login(pam_unix)[4561]: session opened for user root by LOGIN(uid=0)
Dec 29 17:33:20 web  -- root[4561]: ROOT LOGIN ON tty1
Dec 29 17:34:29 web kernel: CMAN: removing node sybase from the cluster : Missed too many heartbeats
Dec 29 17:34:29 web fenced[4117]: sybase not a cluster member after 0 sec post_fail_delay
Dec 29 17:34:29 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:31 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED  
Dec 29 17:34:31 web fenced[4117]: fence "sybase" failed
Dec 29 17:34:36 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:37 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED  
Dec 29 17:34:37 web fenced[4117]: fence "sybase" failed
Dec 29 17:34:42 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:44 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED  
Dec 29 17:34:44 web fenced[4117]: fence "sybase" failed
Dec 29 17:34:49 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:51 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED  
Dec 29 17:34:51 web fenced[4117]: fence "sybase" failed
Dec 29 17:34:56 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:57 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED  
Dec 29 17:34:57 web fenced[4117]: fence "sybase" failed
Dec 29 17:35:02 web fenced[4117]: fencing node "sybase"

Attachment: 123.JPG (87.99 KB)
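
The fenced lines in the log above repeat one pattern: a fencing attempt on node sybase, an error from fence_brocade ("portshow 80 does not show DISABLED"), then fence "sybase" failed, roughly every six seconds. Since service recovery presumably waits for fencing to succeed, that loop would explain why the sybase service never moves over. Here is a small Python sketch for summarizing such a loop from a syslog excerpt. It assumes the message format matches the lines quoted above; SAMPLE_LOG is a shortened copy of them, and summarize_fencing is a name made up here.

```python
import re
from collections import Counter

# Shortened excerpt of the log quoted in the post.
SAMPLE_LOG = """\
Dec 29 17:34:29 web kernel: CMAN: removing node sybase from the cluster : Missed too many heartbeats
Dec 29 17:34:29 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:31 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED
Dec 29 17:34:31 web fenced[4117]: fence "sybase" failed
Dec 29 17:34:36 web fenced[4117]: fencing node "sybase"
Dec 29 17:34:37 web fenced[4117]: agent "fence_brocade" reports: failed: portshow 80 does not show DISABLED
Dec 29 17:34:37 web fenced[4117]: fence "sybase" failed
"""

# Patterns assume the fenced message wording seen in the excerpt above.
ATTEMPT = re.compile(r'fenced\[\d+\]: fencing node "([^"]+)"')
FAILED = re.compile(r'fenced\[\d+\]: fence "([^"]+)" failed')
AGENT = re.compile(r'fenced\[\d+\]: agent "([^"]+)" reports: (.*)')


def summarize_fencing(log_text):
    """Count fencing attempts, failures and distinct agent errors in a log excerpt."""
    attempts = Counter()      # node name -> number of fencing attempts
    failures = Counter()      # node name -> number of failed attempts
    agent_errors = Counter()  # (agent, message) -> number of occurrences
    for line in log_text.splitlines():
        m = ATTEMPT.search(line)
        if m:
            attempts[m.group(1)] += 1
            continue
        m = FAILED.search(line)
        if m:
            failures[m.group(1)] += 1
            continue
        m = AGENT.search(line)
        if m:
            agent_errors[(m.group(1), m.group(2).strip())] += 1
    return attempts, failures, agent_errors


if __name__ == "__main__":
    attempts, failures, agent_errors = summarize_fencing(SAMPLE_LOG)
    for node in attempts:
        print(node, "attempts:", attempts[node], "failures:", failures[node])
    for (agent, message), count in agent_errors.items():
        print(agent, "x", count, ":", message)
```

When every attempt for a node ends in a failure with the same agent message, the fence device settings (here the Brocade switch port) are the first thing I would look at, rather than rgmanager.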

#26  Posted 2006-12-28 16:59
What the man pages say is too confusing. Is there anything that explains directly how to set up the shared quorum space?

#25  Posted 2006-12-28 16:50
You can find more information about Quorum Disk in the following man
  pages: mkqdisk(8), qdiskd(8), and qdisk(5).
  
