Problem encountered while building a cluster with RHCS 4.3

#1, posted 2008-10-08 11:30
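OS: Red Hat Enterprise Linux 4.3
Cluster: Red Hat Cluster Suite 4.3

During a manual switchover, one of the shared file systems could not be unmounted on node B, so the takeover could not proceed.

Why does this happen?

GFS is not in use.

The cluster.conf file: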

<?xml version="1.0"?>
<cluster config_version="10" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="mainserver1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="mainserver2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="test"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="oracle" ordered="1" restricted="1">
                                <failoverdomainnode name="mainserver1" priority="1"/>
                                <failoverdomainnode name="mainserver2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="138.148.221.3" monitor_link="1"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_oracle_data" force_fsck="0" force_unmount="1" fsid="15674" fstype="ext3" mountpoint="/data/oradata" name="oradata" options="" self_fence="1"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_oracle_log" force_fsck="0" force_unmount="1" fsid="36402" fstype="ext3" mountpoint="/data/oralog" name="oralog" options="" self_fence="0"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_images" force_fsck="0" force_unmount="1" fsid="48746" fstype="ext3" mountpoint="/data/images" name="images" options="" self_fence="0"/>
                        <script file="/etc/init.d/hongsy.sh" name="orace"/>
                        <script file="/etc/init.d/cluster_svr" name="cluster_svr"/>
                </resources>
                <service autostart="1" domain="oracle" name="oracle">
                        <ip ref="138.148.221.3"/>
                        <fs ref="oradata"/>
                        <fs ref="oralog"/>
                        <fs ref="images"/>
                        <script ref="orace"/>
                </service>
        </rm>
</cluster>




Relevant system log:

Sep 28 03:24:46 mainserver1 ccsd[6494]: Starting ccsd 1.0.3:
Sep 28 03:24:46 mainserver1 ccsd[6494]:  Built: Jan 25 2006 16:54:55
Sep 28 03:24:46 mainserver1 ccsd[6494]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Sep 28 03:24:47 mainserver1 ccsd: startup succeeded
Sep 28 03:24:52 mainserver1 kernel: CMAN 2.6.9-43.8 (built Feb 26 2006 21:06:18) installed
Sep 28 03:24:52 mainserver1 kernel: NET: Registered protocol family 30
Sep 28 03:24:52 mainserver1 ccsd[6494]: cluster.conf (cluster name = alpha_cluster, version = 8) found.
Sep 28 03:24:52 mainserver1 ccsd[6494]: Remote copy of cluster.conf is from quorate node.
Sep 28 03:24:52 mainserver1 ccsd[6494]:  Local version # : 8
Sep 28 03:24:52 mainserver1 ccsd[6494]:  Remote version #: 9
Sep 28 03:24:52 mainserver1 ccsd[6494]: Switching to remote copy.
Sep 28 03:24:52 mainserver1 kernel: CMAN: Waiting to join or form a Linux-cluster
Sep 28 03:24:52 mainserver1 ccsd[6494]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.5
Sep 28 03:24:52 mainserver1 ccsd[6494]: Initial status:: Inquorate
Sep 28 03:24:52 mainserver1 kernel: CMAN: sending membership request
Sep 28 03:24:52 mainserver1 kernel: CMAN: got node mainserver2
Sep 28 03:24:52 mainserver1 kernel: CMAN: quorum regained, resuming activity
Sep 28 03:24:52 mainserver1 ccsd[6494]: Cluster is quorate.  Allowing connections.
Sep 28 03:24:53 mainserver1 kernel: DLM 2.6.9-41.7 (built Feb 26 2006 21:30:10) installed
Sep 28 03:24:53 mainserver1 cman: startup succeeded
Sep 28 03:24:59 mainserver1 fenced: startup succeeded
Sep 28 03:25:13 mainserver1 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
Sep 28 03:25:18 mainserver1 clurgmgrd[6580]: <notice> Resource Group Manager Starting
Sep 28 03:25:18 mainserver1 clurgmgrd[6580]: <info> Loading Service Data
Sep 28 03:25:18 mainserver1 rgmanager: clurgmgrd startup succeeded
Sep 28 03:25:19 mainserver1 clurgmgrd[6580]: <info> Initializing Services
Sep 28 03:25:19 mainserver1 clurgmgrd: [6580]: <info> Executing /etc/init.d/hongsy.sh stop
Sep 28 03:25:19 mainserver1 su(pam_unix)[6691]: session opened for user oracle by (uid=0)
Sep 28 03:25:19 mainserver1 su(pam_unix)[6691]: session closed for user oracle
Sep 28 03:25:19 mainserver1 su(pam_unix)[6737]: session opened for user oracle by (uid=0)
Sep 28 03:25:22 mainserver1 su(pam_unix)[6737]: session closed for user oracle
Sep 28 03:25:22 mainserver1 clurgmgrd: [6580]: <info> /dev/mapper/VolGroupArray-lv_oracle_data is not mounted
Sep 28 03:25:24 mainserver1 clurgmgrd: [6580]: <info> /dev/mapper/VolGroupArray-lv_oracle_log is not mounted
Sep 28 03:25:26 mainserver1 clurgmgrd: [6580]: <info> /dev/mapper/VolGroupArray-lv_images is not mounted
Sep 28 03:25:28 mainserver1 clurgmgrd[6580]: <info> Services Initialized
Sep 28 03:25:28 mainserver1 clurgmgrd[6580]: <info> Logged in SG "usrm::manager"
Sep 28 03:25:28 mainserver1 clurgmgrd[6580]: <info> Magma Event: Membership Change
Sep 28 03:25:28 mainserver1 clurgmgrd[6580]: <info> State change: Local UP
Sep 28 03:25:30 mainserver1 clurgmgrd[6580]: <notice> Starting stopped service oracle
Sep 28 03:25:30 mainserver1 clurgmgrd: [6580]: <info> mounting /dev/mapper/VolGroupArray-lv_oracle_data on /data/oradata
Sep 28 03:25:30 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:30 mainserver1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Sep 28 03:25:30 mainserver1 kernel: EXT3 FS on dm-2, internal journal
Sep 28 03:25:30 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:30 mainserver1 clurgmgrd: [6580]: <info> mounting /dev/mapper/VolGroupArray-lv_oracle_log on /data/oralog
Sep 28 03:25:31 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs warning: checktime reached, running e2fsck is recommended
Sep 28 03:25:31 mainserver1 kernel: EXT3 FS on dm-4, internal journal
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: recovery complete.
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:31 mainserver1 clurgmgrd: [6580]: <info> mounting /dev/mapper/VolGroupArray-lv_images on /data/images
Sep 28 03:25:31 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Sep 28 03:25:31 mainserver1 kernel: EXT3 FS on dm-3, internal journal
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:31 mainserver1 clurgmgrd: [6580]: <info> Adding IPv4 address 138.148.221.3 to eth2
Sep 28 03:25:32 mainserver1 clurgmgrd: [6580]: <info> Executing /etc/init.d/hongsy.sh start
Sep 28 03:25:32 mainserver1 su(pam_unix)[7120]: session opened for user oracle by (uid=0)
Sep 28 03:25:42 mainserver1 clurgmgrd[6580]: <info> Magma Event: Membership Change
Sep 28 03:25:42 mainserver1 clurgmgrd[6580]: <info> State change: mainserver2 UP
Sep 28 03:27:16 mainserver1 kernel: oracle(7198): floating-point assist fault at ip 4000000009f5e0e2, isr 0000020000001001
Sep 28 03:27:16 mainserver1 last message repeated 3 times
Sep 28 03:27:17 mainserver1 su(pam_unix)[7120]: session closed for user oracle
Sep 28 03:27:17 mainserver1 su(pam_unix)[7207]: session opened for user oracle by (uid=0)
Sep 28 03:27:19 mainserver1 su(pam_unix)[7207]: session closed for user oracle
Sep 28 03:27:26 mainserver1 clurgmgrd[6580]: <notice> Service oracle started
Sep 28 03:28:02 mainserver1 clurgmgrd: [6580]: <info> Executing /etc/init.d/hongsy.sh status
Sep 28 03:28:04 mainserver1 su(pam_unix)[7820]: session opened for user oracle by root(uid=0)
Sep 28 03:28:25 mainserver1 su(pam_unix)[7820]: session closed for user oracle
Sep 28 03:28:32 mainserver1 clurgmgrd: [6580]: <info> Executing /etc/init.d/hongsy.sh status
Sep 28 03:29:33 mainserver1 last message repeated 2 times
Sep 28 03:29:37 mainserver1 clurgmgrd[6580]: <notice> Stopping service oracle
Sep 28 03:29:37 mainserver1 clurgmgrd: [6580]: <info> Executing /etc/init.d/hongsy.sh stop
Sep 28 03:29:40 mainserver1 su(pam_unix)[8927]: session opened for user oracle by (uid=0)
Sep 28 03:29:41 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:29:52 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:29:55 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:29:57 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:29:59 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:30:01 mainserver1 crond(pam_unix)[8975]: session opened for user root by (uid=0)
Sep 28 03:30:01 mainserver1 crond(pam_unix)[8974]: session opened for user root by (uid=0)
Sep 28 03:30:01 mainserver1 su(pam_unix)[8976]: session opened for user oracle by (uid=0)
Sep 28 03:30:09 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:30:10 mainserver1 su(pam_unix)[8927]: session closed for user oracle
Sep 28 03:30:10 mainserver1 su(pam_unix)[9008]: session opened for user oracle by (uid=0)
Sep 28 03:30:10 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:30:11 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:30:13 mainserver1 su(pam_unix)[9008]: session closed for user oracle
Sep 28 03:30:14 mainserver1 clurgmgrd: [6580]: <info> Removing IPv4 address 138.148.221.3 from eth2
Sep 28 03:30:15 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:30:22 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:30:23 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:30:26 mainserver1 clurgmgrd: [6580]: <info> unmounting /data/oradata
Sep 28 03:30:27 mainserver1 clurgmgrd: [6580]: <info> unmounting /data/oralog
Sep 28 03:30:27 mainserver1 clurgmgrd: [6580]: <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:28 mainserver1 clurgmgrd: [6580]: <warning> killing process 4644 (root gam_serve /data/oralog)
Sep 28 03:30:29 mainserver1 clurgmgrd: [6580]: <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:32 mainserver1 su(pam_unix)[8976]: session closed for user oracle
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]: <info> unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]: <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]: <warning> killing process 9203 (root gam_serve /data/oralog)
Sep 28 03:30:35 mainserver1 clurgmgrd: [6580]: <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:35 mainserver1 clurgmgrd: [6580]: <err> 'umount /data/oralog' failed, error=0
Sep 28 03:30:35 mainserver1 clurgmgrd[6580]: <notice> stop on fs "oralog" returned 2 (invalid argument(s))
Sep 28 03:30:35 mainserver1 clurgmgrd[6580]: <crit> #12: RG oracle failed to stop; intervention required
Sep 28 03:30:35 mainserver1 clurgmgrd[6580]: <notice> Service oracle is failed
Sep 28 03:30:36 mainserver1 clurgmgrd[6580]: <warning> #70: Attempting to restart service oracle locally.
Sep 28 03:30:36 mainserver1 clurgmgrd[6580]: <err> #43: Service oracle has failed; can not start.
Sep 28 03:30:36 mainserver1 clurgmgrd[6580]: <alert> #2: Service oracle returned failure code.  Last Owner: mainserver1
Sep 28 03:30:36 mainserver1 clurgmgrd[6580]: <alert> #4: Administrator intervention required.
Sep 28 03:30:44 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:30:47 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:31:01 mainserver1 crond(pam_unix)[8974]: session closed for user root
Sep 28 03:31:01 mainserver1 crond(pam_unix)[8975]: session closed for user root
Sep 28 03:31:33 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:31:33 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:31:36 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:31:42 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:32:36 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 1)
Sep 28 03:33:08 mainserver1 htt_server[3175]: status has not been enabled yet. (1, 2)
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e904, ip=0x4000000000010091
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e904, ip=0x40000000000100b0
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e90c, ip=0x40000000000100f1
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e90c, ip=0x4000000000010110


Could you all weigh in on this?

#2, posted 2008-10-08 12:52
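With Oracle, if the processes have not shut down completely, that is, some process accessing the mount point has not been terminated, umount will fail.
When umount fails, you can run lsof to see whether any process is still accessing the mount point.
I think the service script is usually what causes this. You could compare against a script shipped with the system to determine whether your custom script is at fault.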
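
A minimal sketch of that check, assuming lsof or fuser (from psmisc) is installed on the node. The function name holders is made up for illustration, and /data/oralog is simply the mount point that failed in the log above:

```shell
#!/bin/sh
# Hypothetical helper: show which processes keep a mount point busy.
holders() {
    if [ -z "$1" ]; then
        echo "usage: holders MOUNTPOINT" >&2
        return 2
    fi
    if command -v lsof >/dev/null 2>&1; then
        # Given a mount point, lsof lists the open files on that file system.
        lsof "$1"
    else
        # fuser -m selects every process using the file system; -v adds user and command.
        fuser -vm "$1"
    fi
}

# Example (run as root right after the umount fails):
#   holders /data/oralog
```

Running it immediately after the failed stop matters, since a process that gets restarted may show up under a new PID each time.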

#3, posted 2008-10-08 13:43
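In Red Hat's graphical mode (init 5) there is a process called gam_server that continuously monitors file changes in order to update parts of the desktop (for example, how the trash is displayed).
GNOME has a bug here, though: nautilus uses the gam_server interface to monitor every folder, so once a partition is mounted and contains files, gam_server keeps accessing it. The partition then cannot be unmounted, even if you kill gam_server, because nautilus quickly restarts it. As a result, the shared partition cannot be unmounted when HA switches the service over.

Workarounds:

1. Use KDE instead of GNOME as the desktop.

2. Create /etc/gamin/gaminrc
and list the mount points of all shared partitions after notify,
for example
notify /oradata* /opt/oracle*

Then restart X Window.

Step 2 is not mandatory. Red Hat recommends it, but it had no effect in my tests; using KDE does avoid the problem. Of course, it is best to persuade the users not to run a graphical desktop day to day and to start it only when it is needed for administration.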

#4, posted 2008-10-08 15:26
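Is this really related to the script?

If you look closely,

my service involves three mount points:

/data/oradata
/data/oralog
/data/images

Yet only /data/oralog fails to unmount; the other two unmount fine!

The problem is odd and hard to understand. Could you give me some more guidance?

When I tested this in a virtual machine, everything worked without any problem.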

#5, posted 2008-10-08 15:27
To ljhb:

If the server runs in runlevel 3,
will the gam_serve process still exist?
This needs to be tested.
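
One rough way to test this, assuming pgrep (procps) is available; the function name gam_running is hypothetical. The log shows the name as gam_serve, presumably truncated by whatever tool listed it, so the sketch defaults to the full name gam_server:

```shell
#!/bin/sh
# Hypothetical check: is a gamin server process running on this node?
gam_running() {
    # pgrep -x matches the exact process name and exits 0 when one is found.
    pgrep -x "${1:-gam_server}" >/dev/null 2>&1
}

# Example:
#   who -r    # shows the current runlevel
#   gam_running && echo "gam_server is running" || echo "gam_server not found"
```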

#6, posted 2008-10-08 19:21
OP, please first confirm whether that .sh script of yours can shut Oracle down completely.

#7, posted 2008-10-08 23:53
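Originally posted by 西方 on 2008-10-8 15:26:
Is this really related to the script?

If you look closely,

my service involves three mount points:

/data/oradata
/data/oralog
/data/images

Yet only /data/oralog fails to unmount; the other two unmount fine!

The problem is odd, ...


Then you still need to find out what is accessing the directory that cannot be unmounted. At the very least, run lsof.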

#8, posted 2008-10-13 13:52
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]: <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]: <warning> killing process 9203 (root gam_serve /data/oralog)
Sep 28 03:30:35 mainserver1 clurgmgrd: [6580]: <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:35 mainserver1 clurgmgrd: [6580]: <err> 'umount /data/oralog' failed, error=0


This shows that gam_serve is the culprit.
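
To list every process rgmanager tried to kill during the stop, the log can be filtered. This is only a sketch: the function name killed_procs is made up, the pattern is derived from the "killing process" lines quoted above, and /var/log/messages is an assumed log location:

```shell
#!/bin/sh
# Hypothetical filter: print "PID USER COMMAND MOUNTPOINT" for each
# "killing process ..." line logged by clurgmgrd.
# Reads the files given as arguments, or stdin when none are given.
killed_procs() {
    sed -n 's/.*killing process \([0-9][0-9]*\) (\([^ ]*\) \([^ ]*\) \([^)]*\)).*/\1 \2 \3 \4/p' "$@"
}

# Example:
#   killed_procs /var/log/messages
```

If the same command appears with a new PID on every attempt, something is restarting it, which would be consistent with the nautilus explanation given earlier in the thread.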

#9, posted 2008-10-13 15:22
Originally posted by 西方 on 2008-10-13 13:52:
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]:  Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: [6580]:  killing process 9203 (root gam_serve /data/oralog)
Sep 28 03 ...



Not necessarily.