
Help! A strange RHCS instability problem

#1 | Posted 2011-06-13 16:15
Last edited by liuyongsd on 2011-06-13 17:21

Recently I set up a two-node RHCS cluster on RHEL 5.4. After configuration it ran normally, but roughly every 5 days the cluster goes down. The logs show the cause is a lost heartbeat. To guard against exactly that, I deliberately bonded the heartbeat interfaces.
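(For reference, a minimal active-backup bonding setup of the kind described above could look like this on RHEL 5. This is a sketch: the slave names eth2/eth3 are assumptions, and 10.0.0.203 is taken from the node1 logs below; adjust to the real hardware.)

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.0.0.203
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
# mode=1 is active-backup; miimon=100 polls slave link state every 100 ms
BONDING_OPTS="mode=1 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth2 (ifcfg-eth3 is the same apart from DEVICE)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

On 5.4 the same options can instead be set via /etc/modprobe.conf (an "alias bond0 bonding" line plus an options line). Either way, active-backup only protects against a slave link failing; it does not help with anything that interrupts traffic on the active path itself.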

Here is the cluster.conf configuration file:
<?xml version="1.0"?>
<cluster alias="new_cluster" config_version="20" name="new_cluster">
  <totem token="80000"/>   
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="clusternode01" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="rlerpdb"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="clusternode02" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="rlerpci"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="2" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="" ipaddr="172.16.63.112" login="Administrator" name="node1" passwd="password"/>
                <fencedevice agent="fence_ipmilan" auth="" ipaddr="172.16.63.113" login="Administrator" name="node2" passwd="password"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="ipdomain" ordered="1" restricted="0">
                                <failoverdomainnode name="clusternode01" priority="2"/>
                                <failoverdomainnode name="clusternode02" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="dbdomain" ordered="1" restricted="0">
                                <failoverdomainnode name="clusternode01" priority="1"/>
                                <failoverdomainnode name="clusternode02" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="172.16.45.204" monitor_link="1"/>
                        <ip address="172.16.45.203" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="ipdomain" name="ascs" recovery="relocate">
                        <ip ref="172.16.45.204">
                                <script file="/usr/scripts/ascs" name="ascs"/>
                        </ip>
                </service>
                <service autostart="1" domain="dbdomain" name="db" recovery="relocate">
                        <fs device="/dev/mapper/oracle_vg-oraclevol" force_fsck="0" force_unmount="1" fsid="64480" fstype="ext3" mountpoint="/oracle" name="oracle" opti
ons="" self_fence="0">
                                <fs device="/dev/mapper/data_vg-data01vol" force_fsck="0" force_unmount="1" fsid="11430" fstype="ext3" mountpoint="/oracle/RLP/sapdata1"
name="data1" options="" self_fence="0"/>
                                <fs device="/dev/mapper/data_vg-data02vol" force_fsck="0" force_unmount="1" fsid="32028" fstype="ext3" mountpoint="/oracle/RLP/sapdata2"
name="data2" options="" self_fence="0"/>
                                <fs device="/dev/mapper/data_vg-data03vol" force_fsck="0" force_unmount="1" fsid="19850" fstype="ext3" mountpoint="/oracle/RLP/sapdata3"
name="data3" options="" self_fence="0">
                                        <ip ref="172.16.45.203">
                                                <script file="/usr/scripts/db" name="db"/>
                                        </ip>
                                </fs>
                        </fs>
                </service>
        </rm>
</cluster>

The reboot happened around Jun 13 14:30; both hosts restarted. But the heartbeat is bonded, so by all rights this shouldn't happen.
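(Before digging further into the cluster layer, a bond-level check is cheap. A sketch; bond0 is an assumed interface name:)

cat /proc/net/bonding/bond0          # currently active slave, each slave's MII status and link failure count
grep -i bonding /var/log/messages    # the bonding driver logs every slave failover and link up/down

If both slaves report "MII Status: up" with zero link failures across the incident window, the token loss most likely happened above layer 2 rather than on a cable or NIC.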

#2 | Posted 2011-06-13 16:20
node1 log:
Jun 13 14:32:59 clusternode1 syslogd 1.4.1: restart (remote reception).
Jun 13 14:32:59 clusternode1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 13 14:32:59 clusternode1 kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Jun 13 14:32:59 clusternode1 kernel: Command line: ro root=LABEL=/ rhgb quiet
Jun 13 14:32:59 clusternode1 kernel: BIOS-provided physical RAM map:
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 0000000000100000 - 00000000bf620000 (usable)
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 00000000bf620000 - 00000000bf63c000 (ACPI data)
Jun 13 14:32:59 clusternode1 kernel:  BIOS-e820: 00000000bf63c000 - 00000000bf63d000 (usable)
................ (boot messages omitted)
Jun 13 14:44:07 clusternode1 ccsd[8251]: Starting ccsd 2.0.115:
Jun 13 14:44:07 clusternode1 ccsd[8251]:  Built: Aug  5 2009 08:24:53
Jun 13 14:44:07 clusternode1 ccsd[8251]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jun 13 14:44:07 clusternode1 ccsd[8251]: cluster.conf (cluster name = new_cluster, version = 213) found.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] AIS Executive Service: started and ready to provide service.
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Using default multicast address of 239.192.220.17
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Token Timeout (80000 ms) retransmit timeout (3960 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] token hold (3158 ms) retransmits before loss (20 retrans)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] send threads (0 threads)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP token expired timeout (3960 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP token problem counter (2000 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP threshold (10 problem count)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] RRP mode set to none.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] heartbeat_failures_allowed (0)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] max_network_delay (50 ms)
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] The network interface [10.0.0.203] is now up.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Created or loaded sequence id 32.10.0.0.203 for this ring.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering GATHER state from 15.
Jun 13 14:44:09 clusternode1 openais[8260]: [CMAN ] CMAN 2.0.115 (built Aug  5 2009 08:24:57) started
Jun 13 14:44:09 clusternode1 openais[8260]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais event service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais message service B.01.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais configuration service'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jun 13 14:44:09 clusternode1 openais[8260]: [SYNC ] Not using a virtual synchrony filter.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Saving state aru 0 high seq received 0
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Storing new sequence id for ring 24
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering COMMIT state.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering RECOVERY state.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] previous ring seq 32 rep 10.0.0.203
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] Sending initial ORF token
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] New Configuration:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] Members Left:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] Members Joined:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] New Configuration:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] Members Left:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] Members Joined:
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:44:09 clusternode1 openais[8260]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:44:09 clusternode1 openais[8260]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:44:09 clusternode1 openais[8260]: [CMAN ] quorum regained, resuming activity
Jun 13 14:44:09 clusternode1 openais[8260]: [CLM  ] got nodejoin message 10.0.0.203
Jun 13 14:44:09 clusternode1 ccsd[8251]: Cluster is not quorate.  Refusing connection.
Jun 13 14:44:09 clusternode1 ccsd[8251]: Error while processing connect: Connection refused
Jun 13 14:44:10 clusternode1 ccsd[8251]: Initial status:: Quorate
Jun 13 14:44:12 clusternode1 qdiskd[8020]: <info> Quorum Daemon Initializing
Jun 13 14:44:12 clusternode1 qdiskd[8020]: <crit> Initialization failed
Jun 13 14:45:00 clusternode1 fenced[8281]: rlerpprdcihb01 not a cluster member after 3 sec post_join_delay
Jun 13 14:45:00 clusternode1 fenced[8281]: fencing node "rlerpprdcihb01"
Jun 13 14:45:09 clusternode1 fenced[8281]: fence "rlerpprdcihb01" success
Jun 13 14:45:18 clusternode1 kernel: dlm: Using TCP for communications
Jun 13 14:45:19 clusternode1 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 13 14:45:19 clusternode1 multipathd: dm-7: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-8: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-9: add map (uevent)
Jun 13 14:45:19 clusternode1 multipathd: dm-10: add map (uevent)
Jun 13 14:45:20 clusternode1 multipathd: dm-11: add map (uevent)
Jun 13 14:45:20 clusternode1 multipathd: dm-12: add map (uevent)
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:sapmnt"
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: Joined cluster. Now mounting FS...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0, already locked for use
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=0: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Acquiring the transaction lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Replaying journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Replayed 2 of 2 blocks
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Found 0 revoke tags
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Journal replayed in 1s
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=1: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=2: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Looking at journal...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=3: Done
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Trying to acquire journal lock...
Jun 13 14:45:25 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=4: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=5: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=6: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=7: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=8: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:sapmnt.0: jid=9: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:ascs00"
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: Joined cluster. Now mounting FS...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0, already locked for use
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=0: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Acquiring the transaction lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Replaying journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Replayed 5 of 5 blocks
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Found 0 revoke tags
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Journal replayed in 1s
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=1: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=2: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=3: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Looking at journal...

#3 | Posted 2011-06-13 16:23
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=4: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=5: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Trying to acquire journal lock...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Looking at journal...
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=6: Done
Jun 13 14:45:26 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=7: Done
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=8: Done
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Trying to acquire journal lock...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Looking at journal...
Jun 13 14:45:27 clusternode1 kernel: GFS2: fsid=new_cluster:ascs00.0: jid=9: Done
Jun 13 14:45:37 clusternode1 clurgmgrd[8581]: <notice> Resource Group Manager Starting
Jun 13 14:45:50 clusternode1 clurgmgrd[8581]: <notice> Starting stopped service service:ascs
Jun 13 14:45:50 clusternode1 clurgmgrd[8581]: <notice> Starting stopped service service:db
Jun 13 14:45:50 clusternode1 avahi-daemon[7413]: Registering new address record for 172.16.45.204 on eth1.
Jun 13 14:45:50 clusternode1 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Jun 13 14:45:50 clusternode1 kernel: EXT3 FS on dm-7, internal journal
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: dm-7: 1 orphan inode deleted
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:50 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-9, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-8, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:45:51 clusternode1 kernel: EXT3 FS on dm-10, internal journal
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: recovery complete.
Jun 13 14:45:51 clusternode1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:45:51 clusternode1 avahi-daemon[7413]: Registering new address record for 172.16.45.203 on eth1.
Jun 13 14:45:53 clusternode1 SAPRLP_00[11393]: SAP Service SAPRLP_00 successfully started.
Jun 13 14:46:14 clusternode1 SAPRLP_01[12265]: SAP Service SAPRLP_01 successfully started.
Jun 13 14:46:34 clusternode1 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.eth1.base_reachable_time; Use net.ipv6.neigh.eth1.base_reachable_time_ms instead.
Jun 13 14:46:42 clusternode1 SAPRLP_01[12902]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:46:54 clusternode1 clurgmgrd[8581]: <notice> Service service:db started
Jun 13 14:47:22 clusternode1 clurgmgrd[8581]: <notice> Service service:ascs started
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering GATHER state from 11.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Saving state aru 39 high seq received 39
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Storing new sequence id for ring 28
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering COMMIT state.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering RECOVERY state.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] previous ring seq 36 rep 10.0.0.203
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] aru 39 high delivered 39 received flag 1
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] position [1] member 10.0.0.204:
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] previous ring seq 36 rep 10.0.0.204
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] Sending initial ORF token
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] New Configuration:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] Members Left:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] Members Joined:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] New Configuration:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.204)  
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] Members Left:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] Members Joined:
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ]       r(0) ip(10.0.0.204)  
Jun 13 14:54:42 clusternode1 openais[8260]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:54:42 clusternode1 openais[8260]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] got nodejoin message 10.0.0.203
Jun 13 14:54:42 clusternode1 openais[8260]: [CLM  ] got nodejoin message 10.0.0.204
Jun 13 14:54:42 clusternode1 openais[8260]: [CPG  ] got joinlist message from node 1
Jun 13 14:54:57 clusternode1 kernel: dlm: connecting to 2
Jun 13 14:56:38 clusternode1 clurgmgrd[8581]: <notice> Relocating service:ascs to better node rlerpprdcihb01
Jun 13 14:56:38 clusternode1 clurgmgrd[8581]: <notice> Stopping service service:ascs
Jun 13 14:57:17 clusternode1 avahi-daemon[7413]: Withdrawing address record for 172.16.45.204 on eth1.
Jun 13 14:57:23 clusternode1 SAPRLP_01[20972]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:57:27 clusternode1 clurgmgrd[8581]: <notice> Service service:ascs is stopped
Jun 13 14:59:27 clusternode1 gdm[22061]: Couldn't authenticate user

#4 | Posted 2011-06-13 16:24
node2 log:

Jun 12 04:03:02 clusternode2 syslogd 1.4.1: restart (remote reception).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] The token was lost in the OPERATIONAL state.
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] Transmit multicast socket send buffer size (288000 bytes).
Jun 13 14:28:49 clusternode2 openais[19156]: [TOTEM] entering GATHER state from 2.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering GATHER state from 0.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Creating commit token because I am the rep.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Saving state aru 9b high seq received 9b
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Storing new sequence id for ring 24
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering COMMIT state.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering RECOVERY state.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] position [0] member 10.0.0.204:
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] previous ring seq 32 rep 10.0.0.203
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] aru 9b high delivered 9b received flag 1
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] Sending initial ORF token
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] New Configuration:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ]      r(0) ip(10.0.0.204)  
Jun 13 14:28:54 clusternode2 kernel: dlm: closing connection to node 1
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] Members Left:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ]      r(0) ip(10.0.0.203)  
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] Members Joined:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] New Configuration:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ]      r(0) ip(10.0.0.204)  
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] Members Left:
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] Members Joined:
Jun 13 14:28:54 clusternode2 fenced[19176]: clusternode1 not a cluster member after 0 sec post_fail_delay
Jun 13 14:28:54 clusternode2 openais[19156]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:28:54 clusternode2 fenced[19176]: fencing node "clusternode1"
Jun 13 14:28:54 clusternode2 openais[19156]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:28:54 clusternode2 openais[19156]: [CLM  ] got nodejoin message 10.0.0.204
Jun 13 14:28:54 clusternode2 openais[19156]: [CPG  ] got joinlist message from node 2
Jun 13 14:29:03 clusternode2 fenced[19176]: fence "clusternode1" success
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Trying to acquire journal lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Trying to acquire journal lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Looking at journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Looking at journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Acquiring the transaction lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Replaying journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Replayed 2 of 2 blocks
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Found 0 revoke tags
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Journal replayed in 1s
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=0: Done
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Acquiring the transaction lock...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Replaying journal...
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Replayed 5 of 8 blocks
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Found 2 revoke tags
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Journal replayed in 0s
Jun 13 14:29:03 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=0: Done
Jun 13 14:29:03 clusternode2 clurgmgrd[19449]: <notice> Taking over service service:db from down member clusternode1
Jun 13 14:29:04 clusternode2 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Jun 13 14:29:04 clusternode2 kernel: EXT3 FS on dm-7, internal journal
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: dm-7: 1 orphan inode deleted
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:04 clusternode2 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:04 clusternode2 kernel: EXT3 FS on dm-9, internal journal
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:04 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:05 clusternode2 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:05 clusternode2 kernel: EXT3 FS on dm-8, internal journal
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:05 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:06 clusternode2 kernel: kjournald starting.  Commit interval 5 seconds
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Jun 13 14:29:06 clusternode2 kernel: EXT3 FS on dm-10, internal journal
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs: recovery complete.
Jun 13 14:29:06 clusternode2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 13 14:29:06 clusternode2 avahi-daemon[9054]: Registering new address record for 172.16.45.203 on eth1.
Jun 13 14:29:28 clusternode2 SAPRLP_01[11001]: Unable to open trace file sapstartsrv.log. (Error 11 Resource temporarily unavailable) [ntservsserver.cpp 2218]
Jun 13 14:30:08 clusternode2 clurgmgrd[19449]: <notice> Service service:db started
Jun 13 14:49:05 clusternode2 syslogd 1.4.1: restart (remote reception).
Jun 13 14:49:05 clusternode2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 13 14:49:05 clusternode2 kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Jun 13 14:49:05 clusternode2 kernel: Command line: ro root=LABEL=/1 rhgb quiet
Jun 13 14:49:05 clusternode2 kernel: BIOS-provided physical RAM map:
Jun 13 14:49:05 clusternode2 kernel:  BIOS-e820: 0000000000010000 - 000000000009dc00 (usable)
Jun 13 14:49:05 clusternode2 kernel:  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
Jun 13 14:49:05 clusternode2 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)

#5 | Posted 2011-06-13 16:25
........... (hardware boot messages omitted)

Jun 13 14:54:42 clusternode2 openais[8417]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais event service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais message service B.01.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais configuration service'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Jun 13 14:54:42 clusternode2 openais[8417]: [SYNC ] Not using a virtual synchrony filter.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering GATHER state from 10.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Saving state aru 0 high seq received 0
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Storing new sequence id for ring 28
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering COMMIT state.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering RECOVERY state.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] position [0] member 10.0.0.203:
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] previous ring seq 36 rep 10.0.0.203
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] aru 39 high delivered 39 received flag 1
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] position [1] member 10.0.0.204:
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] previous ring seq 36 rep 10.0.0.204
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] aru 0 high delivered 0 received flag 1
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] Did not need to originate any messages in recovery.
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] New Configuration:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] Members Left:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] Members Joined:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] New Configuration:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ]       r(0) ip(10.0.0.204)  
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] Members Left:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] Members Joined:
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ]       r(0) ip(10.0.0.203)  
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ]       r(0) ip(10.0.0.204)  
Jun 13 14:54:42 clusternode2 openais[8417]: [SYNC ] This node is within the primary component and will provide service.
Jun 13 14:54:42 clusternode2 openais[8417]: [TOTEM] entering OPERATIONAL state.
Jun 13 14:54:42 clusternode2 openais[8417]: [CMAN ] quorum regained, resuming activity
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] got nodejoin message 10.0.0.203
Jun 13 14:54:42 clusternode2 openais[8417]: [CLM  ] got nodejoin message 10.0.0.204
Jun 13 14:54:42 clusternode2 openais[8417]: [CPG  ] got joinlist message from node 1
Jun 13 14:54:43 clusternode2 ccsd[8408]: Remote copy of cluster.conf is from quorate node.
Jun 13 14:54:43 clusternode2 ccsd[8408]:  Local version # : 213
Jun 13 14:54:43 clusternode2 ccsd[8408]:  Remote version #: 213
Jun 13 14:54:43 clusternode2 qdiskd[8177]: <info> Quorum Daemon Initializing
Jun 13 14:54:43 clusternode2 qdiskd[8177]: <crit> Initialization failed
Jun 13 14:54:43 clusternode2 ccsd[8408]: Initial status:: Quorate
Jun 13 14:54:57 clusternode2 kernel: dlm: Using TCP for communications
Jun 13 14:54:57 clusternode2 kernel: dlm: got connection from 1
Jun 13 14:54:58 clusternode2 clvmd: Cluster LVM daemon started - connected to CMAN
Jun 13 14:54:59 clusternode2 multipathd: dm-7: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-8: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-9: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-10: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-11: add map (uevent)
Jun 13 14:54:59 clusternode2 multipathd: dm-12: add map (uevent)
Jun 13 14:55:04 clusternode2 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:sapmnt"
Jun 13 14:55:04 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: Joined cluster. Now mounting FS...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1, already locked for use
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1: Looking at journal...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:sapmnt.1: jid=1: Done
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "new_cluster:ascs00"
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: Joined cluster. Now mounting FS...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1, already locked for use
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1: Looking at journal...
Jun 13 14:55:05 clusternode2 kernel: GFS2: fsid=new_cluster:ascs00.1: jid=1: Done
Jun 13 14:56:26 clusternode2 clurgmgrd[8709]: <notice> Resource Group Manager Starting
Jun 13 14:57:27 clusternode2 clurgmgrd[8709]: <notice> Starting stopped service service:ascs
Jun 13 14:57:27 clusternode2 avahi-daemon[7782]: Registering new address record for 172.16.45.204 on eth1.
Jun 13 14:57:31 clusternode2 SAPRLP_00[11351]: SAP Service SAPRLP_00 successfully started.
Jun 13 14:58:11 clusternode2 kernel: process `sysctl' is using deprecated sysctl (syscall) net.ipv6.neigh.eth1.base_reachable_time; Use net.ipv6.neigh.eth1.base_reachable_time_ms instead.
Jun 13 14:58:19 clusternode2 SAPRLP_01[11931]: SAP Service SAPRLP_01 successfully started.
Jun 13 14:59:00 clusternode2 clurgmgrd[8709]: <notice> Service service:ascs started

#6 | Posted 2011-06-13 16:30
Could someone knowledgeable take a look? I still haven't figured out why the heartbeat keeps getting lost. The interfaces carrying the heartbeat IPs 10.0.0.203/204 are already bonded.
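(One thing bonding does not cover: the totem traffic here is multicast. The node1 log shows the default group 239.192.220.17, and openais on RHEL 5 uses UDP port 5405 by default, so if the switch ages the group out of its IGMP snooping table, both bonded links can be perfectly up while the token still gets lost. A minimal check, with bond0 as an assumed interface name:)

tcpdump -n -i bond0 udp port 5405
# healthy: a steady packet stream to 239.192.220.17 from both 10.0.0.203 and 10.0.0.204
# if the stream dries up on one node while its links stay up, look at the switch's multicast handling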

#7 | Posted 2011-06-27 10:41
Reply to #1 liuyongsd

Could you show the log from before the restart? Everything you pasted is from after the reboot, so it doesn't tell us much. Also, have you solved the problem yet?
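(To extract exactly that window, something like the following works against the timestamps in the node2 log above, slicing from shortly before the token loss up to the loss itself. A sketch:)

sed -n '/^Jun 13 14:19/,/The token was lost/p' /var/log/messages
# adjust the start timestamp to one that actually occurs in the file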

#8 | Posted 2011-09-02 09:54
I've hit the same situation: every 6 days, at around 19:25, the token gets lost and a node is fenced. Did the OP ever find a solution?

#9 | Posted 2011-09-02 13:20
Why not use Heartbeat instead?

#10 | Posted 2011-09-07 09:44
The weirdest part is that the cluster runs normally for a fixed interval (6 days) and then restarts, and each restart happens at basically the same time of day (around 19:25).

The primary node reboots first; once it has finished booting, the primary fences the standby, and the cluster then returns to normal. The OS version is likewise 5.4. Could this be a new bug?
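(Given the strictly periodic timing, it may be worth correlating the 19:25 window with anything scheduled on the nodes or in the environment before assuming a bug. A sketch; paths are RHEL 5 defaults:)

grep ' 19:2' /var/log/cron                      # jobs that ran around 19:25
ls /etc/cron.d /etc/cron.daily                  # plus any application-side schedulers (backups, etc.)
grep 'The token was lost' /var/log/messages*    # confirm every incident really lands in the same minute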