Service keeps restarting, can't find the cause

#1 · Posted 2009-06-18 16:22
On RHEL AS3, two nodes form a cluster. The service on node 2 has always restarted by itself for no apparent reason, and I can't figure out why. Here is what syslog records:
Jun 16 04:12:31 conn02 clusvcmgrd[32611]: <crit> Couldn't connect to member #0: Broken pipe
Jun 16 04:12:31 conn02 clusvcmgrd[32611]: <err> Unable to obtain cluster lock: No locks available
Jun 16 04:12:31 conn02 clusvcmgrd[32611]: <warning> Restarting locally failed service app
Jun 16 04:12:37 conn02 clusvcmgrd[32611]: <crit> Couldn't connect to member #0: Broken pipe
Jun 16 04:12:37 conn02 clusvcmgrd[32611]: <err> Unable to obtain cluster lock: No locks available
Jun 16 06:21:36 conn02 clusvcmgrd[29386]: <err> Unable to obtain cluster lock: Broken pipe
Jun 16 06:21:36 conn02 clusvcmgrd[29386]: <warning> Restarting locally failed service app
Jun 16 06:21:36 conn02 clusvcmgrd: [29387]: <notice> service notice: Stopping service app ...
Jun 16 06:21:36 conn02 clusvcmgrd: [29387]: <notice> service notice: Running user script '/scripts/app.sh stop'
Jun 16 06:22:09 conn02 clusvcmgrd: [29387]: <notice> service notice: Stopped service app ...
Jun 16 06:22:09 conn02 clusvcmgrd[29386]: <notice> Starting stopped service app
Jun 16 06:22:09 conn02 clusvcmgrd: [5644]: <notice> service notice: Starting service app ...
Jun 16 06:22:10 conn02 clusvcmgrd: [5644]: <notice> service notice: Running user script '/scripts/app.sh start'
Jun 16 06:22:10 conn02 clusvcmgrd: [5644]: <notice> service notice: Started service app ...
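The pattern in the excerpt above is easy to tally mechanically. A minimal sketch, assuming the syslog lines have been saved to a file (the path /tmp/cluster.log is hypothetical): each `<warning> Restarting locally failed service` line marks one forced restart, so counting those gives the restart frequency.

```shell
# Hypothetical saved copy of the syslog excerpt; the path is an assumption.
cat > /tmp/cluster.log <<'EOF'
Jun 16 04:12:31 conn02 clusvcmgrd[32611]: <warning> Restarting locally failed service app
Jun 16 06:21:36 conn02 clusvcmgrd[29386]: <warning> Restarting locally failed service app
EOF

# Each matching line is one restart forced by clusvcmgrd.
restarts=$(grep -c 'Restarting locally failed service' /tmp/cluster.log)
echo "restart events: $restarts"
```

Running the same count against the full /var/log/messages history would show whether the restarts cluster around particular times, which is a clue when chasing an intermittent network fault.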
The cluster log also records the service's state: typically the service is stopped and then started again shortly afterwards.
This never happens on node 1.
cluster.xml is below.
<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="no" interval="1000000" loglevel="5" multicast="yes" multicast_ipaddress="225.0.0.11" thread="yes" tko_count="20"/>
  <cluquorumd loglevel="5" pinginterval="2" tiebreaker_ip=""/>
  <clurmtabd loglevel="5" pollinterval="4"/>
  <clusvcmgrd loglevel="5"/>
  <clulockd loglevel="5"/>
  <cluster config_viewnumber="74" key="152564bfc7a0b536a6a6d7ba0c5d00bc" name="foxsz"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="conn02" watchdog="yes">
    </member>
    <member id="1" name="conn01" watchdog="yes">
    </member>
  </members>
  <services>
    <service checkinterval="10" failoverdomain="db" id="0" name="db" userscript="/scripts/db.sh">
      <service_ipaddresses>
        <service_ipaddress broadcast="10.207.195.47" id="0" ipaddress="10.207.195.37" netmask="255.255.255.240"/>
      </service_ipaddresses>
      <device id="0" name="/dev/oravg/lvol1" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/product" options="noatime"/>
      </device>
      <device id="1" name="/dev/oravg/lvol2" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/index2" options="noatime"/>
      </device>
      <device id="2" name="/dev/oravg/lvol3" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/index1" options="noatime"/>
      </device>
      <device id="3" name="/dev/oravg/lvol4" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/backup" options="noatime"/>
      </device>
      <device id="4" name="/dev/oravg/lvol5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/data2" options="noatime"/>
      </device>
      <device id="5" name="/dev/oravg/lvol6" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/data1" options="noatime"/>
      </device>
      <device id="6" name="/dev/oravg/lvol7" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/coredbf" options="noatime"/>
      </device>
      <device id="7" name="/dev/oravg/lvol8" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/archive" options="noatime"/>
      </device>
      <device id="8" name="/dev/oravg2/lvol1" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/redo_log1" options="noatime"/>
      </device>
      <device id="9" name="/dev/oravg2/lvol2" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/oracle/redo_log2" options="noatime"/>
      </device>
    </service>
    <service checkinterval="10" failoverdomain="app" id="1" name="app" userscript="/scripts/app.sh">
      <service_ipaddresses>
        <service_ipaddress broadcast="10.207.195.47" id="0" ipaddress="10.207.195.40" netmask="255.255.255.240"/>
      </service_ipaddresses>
      <device id="0" name="/dev/appvg/lvol1" sharename="users">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/users" options="noatime"/>
        <nfsexport id="0" name="/users">
          <client id="0" name="*" options="rw"/>
        </nfsexport>
      </device>
      <device id="1" name="/dev/appvg/lvol2" sharename="users1">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/app/appserv" options="noatime"/>
      </device>
    </service>
  </services>
  <failoverdomains>
    <failoverdomain id="0" name="db" ordered="yes" restricted="yes">
      <failoverdomainnode id="0" name="conn01"/>
      <failoverdomainnode id="1" name="conn02"/>
    </failoverdomain>
    <failoverdomain id="1" name="app" ordered="yes" restricted="yes">
      <failoverdomainnode id="0" name="conn02"/>
      <failoverdomainnode id="1" name="conn01"/>
    </failoverdomain>
  </failoverdomains>
</cluconfig>
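The clumembd line in the config above implies how long a member can go silent before it is declared dead. A quick sketch of that arithmetic, assuming interval is in microseconds (the 1000000 value, i.e. one second, suggests so) and that tko_count is the number of missed heartbeats tolerated:

```shell
# Values taken from the clumembd element in cluster.xml above.
interval_us=1000000   # heartbeat interval, assumed to be in microseconds
tko_count=20          # missed heartbeats before a member is declared down

# Implied membership timeout in seconds.
timeout_s=$(( interval_us * tko_count / 1000000 ))
echo "membership timeout: ${timeout_s}s"
```

So roughly 20 seconds of lost multicast heartbeat traffic would be enough to trigger the "Couldn't connect to member #0" path, which is well within the range of a brief switch or cabling glitch.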


Could this be network-related? The heartbeat and data links of this system are plugged into the same switch. I'm not even sure how to classify this situation; it doesn't look like a failover to me, so why would the cluster stop the service?
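If the shared switch or its cabling is dropping heartbeat traffic, that usually shows up as rising error counters on the NIC. A minimal sketch, assuming a Linux /sys interface; the heartbeat interface name is an assumption (lo is used here only so the sketch runs anywhere, substitute e.g. eth0 on the cluster nodes):

```shell
# Replace lo with the interface that carries the clumembd multicast heartbeat.
iface=lo

# A bad crimp or flaky switch port typically makes rx_errors climb over time.
rx_errors=$(cat /sys/class/net/$iface/statistics/rx_errors)
echo "$iface rx_errors=$rx_errors"
```

Sampling this counter on both nodes over a day and comparing it with the restart timestamps from syslog would confirm or rule out the physical-layer theory; watching the multicast address from cluster.xml with `tcpdump -n -i eth0 host 225.0.0.11` during a quiet period is another cross-check.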

#2 · Posted 2009-06-29 10:44
Probably a bad RJ45 crimp connector.
We had the same thing at my workplace.

#3 · Posted 2009-06-29 11:22

Reply to post #2 by 00306

There isn't much wrong with the config file, so the cabling is about the only thing left to check.