Last edited by PinkOrient on 2012-11-05 09:52
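I noticed that when rgmanager performs a restart, it actually runs the script's stop and then its start, which is not quite what I expected. Why doesn't it simply call the script with the restart argument?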
The configuration is as follows:
    <service autostart="1" domain="xxx_dm" name="xxx_server" recovery="restart" max_restarts="3" restart_expire_time="60">
        <ip address="139.122.10.187" monitor_link="1">
            <script ref="xxx_server"/>
        </ip>
    </service>
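The xxx_server script monitors n xxx processes. If any one of them is missing, the script's status returns 1. If the script's restart/start function is called at that point, the other n-1 healthy xxx processes are not affected; only the stopped one is brought back up.
I tried killing one of the xxx_server processes. I expected rgmanager to call service xxx_serverd restart once on the local host, bringing the dead process back up without affecting the ones still running.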
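To make the difference between the two recovery paths concrete, here is a toy model in Python of the script semantics described above. It is only a sketch: the worker names and the dictionary are hypothetical stand-ins for the n xxx processes, and the real init script is not shown in this post.

```python
# Toy model only: the worker names below are hypothetical stand-ins for the
# n xxx processes; this is not the real /etc/init.d/xxx_serverd script.

def status(workers):
    """Return 0 if every worker is running, 1 if any is missing."""
    return 0 if all(workers.values()) else 1

def start(workers):
    """Start only the workers that are down and return their names."""
    revived = [name for name, up in workers.items() if not up]
    for name in revived:
        workers[name] = True
    return revived

def stop(workers):
    """Stop every running worker, healthy or not, and return their names."""
    stopped = [name for name, up in workers.items() if up]
    for name in stopped:
        workers[name] = False
    return stopped

# One of three processes has been killed.
workers = {"xxx-1": True, "xxx-2": False, "xxx-3": True}

# Path I expected: a single restart/start call that revives the dead one.
expected = dict(workers)
touched_by_restart = start(expected)

# Path that stops everything first and then starts everything again.
observed = dict(workers)
interrupted = stop(observed)
start(observed)

print("restart touches:", touched_by_restart)
print("stop then start interrupts:", interrupted)
```

In this model both paths end with status 0, but only the stop-then-start path interrupts the healthy workers.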
What actually happened is shown below. After the cluster detected that status was non-zero, it stopped the whole service and withdrew the resources, then re-registered the resources and started the service again. This also killed the healthy xxx processes, and the whole cycle took about 18 seconds.
    Nov 2 17:03:52 ServerNode01 xxx_serverd[29499]: status ... [OK]
    Nov 2 17:04:25 ServerNode01 xxx_serverd[30222]: status ... [OK]
    Nov 2 17:04:58 ServerNode01 xxx_serverd[30842]: status ... [Failed] # one process found dead, status is abnormal
    Nov 2 17:04:58 ServerNode01 clurgmgrd: [23683]: <err> script:xxx_server: status of /etc/init.d/xxx_serverd failed (returned 1)
    Nov 2 17:04:58 ServerNode01 clurgmgrd[23683]: <notice> status on script "xxx_server" returned 1 (generic error)
    Nov 2 17:04:58 ServerNode01 clurgmgrd[23683]: <notice> Stopping service service:xxx_server # the service is stopped, so the other processes exit as well
    Nov 2 17:04:58 ServerNode01 xxx_serverd[30985]: stop ... [OK]
    Nov 2 17:04:58 ServerNode01 avahi-daemon[6987]: Withdrawing address record for 139.122.10.187 on bond0. # the VIP is withdrawn too
    Nov 2 17:05:09 ServerNode01 clurgmgrd[23683]: <notice> Service service:xxx_server is recovering
    Nov 2 17:05:09 ServerNode01 clurgmgrd[23683]: <notice> Recovering failed service service:xxx_server
    Nov 2 17:05:11 ServerNode01 avahi-daemon[6987]: Registering new address record for 139.122.10.187 on bond0.
    Nov 2 17:05:16 ServerNode01 xxx_serverd[31550]: start ... [OK]
    Nov 2 17:05:16 ServerNode01 clurgmgrd[23683]: <notice> Service service:xxx_server started # resources reassigned and startup complete
    Nov 2 17:05:49 ServerNode01 xxx_serverd[32390]: status ... [OK]
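The timings in the log can be checked with a few lines of Python. This is a minimal sketch that assumes all events fall on the same day, so it only parses the clock field of three lines copied from the log above; the helper name clock is my own.

```python
from datetime import datetime

# Three lines copied from the log above (inline comments dropped).
LOG = """\
Nov 2 17:04:25 ServerNode01 xxx_serverd[30222]: status ... [OK]
Nov 2 17:04:58 ServerNode01 xxx_serverd[30842]: status ... [Failed]
Nov 2 17:05:16 ServerNode01 xxx_serverd[31550]: start ... [OK]
"""

def clock(line):
    """Parse the HH:MM:SS field, which is the third token of a log line."""
    return datetime.strptime(line.split()[2], "%H:%M:%S")

last_ok, failed, restarted = (clock(line) for line in LOG.splitlines())

# Gap between two consecutive status checks.
poll_interval = (failed - last_ok).total_seconds()
# Time from the failed status check until the start script reports OK.
recovery_time = (restarted - failed).total_seconds()

print(f"status interval: {poll_interval:.0f}s, recovery: {recovery_time:.0f}s")
```

The recovery figure matches the roughly 18 seconds mentioned above, and the status checks in this log are 33 seconds apart.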