Chinaunix

Original poster: yulemi

Cluster: a network fault triggers a resource group switchover, which then fails

#1 Posted 2013-06-30 16:40
Sun Jun 30 11:03:35 CST 2013
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk status msg on node wgv8902 change to <LogicalHostname offline.>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8902 change to R_POSTNET_STOPPING
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource zbjk-storageplus status on node wgv8902 change to R_FM_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk-storageplus status msg on node wgv8902 change to <>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE_START_FAILED
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 447451 daemon.error] Not attempting to start resource group <zbjk-rg> on node <wgv8901> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 447451 daemon.error] Not attempting to start resource group <zbjk-rg> on node <wgv8902> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <zbjk-rg>.
Jun 23 10:43:51 wgv8901 in.mpathd[2332]: [ID 620804 daemon.error] Successfully failed back to NIC ce3
Jun 23 10:43:51 wgv8901 in.mpathd[2332]: [ID 299542 daemon.error] NIC repair detected on ce3 of group ipmp1
Jun 23 10:43:51 wgv8901 in.mpathd[2332]: [ID 237757 daemon.error] At least 1 interface (ce3) of group ipmp1 has repaired
Jun 23 10:43:51 wgv8901 in.mpathd[2332]: [ID 832587 daemon.error] Successfully failed over from NIC ce0 to NIC ce3
Jun 23 10:43:51 wgv8901 Cluster.PNM: [ID 890413 daemon.notice] ipmp1: state transition from DOWN to OK.
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ONLINE
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_prenet_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource zbjk status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk status msg on node wgv8901 change to <Starting>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_prenet_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_prenet_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <1800> seconds
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource zbjk-storageplus status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk-storageplus status msg on node wgv8901 change to <Starting>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hastorageplus_prenet_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <1800 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <500> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource zbjk status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk status msg on node wgv8901 change to <LogicalHostname online.>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <500 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hafoip_monitor_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hastorageplus_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource zbjk-storageplus status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 922363 daemon.notice] resource zbjk-storageplus status msg on node wgv8901 change to <>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ON_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 707948 daemon.notice] launching method <hastorageplus_monitor_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hastorageplus_monitor_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk-storageplus state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: [ID 736390 daemon.notice] method <hafoip_monitor_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: [ID 443746 daemon.notice] resource zbjk state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: [ID 529407 daemon.notice] resource group zbjk-rg state on node wgv8901 change to RG_ONLINE
Jun 30 10:43:17 wgv8901 in.mpathd[2332]: [ID 168056 daemon.error] All Interfaces in group ipmp1 have failed


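Both times, it was an IPMP failover that caused the cluster problem. If I switch the resource group by hand as a test, from either the primary or the standby node, the switchover works, but the cluster does not switch over automatically.
How can I solve this? The IPMP configuration itself is fine. Only one router is connected. Could the router be causing the problem? Any advice from the experts is appreciated.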
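
To see the order of events in a log like the one above, a small filter can help. The following is a minimal sketch, not a supported tool: the function name `cluster_events` is made up for illustration, and the field positions assume the syslog layout shown in this post. It prints only the in.mpathd messages and the RGM "Not attempting to start" refusals (message ID 447451).

```sh
#!/bin/sh
# cluster_events FILE
# Hypothetical helper: list IPMP (in.mpathd) messages and RGM start refusals
# (message ID 447451) from a syslog file, in file order.
# Field positions assume lines shaped like:
#   Mon DD HH:MM:SS host tag: [ID nnnnnn facility.level] message...
cluster_events() {
    awk '
        # in.mpathd lines: the message text starts at field 9
        $5 ~ /^in\.mpathd/ {
            msg = $9
            for (i = 10; i <= NF; i++) msg = msg " " $i
            print $1, $2, $3, "IPMP:", msg
        }
        # RGM refusal: field 15 is <group>, field 18 is <node>
        $7 == "447451" {
            print $1, $2, $3, "RGM: start refused for", $15, "on", $18
        }
    ' "$1"
}

# Example: cluster_events /var/adm/messages
```

Run against the excerpt above, this would list the two refusals at 10:43:49 ahead of the IPMP repair messages at 10:43:51. That ordering suggests the group stayed offline because both nodes had already reached the start-failure limit inside the 3600-second window, even though ipmp1 returned to OK two seconds later.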

#2 Posted 2013-07-01 10:30
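东方蜘蛛 wrote on 2013-06-30 18:49:
Don't configure IPMP in probe-based mode; try switching to link-based instead.

Thanks, 东方蜘蛛. This is a Solaris 9 system. Could I prevent this kind of problem by adding a few route addresses under the rc3.d directory?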

#3 Posted 2013-07-01 15:08
Last edited by yulemi on 2013-07-01 16:05
东方蜘蛛 wrote on 2013-07-01 12:22:
If the gateway is unstable, I suggest setting up /etc/init.d/ipmp.targets:
http://wenku.baidu.com/view/725083155f0e7cd184 ...
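
The linked document is truncated here, so the following is only a sketch of what such a script might contain. It assumes the approach described in the Solaris IPMP documentation, where probe targets for in.mpathd are defined with static host routes (`route add -host <target> <target> -static`). The target addresses are placeholders that do not come from this thread, and the `TARGETS` and `ROUTE` variables and the function name are illustrative.

```sh
#!/bin/sh
# Hypothetical /etc/init.d/ipmp.targets
# Pins in.mpathd probe targets by adding static host routes.
# The addresses are placeholders; replace them with stable hosts on the
# same subnet as the IPMP group.
TARGETS="${TARGETS:-192.0.2.1 192.0.2.2 192.0.2.3}"
ROUTE="${ROUTE:-/usr/sbin/route}"   # overridable so the script can be dry-run

ipmp_targets() {
    case "$1" in
    start)
        for t in $TARGETS; do
            # Host route whose gateway is the target itself
            $ROUTE add -host "$t" "$t" -static
        done
        ;;
    stop)
        for t in $TARGETS; do
            $ROUTE delete -host "$t" "$t"
        done
        ;;
    *)
        echo "Usage: $0 {start|stop}" >&2
        return 1
        ;;
    esac
}

# Act only when called with an argument, e.g. "ipmp.targets start"
if [ $# -gt 0 ]; then
    ipmp_targets "$1"
fi
```

A dry run such as `ROUTE=echo sh ipmp.targets start` prints the route arguments without changing the routing table. With several targets defined, a single unstable router should no longer be the only host that in.mpathd probes.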


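My Pingpong_interval is the default, 3600. I want to change it, but the command fails with an error, and I can't tell what is causing it.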
root@wgv8901 # scrgadm -c -g zbjk-rg -y Pingpong_interval=60
wgv8902 - Entry in /etc/vfstab for file system mount point /var/mqm is incorrect: File system mount point should specify 'mount at boot' as 'no'.

VALIDATE on resource zbjk-storageplus, resource group zbjk-rg, exited with non-zero exit status.
Validation of resource zbjk-storageplus in resource group zbjk-rg on node wgv8902 failed.
root@wgv8901 # scrgadm -pvv | grep pong
  (zbjk-rg) Res Group Pingpong_interval:           3600
root@wgv8901 # more /etc/vfstab
#device device  mount   FS      fsck    mount   mount
#to     mount   to      fsck            point           type    pass    at boot options
#                       
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
#/dev/dsk/c1t0d0s1      -       -       swap    -       no      -
/dev/md/dsk/d20 -       -       swap    -       no      -
/dev/md/dsk/d10 /dev/md/rdsk/d10        /       ufs     1       no      -
#/dev/dsk/c1t0d0s4      /dev/rdsk/c1t0d0s4      /var    ufs     1       no      -
/dev/md/dsk/d40 /dev/md/rdsk/d40        /var    ufs     1       no      -
#/dev/dsk/c1t0d0s7      /dev/rdsk/c1t0d0s7      /globaldevices  ufs     2       yes     -
#/dev/dsk/c1t0d0s3      /dev/rdsk/c1t0d0s3      /opt    ufs     2       yes     -
/dev/md/dsk/d30 /dev/md/rdsk/d30        /opt    ufs     2       yes     -
/dev/dsk/c1t3d0s0       /dev/rdsk/c1t3d0s0      /mqmtemp        ufs     1       yes     -
swap    -       /tmp    tmpfs   -       yes     -
#/dev/did/dsk/d11s7 /dev/did/rdsk/d11s7 /global/.devices/node@2 ufs 2 no global
/dev/md/dsk/d50 /dev/md/rdsk/d50        /global/.devices/node@2 ufs     2       no      global
#/dev/md/zbjkset2/dsk/d101      /dev/md/zbjkset2/rdsk/d101      /var/mqm        ufs     2       no      logging
#/dev/md/zbjkset2/dsk/d102      /dev/md/zbjkset2/rdsk/d102      /opt/mqm        ufs     2       no      logging
#/dev/md/zbjkset2/dsk/d103      /dev/md/zbjkset2/rdsk/d103      /u1/informix    ufs     2       no      logging
#/dev/md/zbjkset2/dsk/d104      /dev/md/zbjkset2/rdsk/d104      /u1/tuxedo      ufs     2       no      logging
#add sbjkset
/dev/md/sbjkset/dsk/d121        /dev/md/sbjkset/rdsk/d121      /opt/mqm        ufs     2       no      logging
/dev/md/sbjkset/dsk/d122        /dev/md/sbjkset/rdsk/d122      /u1/informix     ufs     2       no      logging
/dev/md/sbjkset/dsk/d123        /dev/md/sbjkset/rdsk/d123      /u1/tuxedo      ufs     2       no      logging
/dev/md/sbjkset/dsk/d124        /dev/md/sbjkset/rdsk/d124      /var/mqm        ufs     2       no      logging
root@wgv8901 #
The vfstab on the other node is the same as the one on the primary node. How can I lower the Pingpong_interval now?
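
Note that the validation error names wgv8902, while the vfstab listed above was printed on wgv8901, so the check has to be run on wgv8902 itself. The following is a minimal sketch of such a check. The function name `vfstab_entries` is made up for illustration, and it relies on `awk -v` and the `?:` operator, so on Solaris 9 it likely needs `nawk` or `/usr/xpg4/bin/awk` in place of `awk` (an assumption, not something tested in this thread).

```sh
#!/bin/sh
# vfstab_entries FILE MOUNTPOINT
# Hypothetical helper: print every vfstab line, active or commented out,
# whose mount point (field 3) equals MOUNTPOINT, together with its
# "mount at boot" value (field 6).
vfstab_entries() {
    awk -v mp="$2" '
        $3 == mp {
            state = ($1 ~ /^#/) ? "commented" : "active"
            print state, $1, "mount-at-boot=" $6
        }
    ' "$1"
}

# Example, to be run on wgv8902:
#   vfstab_entries /etc/vfstab /var/mqm
```

Listing commented-out entries as well makes it easier to spot a stale or duplicate line for the same mount point.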

#4 Posted 2013-07-01 16:03
Last edited by yulemi on 2013-07-01 16:04
东方蜘蛛 wrote on 2013-07-01 15:29:
# scrgadm -c -g xxxx -y Pingpong_interval=xxx
Or change it in the web interface.


That is exactly how I changed it, but it reports an error.

#5 Posted 2013-07-01 16:18
东方蜘蛛 wrote on 2013-07-01 16:06:
Isn't the error message clear enough?

wgv8902 - Entry in /etc/vfstab for file system mount point /v ...



I checked, and it is set to no. The entry was commented out before, though. I have now uncommented it, and I still get the same error.

#6 Posted 2013-07-01 16:23
东方蜘蛛 wrote on 2013-07-01 16:22:
The vfstab must be the same on both nodes!

They are the same. I checked carefully.

#7 Posted 2013-07-01 16:27
yulemi wrote on 2013-07-01 16:23:
They are the same. I checked carefully.

vfstab on v8901:
/dev/md/sbjkset/dsk/d121        /dev/md/sbjkset/rdsk/d121      /opt/mqm        ufs     2       no      logging
/dev/md/sbjkset/dsk/d122        /dev/md/sbjkset/rdsk/d122      /u1/informix     ufs     2       no      logging
/dev/md/sbjkset/dsk/d123        /dev/md/sbjkset/rdsk/d123      /u1/tuxedo      ufs     2       no      logging
/dev/md/sbjkset/dsk/d124        /dev/md/sbjkset/rdsk/d124      /var/mqm        ufs     2       no      logging

vfstab on V8902:
/dev/md/sbjkset/dsk/d121        /dev/md/sbjkset/rdsk/d121      /opt/mqm        ufs     2       no      logging
/dev/md/sbjkset/dsk/d122        /dev/md/sbjkset/rdsk/d122      /u1/informix     ufs     2       no      logging
/dev/md/sbjkset/dsk/d123        /dev/md/sbjkset/rdsk/d123      /u1/tuxeduo      ufs     2       no      logging
/dev/md/sbjkset/dsk/d124        /dev/md/sbjkset/rdsk/d124      /var/mqm        ufs     2       no      logging
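
In the two listings as pasted above, the d123 line reads /u1/tuxedo on v8901 and /u1/tuxeduo on V8902. This may only be a typo made while writing the post, and it may be unrelated to the /var/mqm validation error, but a mechanical comparison is more reliable than checking by eye. The following is a minimal sketch, assuming each node's /etc/vfstab has been copied to one machine. The function name and file names are illustrative.

```sh
#!/bin/sh
# vfstab_diff FILE_A FILE_B
# Hypothetical helper: compare the active entries of two vfstab files,
# ignoring comments, blank lines, and differences in column spacing.
# Returns the exit status of diff (0 = same, 1 = different).
vfstab_diff() {
    a=/tmp/vfstab_a.$$
    b=/tmp/vfstab_b.$$
    # Drop comments and blank lines, squeeze whitespace, sort for a stable order
    awk '!/^#/ && NF > 0 { $1 = $1; print }' "$1" | sort > "$a"
    awk '!/^#/ && NF > 0 { $1 = $1; print }' "$2" | sort > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example: vfstab_diff vfstab.wgv8901 vfstab.wgv8902
```

Any line that diff prints marks an entry that differs between the nodes. No output and exit status 0 mean the active entries match.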

#8 Posted 2013-07-01 17:40
东方蜘蛛 wrote on 2013-07-01 16:42:
I think you should solve the IPMP problem first.

OK, thanks for the answers, 东方蜘蛛. If anything else comes up, I will keep updating this thread.