yulemi posted on 2013-06-30 16:40

A network fault on the cluster triggered a resource group switchover, which then failed.

Sun Jun 30 11:03:35 CST 2013
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8902 change to <LogicalHostname offline.>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8902 change to R_POSTNET_STOPPING
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8902 change to R_FM_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8902 change to <>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE_START_FAILED
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: Not attempting to start resource group <zbjk-rg> on node <wgv8901> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: Not attempting to start resource group <zbjk-rg> on node <wgv8902> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: rebalance: no primary node is currently found for resource group <zbjk-rg>.
Jun 23 10:43:51 wgv8901 in.mpathd: Successfully failed back to NIC ce3
Jun 23 10:43:51 wgv8901 in.mpathd: NIC repair detected on ce3 of group ipmp1
Jun 23 10:43:51 wgv8901 in.mpathd: At least 1 interface (ce3) of group ipmp1 has repaired
Jun 23 10:43:51 wgv8901 in.mpathd: Successfully failed over from NIC ce0 to NIC ce3
Jun 23 10:43:51 wgv8901 Cluster.PNM: ipmp1: state transition from DOWN to OK.
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ONLINE
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_prenet_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8901 change to <Starting>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: method <hafoip_prenet_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_prenet_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <1800> seconds
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8901 change to <Starting>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_prenet_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <1800 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <500> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8901 change to <LogicalHostname online.>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hafoip_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <500 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_monitor_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8901 change to <>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ON_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_monitor_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_monitor_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: method <hafoip_monitor_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_ONLINE
Jun 30 10:43:17 wgv8901 in.mpathd: All Interfaces in group ipmp1 have failed


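Both times, an IPMP failover is what caused the cluster problem. If I switch the resource group manually as a test, from either the primary or the secondary node, the switchover works, but the cluster does not switch it over automatically.
How can I fix this? The IPMP configuration itself is fine. Only one router is connected. Could the router be causing the problem? Any advice from the experts would be appreciated.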
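To see how the IPMP events line up with the RGM refusing to restart the group, it can help to filter the syslog down to just those messages. The following is a minimal sketch; the function name `cluster_events` is hypothetical, and the default path /var/adm/messages is an assumption about where these messages are logged on this system.

```shell
#!/bin/sh
# Keep only IPMP daemon events, PNM state transitions, and the RGM messages
# that explain why a resource group was not restarted.
# Usage: cluster_events [logfile]   (logfile defaults to /var/adm/messages)
cluster_events() {
    awk '/in\.mpathd:/ || /Cluster\.PNM:/ || /Not attempting to start resource group/ || /rebalance:/' \
        "${1:-/var/adm/messages}"
}
```

The "Not attempting to start resource group ... in the past 3600 seconds" lines in the log above match the Pingpong_interval value of 3600 reported later in the thread.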

东方蜘蛛 posted on 2013-06-30 18:47

All Interfaces in group ipmp1 have failed

东方蜘蛛 posted on 2013-06-30 18:49

Don't configure IPMP in probe-based mode; try switching to link-based.
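Before changing anything, it may be worth confirming that the group really is using probe-based detection. Probe-based detection relies on test addresses, which show up in `ifconfig -a` output with the NOFAILOVER flag. The sketch below lists the interfaces that carry such a flag; the function name `probe_test_ifaces` is hypothetical.

```shell
#!/bin/sh
# Read `ifconfig -a` output on stdin and print the name of every logical
# interface whose flags include NOFAILOVER (an IPMP test address).
# Usage: ifconfig -a | probe_test_ifaces
probe_test_ifaces() {
    awk '/flags=/ && /NOFAILOVER/ { print $1 }'
}
```

If nothing is printed, no test addresses are configured, and the group is presumably relying on link-based detection already.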

yulemi posted on 2013-07-01 10:30

东方蜘蛛 posted on 2013-06-30 18:49
Don't configure IPMP in probe-based mode; try switching to link-based.
Thanks, 东方蜘蛛. This is a Solaris 9 system. Could I prevent this kind of problem by adding a few route addresses under the rc3.d directory?

东方蜘蛛 posted on 2013-07-01 12:22

If the gateway is unstable, I suggest setting up /etc/init.d/ipmp.targets:
http://wenku.baidu.com/view/725083155f0e7cd184253641.html
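The contents of the linked document are not reproduced in this thread, so the following is only a sketch of what such a script commonly looks like: it adds static host routes so that in.mpathd has several probe targets rather than depending on a single router. The addresses in `TARGETS` and the `RUN` switch are hypothetical; the targets would need to be stable hosts on the same subnet as ipmp1.

```shell
#!/bin/sh
# Sketch of an ipmp.targets-style boot script.
# TARGETS is a placeholder list; replace it with real, always-on hosts
# on the IPMP group's subnet.
TARGETS="192.168.1.10 192.168.1.11 192.168.1.12"

# Print the route commands by default; set RUN=yes to execute them.
add_targets() {
    for t in $TARGETS; do
        if [ "$RUN" = "yes" ]; then
            /usr/sbin/route add -host "$t" "$t" -static
        else
            echo "route add -host $t $t -static"
        fi
    done
}

add_targets
```

Run it once without `RUN=yes` to review the commands, then link it into a run-level directory so the routes are added again after a reboot.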

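yulemi posted on 2013-07-01 15:08

Last edited by yulemi on 2013-07-01 16:05

东方蜘蛛 posted on 2013-07-01 12:22
If the gateway is unstable, I suggest setting up /etc/init.d/ipmp.targets:
http://wenku.baidu.com/view/725083155f0e7cd184 ...

My Pingpong_interval is at the default of 3600. I want to change it, but the command reports an error and I can't tell what is causing it.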
root@wgv8901 # scrgadm -c -g zbjk-rg -y Pingpong_interval=60
wgv8902 - Entry in /etc/vfstab for file system mount point /var/mqm is incorrect: File system mount point should specify 'mount at boot' as 'no'.

VALIDATE on resource zbjk-storageplus, resource group zbjk-rg, exited with non-zero exit status.
Validation of resource zbjk-storageplus in resource group zbjk-rg on node wgv8902 failed.
root@wgv8901 # scrgadm -pvv | grep pong
(zbjk-rg) Res Group Pingpong_interval:         3600
root@wgv8901 # more /etc/vfstab
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#                     
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
#/dev/dsk/c1t0d0s1      -       -       swap    -       no      -
/dev/md/dsk/d20 -       -       swap    -       no      -
/dev/md/dsk/d10 /dev/md/rdsk/d10      /       ufs   1       no      -
#/dev/dsk/c1t0d0s4      /dev/rdsk/c1t0d0s4      /var    ufs   1       no      -
/dev/md/dsk/d40 /dev/md/rdsk/d40      /var    ufs   1       no      -
#/dev/dsk/c1t0d0s7      /dev/rdsk/c1t0d0s7      /globaldevices  ufs   2       yes   -
#/dev/dsk/c1t0d0s3      /dev/rdsk/c1t0d0s3      /opt    ufs   2       yes   -
/dev/md/dsk/d30 /dev/md/rdsk/d30      /opt    ufs   2       yes   -
/dev/dsk/c1t3d0s0       /dev/rdsk/c1t3d0s0      /mqmtemp      ufs   1       yes   -
swap    -       /tmp    tmpfs   -       yes   -
#/dev/did/dsk/d11s7 /dev/did/rdsk/d11s7 /global/.devices/node@2 ufs 2 no global
/dev/md/dsk/d50 /dev/md/rdsk/d50      /global/.devices/node@2 ufs   2       no      global
#/dev/md/zbjkset2/dsk/d101      /dev/md/zbjkset2/rdsk/d101      /var/mqm      ufs   2       no      logging
#/dev/md/zbjkset2/dsk/d102      /dev/md/zbjkset2/rdsk/d102      /opt/mqm      ufs   2       no      logging
#/dev/md/zbjkset2/dsk/d103      /dev/md/zbjkset2/rdsk/d103      /u1/informix    ufs   2       no      logging
#/dev/md/zbjkset2/dsk/d104      /dev/md/zbjkset2/rdsk/d104      /u1/tuxedo      ufs   2       no      logging
#add sbjkset
/dev/md/sbjkset/dsk/d121      /dev/md/sbjkset/rdsk/d121      /opt/mqm      ufs   2       no      logging
/dev/md/sbjkset/dsk/d122      /dev/md/sbjkset/rdsk/d122      /u1/informix   ufs   2       no      logging
/dev/md/sbjkset/dsk/d123      /dev/md/sbjkset/rdsk/d123      /u1/tuxedo      ufs   2       no      logging
/dev/md/sbjkset/dsk/d124      /dev/md/sbjkset/rdsk/d124      /var/mqm      ufs   2       no      logging
root@wgv8901 #
The vfstab on the other node is identical to the one on the primary node. How can I lower the Pingpong_interval now?

东方蜘蛛 posted on 2013-07-01 15:29

# scrgadm -c -g xxxx -y Pingpong_interval=xxx
Or change it in the web interface.

yulemi posted on 2013-07-01 16:03

Last edited by yulemi on 2013-07-01 16:04

东方蜘蛛 posted on 2013-07-01 15:29
# scrgadm -c -g xxxx -y Pingpong_interval=xxx
Or change it in the web interface.

That is exactly how I changed it, but it reports an error.

东方蜘蛛 posted on 2013-07-01 16:06

yulemi posted on 2013-07-01 16:03
That is exactly how I changed it, but it reports an error.

Isn't the error message clear enough?

wgv8902 - Entry in /etc/vfstab for file system mount point /var/mqm is incorrect: File system mount point should specify 'mount at boot' as 'no'.
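The validation message names wgv8902, while the vfstab listing earlier in the thread was taken on wgv8901, so the entry worth checking is the one on wgv8902. A minimal sketch for printing the "mount at boot" field of the active (uncommented) entry for a mount point; the function name `boot_flag` is hypothetical.

```shell
#!/bin/sh
# Print column 6 ("mount at boot") of every uncommented vfstab entry
# whose mount point (column 3) matches the first argument.
# Usage: boot_flag /var/mqm [vfstab-path]   (path defaults to /etc/vfstab)
boot_flag() {
    awk '$1 !~ /^#/ && $3 == mp { print $6 }' mp="$1" "${2:-/etc/vfstab}"
}
```

Running `boot_flag /var/mqm` on each node should print `no` exactly once. No output means the entry is missing or commented out on that node, and more than one line means there are duplicate entries.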

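yulemi posted on 2013-07-01 16:18

东方蜘蛛 posted on 2013-07-01 16:06
Isn't the error message clear enough?

wgv8902 - Entry in /etc/vfstab for file system mount point /v ...


I checked, and it is set to no. The entry had been commented out before; I have now uncommented it, but I still get the same error.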