Cluster network fault triggers a resource group switchover, and the switchover fails.
Sun Jun 30 11:03:35 CST 2013
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8902 change to <LogicalHostname offline.>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8902 change to R_POSTNET_STOPPING
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8902 change to R_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8902 change to R_FM_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8902 change to <>
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE_START_FAILED
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8902 change to RG_OFFLINE
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: Not attempting to start resource group <zbjk-rg> on node <wgv8901> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: Not attempting to start resource group <zbjk-rg> on node <wgv8902> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Jun 23 10:43:49 wgv8901 Cluster.RGM.rgmd: rebalance: no primary node is currently found for resource group <zbjk-rg>.
Jun 23 10:43:51 wgv8901 in.mpathd: Successfully failed back to NIC ce3
Jun 23 10:43:51 wgv8901 in.mpathd: NIC repair detected on ce3 of group ipmp1
Jun 23 10:43:51 wgv8901 in.mpathd: At least 1 interface (ce3) of group ipmp1 has repaired
Jun 23 10:43:51 wgv8901 in.mpathd: Successfully failed over from NIC ce0 to NIC ce3
Jun 23 10:43:51 wgv8901 Cluster.PNM: ipmp1: state transition from DOWN to OK.
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ONLINE
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_prenet_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:12 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8901 change to <Starting>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: method <hafoip_prenet_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTING
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_prenet_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <1800> seconds
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8901 change to R_FM_UNKNOWN
Jun 23 22:54:13 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8901 change to <Starting>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_prenet_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <1800 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_PRENET_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <500> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk status msg on node wgv8901 change to <LogicalHostname online.>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hafoip_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <500 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hafoip_monitor_start> for resource <zbjk>, resource group <zbjk-rg>, timeout <300> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_JUST_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_ONLINE_UNMON
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status on node wgv8901 change to R_FM_ONLINE
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus status msg on node wgv8901 change to <>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_MON_STARTING
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_PENDING_ON_STARTED
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: launching method <hastorageplus_monitor_start> for resource <zbjk-storageplus>, resource group <zbjk-rg>, timeout <90> seconds
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: method <hastorageplus_monitor_start> completed successfully for resource <zbjk-storageplus>, resource group <zbjk-rg>, time used: 0% of timeout <90 seconds>
Jun 23 22:54:16 wgv8901 Cluster.RGM.rgmd: resource zbjk-storageplus state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: method <hafoip_monitor_start> completed successfully for resource <zbjk>, resource group <zbjk-rg>, time used: 0% of timeout <300 seconds>
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: resource zbjk state on node wgv8901 change to R_ONLINE
Jun 23 22:54:17 wgv8901 Cluster.RGM.rgmd: resource group zbjk-rg state on node wgv8901 change to RG_ONLINE
Jun 30 10:43:17 wgv8901 in.mpathd: All Interfaces in group ipmp1 have failed
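The in.mpathd lines above, ending with "All Interfaces in group ipmp1 have failed", come from IPMP failure detection. Below is a minimal Python sketch of why probe-based detection with a single probe target (here, the one router the poster mentions) is fragile. It rests on stated assumptions: it is a simplified model, not in.mpathd's algorithm (the real daemon also applies probe timing, configured through FAILURE_DETECTION_TIME in /etc/default/mpathd). The functions `interface_states` and `group_failed`, the per-NIC reachability sets, and the 192.0.2.x target addresses are all hypothetical. Only ce0 and ce3 are taken from the log.

```python
"""Simplified, hypothetical model of probe-based IPMP failure detection.

Not in.mpathd's implementation: it ignores probe timing and only asks
"did any probe target answer through this NIC?".
"""
from typing import Dict, List, Set


def interface_states(targets: List[str],
                     answered_via: Dict[str, Set[str]]) -> Dict[str, str]:
    """Mark each NIC OK if at least one probe target answered through it.

    answered_via maps a NIC name to the set of targets whose probe replies
    were received on that NIC.
    """
    wanted = set(targets)
    return {nic: ("OK" if wanted & got else "FAILED")
            for nic, got in answered_via.items()}


def group_failed(states: Dict[str, str]) -> bool:
    """The whole IPMP group is down when no NIC is usable."""
    return all(state == "FAILED" for state in states.values())


if __name__ == "__main__":
    # Case 1: the router is the only target and stops answering.
    # Both NICs look dead at once, although neither has a hardware fault.
    one = interface_states(["192.0.2.1"], {"ce0": set(), "ce3": set()})
    print(one, group_failed(one))

    # Case 2: same router outage, but two extra hosts are also targets.
    extra = ["192.0.2.1", "192.0.2.10", "192.0.2.20"]
    alive = {"192.0.2.10", "192.0.2.20"}
    two = interface_states(extra, {"ce0": alive, "ce3": alive})
    print(two, group_failed(two))

    # Case 3: a real fault on ce0 only; ce3 keeps the group up.
    three = interface_states(extra, {"ce0": set(), "ce3": alive})
    print(three, group_failed(three))
```

In this model, case 1 corresponds to the single-router setup described below, while giving in.mpathd extra probe targets (the idea behind adding routes at boot and the ipmp.targets suggestion later in the thread) corresponds to case 2. Link-based detection, the other suggestion, avoids probe targets altogether by relying on the NIC's link state.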
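Both times, an IPMP failover caused the cluster problem. If I switch over manually on the primary or the secondary node as a test, the switchover works, but the cluster does not switch over automatically.
How can I fix this? The IPMP configuration itself is fine. Only one router is connected. Could the router be causing the problem? Advice from the experts would be appreciated.

All Interfaces in group ipmp1 have failed
Don't configure IPMP with probe-based failure detection; try switching to link-based detection.

Quoting 东方蜘蛛 (posted 2013-06-30 18:49):
Don't configure IPMP with probe-based failure detection; try switching to link-based detection.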
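Thanks a lot, Spider! This system runs Solaris 9. Would adding a few route addresses under the rc3.d directory be enough to prevent this kind of problem?

If the gateway is unstable, I suggest configuring /etc/init.d/ipmp.targets:
http://wenku.baidu.com/view/725083155f0e7cd184253641.html

Last edited by yulemi at 2013-07-01 16:05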
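Quoting 东方蜘蛛 (posted 2013-07-01 12:22):
If the gateway is unstable, I suggest configuring /etc/init.d/ipmp.targets
http://wenku.baidu.com/view/725083155f0e7cd184 ...

My Pingpong_interval is the default 3600. I want to change it, but the command reports an error and I can't tell why.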
root@wgv8901 # scrgadm -c -g zbjk-rg -y Pingpong_interval=60
wgv8902 - Entry in /etc/vfstab for file system mount point /var/mqm is incorrect: File system mount point should specify 'mount at boot' as 'no'.
VALIDATE on resource zbjk-storageplus, resource group zbjk-rg, exited with non-zero exit status.
Validation of resource zbjk-storageplus in resource group zbjk-rg on node wgv8902 failed.
root@wgv8901 # scrgadm -pvv | grep pong
(zbjk-rg) Res Group Pingpong_interval: 3600
root@wgv8901 # more /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
#/dev/dsk/c1t0d0s1 - - swap - no -
/dev/md/dsk/d20 - - swap - no -
/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
#/dev/dsk/c1t0d0s4 /dev/rdsk/c1t0d0s4 /var ufs 1 no -
/dev/md/dsk/d40 /dev/md/rdsk/d40 /var ufs 1 no -
#/dev/dsk/c1t0d0s7 /dev/rdsk/c1t0d0s7 /globaldevices ufs 2 yes -
#/dev/dsk/c1t0d0s3 /dev/rdsk/c1t0d0s3 /opt ufs 2 yes -
/dev/md/dsk/d30 /dev/md/rdsk/d30 /opt ufs 2 yes -
/dev/dsk/c1t3d0s0 /dev/rdsk/c1t3d0s0 /mqmtemp ufs 1 yes -
swap - /tmp tmpfs - yes -
#/dev/did/dsk/d11s7 /dev/did/rdsk/d11s7 /global/.devices/node@2 ufs 2 no global
/dev/md/dsk/d50 /dev/md/rdsk/d50 /global/.devices/node@2 ufs 2 no global
#/dev/md/zbjkset2/dsk/d101 /dev/md/zbjkset2/rdsk/d101 /var/mqm ufs 2 no logging
#/dev/md/zbjkset2/dsk/d102 /dev/md/zbjkset2/rdsk/d102 /opt/mqm ufs 2 no logging
#/dev/md/zbjkset2/dsk/d103 /dev/md/zbjkset2/rdsk/d103 /u1/informix ufs 2 no logging
#/dev/md/zbjkset2/dsk/d104 /dev/md/zbjkset2/rdsk/d104 /u1/tuxedo ufs 2 no logging
#add sbjkset
/dev/md/sbjkset/dsk/d121 /dev/md/sbjkset/rdsk/d121 /opt/mqm ufs 2 no logging
/dev/md/sbjkset/dsk/d122 /dev/md/sbjkset/rdsk/d122 /u1/informix ufs 2 no logging
/dev/md/sbjkset/dsk/d123 /dev/md/sbjkset/rdsk/d123 /u1/tuxedo ufs 2 no logging
/dev/md/sbjkset/dsk/d124 /dev/md/sbjkset/rdsk/d124 /var/mqm ufs 2 no logging
root@wgv8901 #
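The VALIDATE failure names wgv8902's /etc/vfstab and the /var/mqm mount point, so that file is worth checking on every node, not only the one where the command was run. Below is a small Python sketch for doing so, under assumptions: each active vfstab line has the seven standard whitespace-separated fields; `parse_vfstab` and `check_mount_points` are hypothetical helpers; the mount point list passed in is up to you (only /var/mqm is named in the error). Besides the 'mount at boot' condition from the error message, the sketch also flags a mount point with more than one active entry. That second check is an addition here, motivated by the file above: the commented-out zbjkset2 lines use the same four mount points (/var/mqm, /opt/mqm, /u1/informix, /u1/tuxedo) as the active sbjkset lines.

```python
"""Hypothetical checker for /etc/vfstab entries used by an HAStoragePlus
resource. Flags entries whose 'mount at boot' field is not 'no' (the
condition named in the VALIDATE error) and mount points that have more
than one active (uncommented) entry."""
from typing import Dict, List, Tuple

# The seven whitespace-separated vfstab fields, in order.
FIELDS = ("device_to_mount", "device_to_fsck", "mount_point",
          "fs_type", "fsck_pass", "mount_at_boot", "mount_options")


def parse_vfstab(text: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Return (entries, malformed_lines); blank and '#' lines are skipped."""
    entries, malformed = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) == len(FIELDS):
            entries.append(dict(zip(FIELDS, parts)))
        else:
            malformed.append(stripped)
    return entries, malformed


def check_mount_points(text: str, mount_points: List[str]) -> List[str]:
    """Return human-readable problems for the given mount points."""
    entries, malformed = parse_vfstab(text)
    problems = [f"malformed line: {line}" for line in malformed]
    for mp in mount_points:
        matches = [e for e in entries if e["mount_point"] == mp]
        if not matches:
            problems.append(f"{mp}: no active entry")
        elif len(matches) > 1:
            problems.append(f"{mp}: {len(matches)} active entries")
        for e in matches:
            if e["mount_at_boot"] != "no":
                problems.append(
                    f"{mp}: mount at boot is '{e['mount_at_boot']}', expected 'no'")
    return problems


if __name__ == "__main__":
    # Lines from the file above, with the zbjkset2 /var/mqm line uncommented.
    sample = """\
/dev/md/sbjkset/dsk/d124 /dev/md/sbjkset/rdsk/d124 /var/mqm ufs 2 no logging
#/dev/md/zbjkset2/dsk/d102 /dev/md/zbjkset2/rdsk/d102 /opt/mqm ufs 2 no logging
/dev/md/zbjkset2/dsk/d101 /dev/md/zbjkset2/rdsk/d101 /var/mqm ufs 2 no logging
"""
    for problem in check_mount_points(sample, ["/var/mqm"]):
        print(problem)
```

On a node you would pass the contents of /etc/vfstab instead of the inline sample.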
The other node's vfstab is identical to the primary node's. How can I lower the Pingpong_interval value now?

# scrgadm -c -g xxxx -y Pingpong_interval=xxx
Or change it in the web GUI.

Last edited by yulemi at 2013-07-01 16:04
Quoting 东方蜘蛛 (posted 2013-07-01 15:29):
# scrgadm -c -g xxxx -y Pingpong_interval=xxx
Or change it in the web GUI.

That's exactly how I changed it, but it reports an error.

Quoting yulemi (posted 2013-07-01 16:03):
That's exactly how I changed it, but it reports an error.

Isn't the error message obvious enough?
wgv8902 - Entry in /etc/vfstab for file system mount point /var/mqm is incorrect: File system mount point should specify 'mount at boot' as 'no'.

Quoting 东方蜘蛛 (posted 2013-07-01 16:06):
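Isn't the error message obvious enough?
wgv8902 - Entry in /etc/vfstab for file system mount point /v ...

I looked, and it is 'no', but it had been commented out before. I have now removed the comment, and I still get the same error.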
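For context on the value the poster was trying to lower: in the first log, rgmd refused to start zbjk-rg on either node because it had "already failed to start on this node 2 or more times in the past 3600 seconds", and 3600 matches the Pingpong_interval that scrgadm -pvv reports. Below is a minimal Python sketch of that rule as the log message words it. Assumptions: the threshold of 2 and the sliding window come from the message text, the strict window boundary is a guess, `eligible_nodes` is a hypothetical name, and the failure times are made up; this is not RGM's actual logic.

```python
"""Hypothetical model of the rgmd 'Not attempting to start resource group'
rule, as worded in the log. Not RGM's implementation."""
from typing import Dict, List


def eligible_nodes(nodes: List[str],
                   failures: Dict[str, List[float]],
                   now: float,
                   pingpong_interval: float = 3600,
                   threshold: int = 2) -> List[str]:
    """Return the nodes on which the resource group may still be started.

    failures maps a node name to the times (in seconds) at which the
    group failed to start there. A node is skipped when at least
    `threshold` of those failures fall inside the last
    `pingpong_interval` seconds (strict boundary assumed).
    """
    eligible = []
    for node in nodes:
        recent = [t for t in failures.get(node, [])
                  if now - t < pingpong_interval]
        if len(recent) < threshold:
            eligible.append(node)
    return eligible


if __name__ == "__main__":
    nodes = ["wgv8901", "wgv8902"]
    now = 10_000.0
    # Both nodes failed twice within the last few minutes (made-up times).
    failures = {"wgv8901": [now - 600, now - 300],
                "wgv8902": [now - 500, now - 200]}
    # Default 3600 s window: no node is eligible, which mirrors
    # "no primary node is currently found".
    print(eligible_nodes(nodes, failures, now))
    # A 60 s window: the same failures have aged out, so both nodes qualify.
    print(eligible_nodes(nodes, failures, now, pingpong_interval=60))
```

In this model, a smaller Pingpong_interval only shortens how long a node stays excluded after failed starts; it does not change why the starts failed.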