[Solved] Urgent on-site help needed: HP MC/SG problem

#1  Posted 2008-03-25 12:25
OS: HP-UX 11.23   MC/SG: 11.17

I configured the two-node cluster with the following steps:
vgchange -a y vglock
cmquerycl -n xiaojiA -n xiaojiB -v -C /etc/cmcluster/cmclconfig.asc
cmcheckconf -v -C cmclconfig.asc
cmapplyconf -v -C cmclconfig.asc
vgchange -a n vglock
Then I started the cluster:
# cmruncl -v
cmrunnode: Validating network configuration...
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
cmrunnode: Network validation complete
Waiting for cluster to form .............. timed out
Check the syslog files for information.
cmrunnode failed: timed out waiting for cluster to form

syslog output follows -------------------------------------------------------------
Mar 25 12:00:45 xiaojiB CM-CMD[6221]: cmruncl -v
Mar 25 12:00:45 xiaojiB cmclconfd[6223]: Request from root on node xiaojiB to start the cluster on this node
Mar 25 12:00:45 xiaojiB cmcld[6229]: Logging level changed to level 0.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Global Cluster Information:
Mar 25 12:00:45 xiaojiB cmcld[6229]: Heartbeat Interval is 1.00 seconds.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Logging level changed to level 0.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Node Timeout is 10.00 seconds.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Network Polling Interval is 2.00 seconds.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Auto Start Timeout is 600.00 seconds.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Failover Optimization is disabled.
Mar 25 12:00:45 xiaojiB cmcld[6229]: Information Specific to node xiaojiB:
Mar 25 12:00:45 xiaojiB cmcld[6229]: Cluster lock disk: /dev/dsk/c4t0d0.
Mar 25 12:00:45 xiaojiB cmcld[6229]: lan1  0x001a4b07cade  10.88.5.11  bridged net:1
Mar 25 12:00:45 xiaojiB cmcld[6229]: lan2  0x001a4b07cadf  192.168.0.11  bridged net:1
Mar 25 12:00:45 xiaojiB cmcld[6229]: Heartbeat Subnet: 10.88.5.0
Mar 25 12:00:45 xiaojiB cmcld[6229]: Heartbeat Subnet: 192.168.0.0
Mar 25 12:00:45 xiaojiB cmcld[6229]: The maximum # of concurrent local connections to the daemon that will be supported is 4066.
Mar 25 12:00:45 xiaojiB cmlvmd[6235]: lvm online query ioctl success- supports online feature
Mar 25 12:00:45 xiaojiB cmcld[6229]: Waiting for connection request from CMGMSD
Mar 25 12:00:45 xiaojiB cmcld[6229]: CMGMSD (pid=6237) successfully started
Mar 25 12:00:45 xiaojiB cmcld[6229]: rcomm health:  Initializing timeout to 120000000 microseconds
Mar 25 12:00:46 xiaojiB cmcld[6229]: Total allocated: 35108680 bytes, used: 2005040 bytes, unused 33103632 bytes
Mar 25 12:00:46 xiaojiB cmcld[6229]: Starting cluster management protocols.
Mar 25 12:00:46 xiaojiB cmcld[6229]: Attempting to form a new cluster
Mar 25 12:00:46 xiaojiB cmcld[6229]: Beginning standard election
Mar 25 12:01:46 xiaojiB cmcld[6229]: Cluster formation failed
Mar 25 12:01:46 xiaojiB cmcld[6229]: Reason: Ran out of time for manually starting the cluster
Mar 25 12:01:43 xiaojiB cmcld[6229]: Attempting to form a new cluster
Mar 25 12:01:46 xiaojiB  above message repeats 5 times
Mar 25 12:01:46 xiaojiB cmsrvassistd[6232]: The cluster daemon aborted our connection (231).
Mar 25 12:01:43 xiaojiB cmcld[6229]: Beginning standard election
Mar 25 12:01:46 xiaojiB  above message repeats 5 times
Mar 25 12:01:46 xiaojiB cmsrvassistd[6232]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Mar 25 12:01:46 xiaojiB cmnetassistd[6234]: The cluster daemon aborted our connection (231).
Mar 25 12:01:46 xiaojiB cmnetassistd[6234]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Mar 25 12:01:46 xiaojiB cmlvmd[6235]: The cluster daemon aborted our connection (231).
Mar 25 12:01:46 xiaojiB cmlvmd[6235]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Mar 25 12:01:46 xiaojiB cmlvmd[6235]: CLVMD exiting
Mar 25 12:01:46 xiaojiB cmlvmd[6235]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
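
Editor's sketch: in a log this long, the line that matters is the "Reason:" that follows "Cluster formation failed". The helper below pulls out just those pairs. The function name formation_failures is made up for illustration, and the sketch assumes the syslog line layout shown above.

```shell
# Read syslog text on stdin and print one line per failed cluster
# formation: the timestamp of the failure followed by the logged reason.
# formation_failures is a hypothetical helper name, not a Serviceguard tool.
formation_failures() {
    awk '
        # Remember when cmcld reported the failure.
        /Cluster formation failed/ { ts = $1 " " $2 " " $3 }
        # Strip everything up to "Reason: " and print what is left.
        /Reason: / { sub(/.*Reason: /, ""); print ts ": " $0 }
    '
}
```

Usage, assuming the default HP-UX syslog location: formation_failures < /var/adm/syslog/syslog.log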


If I start a single node with cmrunnode -v, I get the same error,
but cmviewcl -v shows the node status as starting / reforming:
root@xiaojiB:/etc/cmcluster#cmrunnode -v
cmrunnode: Validating network configuration...
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
cmrunnode: Network validation complete
Waiting for cluster to form .............. timed out
Check the syslog files for information.
cmrunnode failed: timed out waiting for cluster to form

cmviewcl -v output while cmrunnode -v is running:
CLUSTER        STATUS      
cluster10g     starting     
  
  NODE           STATUS       STATE        
  xiaojiA        unknown      unknown      
   
    Cluster_Lock_LVM:
    VOLUME_GROUP          PHYSICAL_VOLUME       STATUS              
    /dev/vglock           /dev/dsk/c4t0d0       unknown            
   
    Network_Parameters:
    INTERFACE    STATUS       PATH                NAME         
    PRIMARY      unknown      0/4/2/0             lan1         
    PRIMARY      unknown      0/4/2/1             lan2         
  
  NODE           STATUS       STATE        
  xiaojiB        starting     reforming   
   
    Cluster_Lock_LVM:
    VOLUME_GROUP          PHYSICAL_VOLUME       STATUS              
    /dev/vglock           /dev/dsk/c4t0d0       unknown            
   
    Network_Parameters:
    INTERFACE    STATUS       PATH                NAME         
    PRIMARY      up           0/4/2/0             lan1         
    PRIMARY      up           0/4/2/1             lan2  

syslog during cmrunnode -v:
Mar 25 12:11:07 xiaojiB CM-CMD[6554]: cmrunnode -v
Mar 25 12:11:07 xiaojiB cmclconfd[6556]: Request from root on node xiaojiB to start the cluster on this node
Mar 25 12:11:07 xiaojiB cmcld[6562]: Logging level changed to level 0.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Global Cluster Information:
Mar 25 12:11:07 xiaojiB cmcld[6562]: Heartbeat Interval is 1.00 seconds.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Logging level changed to level 0.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Node Timeout is 10.00 seconds.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Network Polling Interval is 2.00 seconds.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Auto Start Timeout is 600.00 seconds.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Failover Optimization is disabled.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Information Specific to node xiaojiB:
Mar 25 12:11:07 xiaojiB cmcld[6562]: Cluster lock disk: /dev/dsk/c4t0d0.
Mar 25 12:11:07 xiaojiB cmcld[6562]: lan1  0x001a4b07cade  10.88.5.11  bridged net:1
Mar 25 12:11:07 xiaojiB cmcld[6562]: lan2  0x001a4b07cadf  192.168.0.11  bridged net:1
Mar 25 12:11:07 xiaojiB cmcld[6562]: Heartbeat Subnet: 10.88.5.0
Mar 25 12:11:07 xiaojiB cmcld[6562]: Heartbeat Subnet: 192.168.0.0
Mar 25 12:11:07 xiaojiB cmcld[6562]: The maximum # of concurrent local connections to the daemon that will be supported is 4066.
Mar 25 12:11:07 xiaojiB cmlvmd[6568]: lvm online query ioctl success- supports online feature
Mar 25 12:11:07 xiaojiB cmcld[6562]: Waiting for connection request from CMGMSD
Mar 25 12:11:07 xiaojiB cmcld[6562]: CMGMSD (pid=6569) successfully started
Mar 25 12:11:07 xiaojiB cmcld[6562]: rcomm health:  Initializing timeout to 120000000 microseconds
Mar 25 12:11:07 xiaojiB cmcld[6562]: Total allocated: 35108680 bytes, used: 2002832 bytes, unused 33105840 bytes
Mar 25 12:11:07 xiaojiB cmcld[6562]: Starting cluster management protocols.
Mar 25 12:11:07 xiaojiB cmcld[6562]: Attempting to form a new cluster
Mar 25 12:11:07 xiaojiB cmcld[6562]: Beginning standard election
Mar 25 12:21:07 xiaojiB cmcld[6562]: Cluster formation failed
Mar 25 12:21:07 xiaojiB cmcld[6562]: Reason: Ran out of time for automatically joining a cluster
Mar 25 12:20:56 xiaojiB cmcld[6562]: Attempting to form a new cluster
Mar 25 12:21:07 xiaojiB  above message repeats 51 times
Mar 25 12:21:07 xiaojiB cmcld[6562]: Unable to contact all nodes in the cluster, thus it is not
Mar 25 12:20:56 xiaojiB cmcld[6562]: Beginning standard election
Mar 25 12:21:07 xiaojiB  above message repeats 51 times
Mar 25 12:21:07 xiaojiB cmcld[6562]:   possible to join the cluster at this time.
Mar 25 12:21:07 xiaojiB cmsrvassistd[6565]: The cluster daemon aborted our connection (231).
Mar 25 12:21:07 xiaojiB cmsrvassistd[6565]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Mar 25 12:21:07 xiaojiB cmnetassistd[6567]: The cluster daemon aborted our connection (231).
Mar 25 12:21:07 xiaojiB cmnetassistd[6567]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Mar 25 12:21:07 xiaojiB cmlvmd[6568]: The cluster daemon aborted our connection (231).
Mar 25 12:21:07 xiaojiB cmlvmd[6568]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Mar 25 12:21:07 xiaojiB cmlvmd[6568]: CLVMD exiting
Mar 25 12:21:07 xiaojiB cmcld[6562]: If the cluster is not running, use the cmruncl command to
Mar 25 12:21:07 xiaojiB cmcld[6562]:   start it. If the cluster is running on other nodes, verify
Mar 25 12:21:07 xiaojiB cmlvmd[6568]: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Mar 25 12:21:07 xiaojiB cmcld[6562]:   this node's ability to send messages to the other nodes,
Mar 25 12:21:07 xiaojiB cmcld[6562]:   then re-issue the cmrunnode command
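
Editor's sketch: both syslog excerpts list lan1 and lan2 with the same "bridged net:1" tag even though they carry different heartbeat subnets. That may be worth a second look, on the assumption (not confirmed anywhere in this thread) that the tag identifies the link-level network an interface sits on. The helper below, whose name cmcld_interfaces is made up for illustration, reduces the log to one line per interface so the tags are easy to compare.

```shell
# Read syslog text on stdin and print "interface mac ip bridged-net-id"
# for every interface line logged by cmcld, without duplicates.
# cmcld_interfaces is a hypothetical helper name.
cmcld_interfaces() {
    awk '
        $5 ~ /^cmcld/ && $6 ~ /^lan[0-9]+$/ && $NF ~ /^net:/ {
            id = $NF
            sub(/^net:/, "", id)          # "net:1" -> "1"
            line = $(NF-4) " " $(NF-3) " " $(NF-2) " " id
            if (!(line in seen)) {        # the same table is logged on every start
                seen[line] = 1
                print line
            }
        }
    '
}
```

Run against the two excerpts above, this would print one line for lan1 and one for lan2, both ending in 1.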

[ Last edited by joebora on 2008-3-25 18:09 ]

#2  Posted 2008-03-25 13:18
Have you checked your scripts?

#3  Posted 2008-03-25 13:50
You mean the configuration file, the one generated by cmquerycl? That one is fine.

With cmruncl -v -n xiaojiB I can start a single node.
But when I then run cmruncl -v -n xiaojiA on A, the one on B goes down.

#4  Posted 2008-03-25 13:51
In syslog, every attempt stalls for a long time at "cmcld[6562]: Beginning standard election".

/etc/hosts:
127.0.0.1       localhost       loopback
10.88.5.10      xiaojiA
10.88.5.11      xiaojiB
192.168.0.10    priv1 xiaojiA
192.168.0.11    priv2 xiaojiB
10.88.5.12      vip1
10.88.5.13      vip2

.rhosts
xiaojiA root
xiaojiB root

ping, rlogin, and remsh between the two servers all work without problems.
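
Editor's sketch: in the hosts file above, each node name appears on two lines, once per heartbeat subnet. A small helper like the one below lists every address a name maps to, which makes it easy to confirm that both nodes have an entry on both subnets. The function name addresses_for is made up for illustration.

```shell
# Print every address that a host name (or alias) maps to in a
# hosts-format file. Comments are ignored.
# addresses_for is a hypothetical helper name.
# Usage: addresses_for NAME [FILE]   (FILE defaults to /etc/hosts)
addresses_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                # drop comments; fields are re-split
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; break }
        }
    ' "$file"
}
```

With the hosts file posted above, addresses_for xiaojiA would list 10.88.5.10 and 192.168.0.10.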

[ Last edited by joebora on 2008-3-25 13:52 ]

#5  Posted 2008-03-25 13:56
cmgetconf output:

CLUSTER_NAME            cluster10

FIRST_CLUSTER_LOCK_VG           /dev/vgora

NODE_NAME               xiaojiA
  NETWORK_INTERFACE     lan1
    HEARTBEAT_IP        10.88.5.10
  NETWORK_INTERFACE     lan2
    HEARTBEAT_IP        192.168.0.10
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d1

NODE_NAME               xiaojiB
  NETWORK_INTERFACE     lan1
    HEARTBEAT_IP        10.88.5.11
  NETWORK_INTERFACE     lan2
    HEARTBEAT_IP        192.168.0.11
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d1

HEARTBEAT_INTERVAL              1000000
NODE_TIMEOUT            10000000

AUTO_START_TIMEOUT      600000000
NETWORK_POLLING_INTERVAL        2000000

NETWORK_FAILURE_DETECTION               INOUT

MAX_CONFIGURED_PACKAGES         10

OPS_VOLUME_GROUP                /dev/vgora
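
Editor's sketch: the timing parameters in this configuration are given in microseconds, while cmcld logs them in seconds (for example "Node Timeout is 10.00 seconds"). The helper below converts the four timing parameters so the two views can be compared directly. The function name timing_in_seconds is made up for illustration.

```shell
# Read cluster configuration text on stdin and print the four timing
# parameters converted from microseconds to seconds.
# timing_in_seconds is a hypothetical helper name.
timing_in_seconds() {
    awk '
        $1 == "HEARTBEAT_INTERVAL" || $1 == "NODE_TIMEOUT" ||
        $1 == "AUTO_START_TIMEOUT" || $1 == "NETWORK_POLLING_INTERVAL" {
            printf "%s %.2f\n", $1, $2 / 1000000
        }
    '
}
```

For the values above, AUTO_START_TIMEOUT comes out as 600.00 seconds. That matches the ten minutes between 12:11:07 and 12:21:07 in the cmrunnode syslog in the first post, which suggests that cmrunnode simply waited out the full auto-start timeout before giving up.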

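#6  Posted 2008-03-25 14:04
1. Try disabling the VIPs.


2.
The cluster did not stop cleanly when you ran cmhaltcl; you can see that
with ps -ef|grep cm.

That is why it is always reforming when you start it again.
Halt the cluster first, reboot both nodes, and then run cmruncl.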
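#7  Posted 2008-03-25 14:13
Hi czyf2001,

By disabling the VIP addresses, do you mean deleting them from /etc/hosts?

This is a fresh installation and cmruncl has never come up. The reforming I mentioned is what cmviewcl -v shows for the node after running cmrunnode -v.

Thanks a lot.

I just went through some old threads and found someone in a similar situation. He said that adding -i to the last line of /etc/inetd.conf fixed it. I tried that just now and it did not help.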

#8  Posted 2008-03-25 14:14
Also, while the node is reforming there is no way to halt the cluster; it reports "cluster not active".

#9  Posted 2008-03-25 14:33
I connected the ports on the 192.168.0.0 segment directly with a crossover cable, and the cluster came up right away.
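
Editor's sketch: ping working while the cluster still could not form suggests testing the heartbeat path at the link level as well. On HP-UX that is usually done with linkloop, which takes the local interface's PPA and the remote interface's MAC address. The helper below only builds the command line and runs nothing. The function name linkloop_cmd is made up for illustration, and it assumes that interface lanN has PPA N, which is typical but worth confirming with lanscan.

```shell
# Print the linkloop command that tests link-level connectivity from a
# local interface to a remote MAC address.
# linkloop_cmd is a hypothetical helper name.
# Assumption: interface lanN has PPA N (confirm with lanscan).
linkloop_cmd() {
    ifname=$1
    remote_mac=$2
    case $ifname in
        lan[0-9]*) ;;
        *) echo "expected an interface name such as lan2" >&2; return 1 ;;
    esac
    ppa=${ifname#lan}                     # "lan2" -> "2"
    printf 'linkloop -i %s %s\n' "$ppa" "$remote_mac"
}
```

For example, on xiaojiA, linkloop_cmd lan2 0x001a4b07cadf prints the command for probing xiaojiB's lan2, using the MAC address from the syslog in the first post.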

#10  Posted 2008-03-25 14:38
1. Make sure the heartbeat links can reach each other;
2. Comment out the VIPs in /etc/hosts;
3. Reboot the hosts.

Then run the check and apply steps again.