[HACMP Cluster] Repost: HACMP - When 1 node is down

Posted 2008-08-22 23:13
The post is in English; I hope it helps.


I encountered this problem today and thought it would be important to share the situation and fix with you.

One of my nodes is at HACMP software level 5.4.1.2 and the other is at 5.4.1.3. Both levels have the problem; 5.4.1.1 does not.

In a 2-node cluster, when 1 node is down (either clcomdES or AIX is shut down) and the resource groups are OFFLINE, trying to bring a resource group online through smit does not work.

You will see this error:

┌──────────────────────────────────────────────────────────────────────────┐
│ Select a Resource Group │
│ │
│ Move cursor to desired item and press Enter. │
│ │
│ migcheck[471]: cl_connect() error, nodename=coenim, rc=-1 │
│ hacmprg OFFLINE │
│ │

Whichever line you select, the next error is:

1800-018 There are currently no additional
SMIT selector screen entries available for this
item. This item may require installation of
additional software before it can be accessed.

smit will exit and the resource group will still be in the OFFLINE state.

This is not good.

The smit.log contains:
(Specified sm_name_hdr with
id = "cl_resgrp_start.select_node_migcheck[471]",
was not found in SMIT/ODM database.)
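
To check whether a given smit session hit this symptom, the log can be searched for the malformed selector id. This is a minimal sketch: the function name is made up for illustration, and the default path assumes smit wrote its log to smit.log in the invoking user's home directory.

```sh
# Hypothetical helper: does a smit log contain the malformed
# "select_node_migcheck[...]" selector id shown above?
# Returns 0 if found, 1 if not found, 2 if the log cannot be read.
smit_log_has_migcheck_symptom() {
    # Assumption: the log is at $HOME/smit.log unless a path is given.
    log_file="${1:-$HOME/smit.log}"
    [ -r "$log_file" ] || return 2
    # \[ matches a literal bracket in a basic regular expression.
    grep -q 'select_node_migcheck\[' "$log_file"
}
```

Usage: smit_log_has_migcheck_symptom && echo "stray migcheck output reached the selector id"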


The cause is described in the defect analysis below:

Injecting CMVC Defect/Feature: 642271
Analysis text: In 5.4.1.2, the fix for defect 642271/APAR IZ21214
is introduced to differentiate/add granularity to the various
errors which can occur in cl_migcheck ANY. Now, the error 1
indicates a migration is in progress, so do not run automatic
cluster verification/synchronization. An error 3 indicates
either socket, socket connection, or read communication error
in clcomd, so continue running automatic cluster
verification/synchronization. However this change will print
error message like "migcheck[471]: cl_connect() error,
nodename=akash22, rc=-1". This additional error message is
causing additional issues while loading smit menu.
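
The effect described in the analysis can be reproduced in isolation. The sketch below uses stand-in functions, not the real HACMP utilities: fake_migcheck only imitates a command that writes a diagnostic to stderr, and the two list functions imitate a script that builds the selector list. It assumes that the caller merges stderr into the captured list, which matches the symptom above but is not confirmed by the source.

```sh
# Stand-in for cl_migcheck: writes a diagnostic to stderr, nothing to stdout,
# and returns 3 (the communication-error code named in the analysis).
fake_migcheck() {
    echo 'migcheck[471]: cl_connect() error, nodename=coenim, rc=-1' >&2
    return 3
}

# Stand-in list builder without the workaround.
list_groups_noisy() {
    fake_migcheck ANY || :        # status ignored in this sketch
    echo 'hacmprg OFFLINE'
}

# Same list builder with stderr of the check discarded.
list_groups_quiet() {
    fake_migcheck ANY 2>/dev/null || :
    echo 'hacmprg OFFLINE'
}

# A caller that merges stderr into stdout sees the diagnostic as a list entry.
echo '--- without redirect ---'
list_groups_noisy 2>&1
echo '--- with 2>/dev/null ---'
list_groups_quiet 2>&1
```

With the redirect in place, only the real resource group line reaches the caller, which is why the 2>/dev/null change lets the selector screen load.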

At this point you've got 3 choices:
1. Wait until APAR IZ29033 comes out (no ETA for it).
2. Modify /usr/es/sbin/cluster/utilities/clRGmove

Change the line:
cl_migcheck ANY

To read:
cl_migcheck ANY 2>/dev/null

Then re-issue the C-SPOC command via smit and it will work (you will be able to bring the resource group online).

3. Run clRGmove manually at the command line to bring the RG ONLINE:

/usr/es/sbin/cluster/utilities/clRGmove -s 'false' -u -i -g 'RGname' -n 'nodeX'

(where RGname=your RG and nodeX=HACMP node name)
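
If you go with choice 2, it is worth keeping a copy of the original script so the change can be reverted once the APAR is installed. The following is a minimal sketch, not an IBM-supplied procedure: the function name is hypothetical, and it assumes the call appears on a line of its own as "cl_migcheck ANY" (optionally indented). Check the actual line with grep before relying on it.

```sh
# Hypothetical helper: append "2>/dev/null" to a bare "cl_migcheck ANY" line,
# keeping the untouched original as <file>.orig.
patch_migcheck_call() {
    target="$1"
    [ -f "$target" ] || return 1

    # Already patched: nothing to do.
    if grep -q 'cl_migcheck ANY 2>/dev/null' "$target"; then
        return 0
    fi

    cp -p "$target" "$target.orig" || return 1

    # No in-place editing is used; the result goes to a temporary file.
    sed 's|^\([[:space:]]*\)cl_migcheck ANY[[:space:]]*$|\1cl_migcheck ANY 2>/dev/null|' \
        "$target.orig" > "$target.new" || return 1

    # If nothing changed, the assumed line format did not match.
    if cmp -s "$target.orig" "$target.new"; then
        echo "no bare 'cl_migcheck ANY' line found in $target" >&2
        rm -f "$target.new"
        return 1
    fi

    # Overwrite the contents so the file keeps its owner and permissions.
    cat "$target.new" > "$target" && rm -f "$target.new"
}
```

Usage, as root on the node that is still up: patch_migcheck_call /usr/es/sbin/cluster/utilities/clRGmove

To revert, copy clRGmove.orig back over clRGmove.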