Red Hat AS4 U5: problem configuring an HP cluster

#1 | Posted 2009-05-30 12:46
After setting up a two-node cluster with the Red Hat cluster software, I ran:
fence_ilo -a (iLO IP addr) -l (username) -p (password) -o (off | on | status | reboot)
The off and status actions work as expected, but reboot and on fail with:
Command without TOGGLE="Yes" attribute is ignored when host power is off.
power_on: unexpected error
Any ideas?

#2 | Posted 2009-06-01 08:55

Reply to #1 (jihua2005)

This is a symptom of the built-in fence device.

#3 | Posted 2009-06-01 11:48

Reply to #1 (jihua2005)

Personally, I would suggest upgrading the iLO firmware to the latest version.

#4 | Posted 2009-06-01 11:53

Reply to #1 (jihua2005)

I searched around, and it looks like earlier versions have a bug. Here is a report for reference:

Reporter:   Shane Bradley
Created:   Thu, 21 Feb 2008 12:26:53
Updated:   Thu, 16 Oct 2008 05:52:37
Key:   433864
Versions:   Not provided
Environment:   i686
Priority:   3
Status:   In Progress
Resolution:   Not provided
Original Link:   https://bugzilla.redhat.com/show_bug.cgi?id=433864
Summary:   RHEL5u1 /sbin/fence_ilo fails with 'failed to turn on'

Description:
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12

Description of problem:
RHEL 5u1 with latest updates to all packages.
HP ilo firmware: 1.91 (tested 1.89 and get same results)
ACPI is disabled at kernel level and runlevel.

When a machine has a fence device of type HP iLO or HP iLO2, the machine
will not fence correctly when it is a member of the cluster. If the machine
is not a member of the cluster, then fencing works correctly.

The commands run when the node is a member of the cluster:
$ fence_ilo -a 10.10.56.56 -l Administrator -p redhatrules -o reboot
$ fence_node bl20pg2-1.gsslab.rdu.redhat.com

The node's fence definition takes the default action, which is reboot.
<device name="ilo1"/>

The fence will fail with an error: failed to turn on

/var/log/messages will contain messages such as these:
Feb 20 16:19:04 cluster2-3 fenced[3664]: agent "fence_ilo" reports: failed to turn on
Feb 20 16:19:04 cluster2-3 fenced[3664]: fence "bl20pg2-1.gsslab.rdu.redhat.com" failed
Feb 20 16:19:09 cluster2-3 fenced[3664]: fencing node "bl20pg2-1.gsslab.rdu.redhat.com"

The iLO event log shows that there is an extra "OFF" in the fence action.
What actually occurs is "OFF,ON,OFF", and when the fence agent requests the
power status, "OFF" is returned, which is accurate because of the extra "OFF"
that is run.

------------------
ILO event log
------------------
*EXTRA off
Caution iLO 02/19/2008 15:48 02/19/2008 15:48 1 Host server powered OFF by: Administrator.
Informational iLO 02/19/2008 15:48 02/19/2008 15:48 4 XML logout: Administrator.
Informational iLO 02/19/2008 15:48 02/19/2008 15:48 1 Host server powered ON by: Administrator.
Informational iLO 02/19/2008 15:48 02/19/2008 15:48 4 XML login: Administrator - 10.16.252.176
Caution iLO 02/19/2008 15:47 02/19/2008 15:47 1 Host server powered OFF by: Administrator.
Informational iLO 02/19/2008 15:47 02/19/2008 15:47 4 XML logout: Administrator.
Informational iLO 02/19/2008 15:47 02/19/2008 15:47 4 XML login: Administrator - 10.16.252.176

--------------------------------------------------------------------------------
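As a rough illustration of how the "OFF,ON,OFF" sequence can be read out of the iLO event log, here is a minimal Python sketch. It assumes the log lists the newest entry first (inferred from the 15:48 entries appearing before the 15:47 ones) and that power events always contain the phrase "Host server powered ON/OFF by:". The helper name power_actions is hypothetical and is not part of any Red Hat or HP tool.

```python
import re

# Power-related entries copied from the iLO event log in the report.
# Assumption: the log is printed newest entry first.
ILO_EVENT_LOG = """\
Caution iLO 02/19/2008 15:48 02/19/2008 15:48 1 Host server powered OFF by: Administrator.
Informational iLO 02/19/2008 15:48 02/19/2008 15:48 1 Host server powered ON by: Administrator.
Caution iLO 02/19/2008 15:47 02/19/2008 15:47 1 Host server powered OFF by: Administrator.
"""


def power_actions(log_text):
    """Return the power actions in chronological order (oldest first)."""
    newest_first = re.findall(r"Host server powered (ON|OFF) by:", log_text)
    return list(reversed(newest_first))


if __name__ == "__main__":
    # A reboot should produce OFF then ON; a trailing OFF is the extra action.
    print(power_actions(ILO_EVENT_LOG))
```

If the last action in the returned list is OFF, the node ends up powered off, which is consistent with the "failed to turn on" error the agent reports.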

If I change the fencing section of cluster.conf so that, instead of doing a
reboot, it does off and then on, it appears to work:
<device name="ilo1" action="off"/>
<device name="ilo1" action="on"/>

The fence will work, but it works with errors. It appears that the first
action, "OFF", fails; however, it marches on and tries "ON". "ON" is
successful and the fence works. This leads me to believe that "OFF" is not
returning the correct status or is mangled. This would explain the second
"OFF" that happens in the reboot action: the OFF is unsuccessful, then it
tries ON. ON works, but before its status is checked, OFF is retried, which
leads to the power status stating that the "ON" failed.

Feb 21 14:58:54 cluster2-3 ccsd[3639]: process_get: Invalid connection descriptor received.
Feb 21 14:58:54 cluster2-3 ccsd[3639]: Error while processing get: Invalid request descriptor
Feb 21 14:58:54 cluster2-3 ccsd[3639]: Attempt to close an unopened CCS descriptor (558150).
Feb 21 14:58:54 cluster2-3 ccsd[3639]: Error while processing disconnect: Invalid request descriptor
Feb 21 14:58:54 cluster2-3 fence_node[28117]: Fence of "bl20pg2-1.gsslab.rdu.redhat.com" was unsuccessful
Feb 21 14:58:55 cluster2-3 dhclient: DHCPREQUEST on eth1 to 10.10.56.49 port 67
Feb 21 14:59:39 cluster2-3 last message repeated 3 times
Feb 21 14:59:48 cluster2-3 dhclient: DHCPREQUEST on eth1 to 10.10.56.49 port 67
Feb 21 14:59:48 cluster2-3 fenced[3664]: fence "bl20pg2-1.gsslab.rdu.redhat.com" success
Feb 21 14:59:53 cluster2-3 ccsd[3639]: Attempt to close an unopened CCS descriptor (558390).
Feb 21 14:59:53 cluster2-3 ccsd[3639]: Error while processing disconnect: Invalid request descriptor

---sbradley

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5_1.1

How reproducible:
Always


Steps to Reproduce:
1. Set up a RHEL5u1 machine (with fence iLO/iLO2) with cman installed.
2. Add the machine to the cluster.
3. Start cman on that node.
4. fence_node <nodename>

Actual Results:
The node that was fenced fails to reboot.
Fails with error "failed to turn on"

Expected Results:
The node should have been hard rebooted by the ilo device.

Additional info:

FWIW, someone else reported this recently as well to me on IRC.

Hi, I'm having this problem too.

[root@node1 ~]# fence_node node2.local
agent "fence_ilo" reports: failed to turn on

And then node2.local is left powered off.

acpid is turned off

iLO 2 Firmware Version: 1.43 12/12/2007

It started happening after I updated RHEL from 5.0 to 5.1 (x86_64).

Regards,
FabioSilva
fssilva@gmail.com


I have verified that the workaround below is valid.
-----------------------------------------------
*Make sure you have turned off acpi at runlevel and kernel level.

$ cp /etc/cluster/cluster.conf /root
$ emacs /etc/cluster/cluster.conf

*Update version number by 1.
*Then edit the fence device section for each node for example:

<method name="1">
<device name="ilo1"/>
</method>

change to -->

<method name="1">
<device name="ilo1" action="off"/>
<device name="ilo1" action="on"/>
</method>

Do this for all cluster nodes, keep in mind the above is an example.
Please use values that are in your cluster.conf.
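For anyone who prefers to script the edit rather than make it by hand, the following is a minimal Python sketch of the same change. It assumes the version number lives in the config_version attribute of the top-level cluster element, and that only device entries without an explicit action should be split. The sample configuration, the node name, and the function name split_reboot_into_off_on are hypothetical; test against a copy of your own cluster.conf before using anything like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down cluster.conf used only for illustration.
SAMPLE_CONF = """<cluster name="example" config_version="4">
  <clusternodes>
    <clusternode name="node1.local">
      <fence>
        <method name="1">
          <device name="ilo1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>"""


def split_reboot_into_off_on(conf_xml):
    """Replace each action-less <device> with an off/on pair and bump the version."""
    root = ET.fromstring(conf_xml)

    # "Update version number by 1" (assumed to be the config_version attribute).
    root.set("config_version", str(int(root.get("config_version")) + 1))

    for method in list(root.iter("method")):
        new_children = []
        for child in list(method):
            if child.tag == "device" and child.get("action") is None:
                # A device without an action defaults to reboot: split it.
                for action in ("off", "on"):
                    attrs = dict(child.attrib)
                    attrs["action"] = action
                    new_children.append(ET.Element("device", attrs))
            else:
                new_children.append(child)
            method.remove(child)
        method.extend(new_children)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(split_reboot_into_off_on(SAMPLE_CONF))
```

The sketch does not preserve the original indentation of the file, so it is worth diffing the result against the backup copy in /root before pushing it to the cluster.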

-------

After the change is made and the file is saved, update the cluster with the
new cluster.conf.
RHEL5
$ ccs_tool update /etc/cluster/cluster.conf
RHEL4
$ ccs_tool update /etc/cluster/cluster.conf
$ cman_tool version -r <version number>

Then run this command to verify that the change has been applied and that the
nodes report the new version number.
$ cman_tool status

Restart cluster services on both nodes (I believe a restart of fenced is
needed).
Test your fencing with fence_node command:
$ fence_node <node name>

Edited summary to match actual error message.

Fixed in commit 227fd2259db164351f4a87df11f0aaca7e8e8431
Comments: