Chinaunix

[Advanced Applications] HA V5.2 heartbeat problem

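Posted 2009-01-04 11:51
HA V5.1 and later support IP-based heartbeating. Is that the method used by default in the configuration?
The two nodes are configured with a disk heartbeat, but the test fails: the nodes receive no signal from each other. HA still starts and reports a normal state, so I can't tell whether it is using the disk heartbeat or its own default mechanism. Could you help me analyze this? Thanks!
The logs are as follows: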
=======================================================
  Node: aix150            State: UP
           Interface: aix150_stb (1)            Address: 172.16.100.2
                                                State:   UP
           Interface: aix150 (1)                Address: 192.168.1.2
                                                State:   UP
           Interface: diskhb_aix150 (0)         Address: 0.0.0.0
                                                State:   UP
           Interface: aix150_svc (1)            Address: 192.168.2.2
                                                State:   UP
           Resource Group: cl_gr2                       State:  On line

        Node: aix170            State: UP
           Interface: aix170_stb (1)            Address: 172.16.100.4
                                                State:   UP
           Interface: aix170 (1)                Address: 192.168.1.4
                                                State:   UP
           Interface: diskhb_aix170 (0)         Address: 0.0.0.0
                                                State:   UP
           Interface: aix170_svc (1)            Address: 192.168.2.4
                                                State:   UP
=========================================================
root:/>lssrc -ls topsvcs|more
Subsystem         Group            PID     Status
topsvcs          topsvcs          30818   active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0] 2     2     S    172.16.100.4    172.16.100.4   
net_ether_01_0 [ 0] en1              0x4160115a      0x416014e4
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 6827 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 9578 ICMP 0 Dropped: 0
NIM's PID: 29524
net_ether_01_1 [ 1] 2     2     S    192.168.1.4     192.168.1.4   
net_ether_01_1 [ 1] en0              0x41601458      0x416014e4
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 6692 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 9527 ICMP 0 Dropped: 0
NIM's PID: 29238
diskhb_0       [ 2] 2     2     S    255.255.10.1    255.255.10.1   
diskhb_0       [ 2] rhdisk6          0x81601159      0x816014e8
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 11 Current group: 11
Packets sent    : 3200 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 3030 ICMP 0 Dropped: 0
NIM's PID: 29164
  2 locally connected Clients with PIDs:
haemd( 28430) hagsd( 28904)
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip  interval = 20 seconds
  Client Heartbeating Disabled.
  Configuration Instance = 55
  Daemon employs no security
  Segments pinned: Text Data.
  Text segment size: 809 KB. Static data segment size: 1520 KB.
  Dynamic data segment size: 3853. Number of outstanding malloc: 175
  User time 0 sec. System time 0 sec.
  Number of page faults: 259. Process swapped out 0 times.
  Number of nodes up: 2. Number of nodes down: 0.
============================================
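
For anyone comparing the networks in the `lssrc -ls topsvcs` output above, here is a minimal Python sketch that pulls the heartbeat interval, sensitivity, and missed-heartbeat counters out of that text. It assumes the line layout is exactly as pasted in this post; the function name `summarize_topsvcs` and the dictionary keys are made up for illustration and are not part of any RSCT tool.

```python
import re

# Excerpt of the `lssrc -ls topsvcs` output quoted in the post above.
SAMPLE = """\
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
net_ether_01_0 [ 0] 2     2     S    172.16.100.4    172.16.100.4
net_ether_01_0 [ 0] en1              0x4160115a      0x416014e4
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
diskhb_0       [ 2] 2     2     S    255.255.10.1    255.255.10.1
diskhb_0       [ 2] rhdisk6          0x81601159      0x816014e8
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 11 Current group: 11
"""

# Assumed layout: "<network> [ <index>] <first field> ..."
NET_RE = re.compile(r"^(\S+)\s+\[\s*\d+\]\s+(\S+)")
HB_RE = re.compile(r"HB Interval = ([\d.]+) secs\. Sensitivity = (\d+) missed beats")
MISSED_RE = re.compile(r"Missed HBs: Total: (\d+) Current group: (\d+)")


def summarize_topsvcs(text):
    """Return {network_name: {...}} for each network block found in text."""
    networks = {}
    current = None
    for line in text.splitlines():
        net = NET_RE.match(line)
        if net:
            current = networks.setdefault(net.group(1), {})
            # The second line of each pair carries the device (en1, rhdisk6);
            # the first line carries a numeric "Defd" count instead.
            if not net.group(2).isdigit():
                current["device"] = net.group(2)
            continue
        if current is None:
            continue
        hb = HB_RE.search(line)
        if hb:
            current["interval_s"] = float(hb.group(1))
            current["sensitivity"] = int(hb.group(2))
        missed = MISSED_RE.search(line)
        if missed:
            current["missed_total"] = int(missed.group(1))
            current["missed_current_group"] = int(missed.group(2))
    return networks


summary = summarize_topsvcs(SAMPLE)
for name, info in summary.items():
    print(name, info)
```

On the excerpt, only `diskhb_0` (device `rhdisk6`) reports a non-zero missed-heartbeat total, which is the counter worth watching over time.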

Posted 2009-01-04 12:48
Interface: diskhb_aix150 (0)
Judging from this, it should be the disk heartbeat.
As I recall, the VG should use enhanced concurrent mode.

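Posted 2009-01-04 12:52
Originally posted by fck on 2009-1-4 12:48
Interface: diskhb_aix150 (0)
Judging from this, it should be the disk heartbeat.
As I recall, the VG should use enhanced concurrent mode.

Enhanced concurrent mode is in place and is not the problem. It's only this heartbeat part that is still up in the air; I can't be sure about it.
Node 1: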
=======================
root:/var/ha/log>cat nim.topsvcs.rhdisk6.cluster52
01/04 09:31:04.693: send_thread_main() starting.
01/04 09:31:04.714: receive_thread_main() starting.
01/04 09:31:04.714: NIM_comm creation succeeded. UD Socket name: /var/ha/soc/topsvcs/hats_diskhb_nim.29164
01/04 09:31:04.714: Current stack location: 0x2ff22680.
01/04 09:31:04.714: Current stack limit: 262144 Max 2147483646
01/04 09:31:04.728: Pinning kernel extension already loaded.
01/04 09:31:04.734: Shared libraries pinned: total 4377780 bytes.
01/04 09:31:04.734: Allocating and warming 128000 bytes of pinned memory.
01/04 09:31:05.546: Received an OPEN command
01/04 09:31:05.547: Minimum packet size:    76
01/04 09:31:05.547: Maximum packet size:    16536
01/04 09:31:05.547: Number of send retries: 4
01/04 09:31:05.547: Local address: (device) /dev/rhdisk6
01/04 09:31:05.559: open(): Wrote initial handshake
01/04 09:31:05.559: Response successfully sent.
01/04 09:31:05.560: Received HA_NIM_DMS_START command.
01/04 09:31:05.813: Local adapter is now up.
01/04 09:31:05.813: Adapter status successfully sent.
01/04 09:31:05.815: Received a SEND MSG command. Dst: .
01/04 09:31:13.180: Received a SEND MSG command. Dst: .
01/04 09:31:20.554: Received a SEND MSG command. Dst: .
01/04 09:31:26.312: Received a SEND MSG command. Dst: .
01/04 09:46:07.964: Received a SEND MSG command. Dst: .
01/04 09:46:14.724: dhb_hs_probe_timer_fct(): version 02  link 0  lr1 00000  lr2 00000
                ID1 000E16814C00-1      ID2
01/04 09:46:14.737: dhb_hs_probe_timer_fct(): Wrote full handshake
01/04 09:46:16.663: Received a SEND MSG command. Dst: .
01/04 09:46:18.525: Received a SEND MSG command. Dst: .
01/04 09:46:18.912: Received a SEND MSG command. Dst: .
01/04 09:46:19.791: Received a SEND MSG command. Dst: .
01/04 09:46:19.791: Received a START HB command. Destination: .
01/04 09:46:19.791: set_dhb_polling_rate(): Default poll speed 40
01/04 09:46:19.791: Received a SEND MSG command. Dst: .
01/04 09:46:19.792: Received a START MONITOR command.
01/04 09:46:19.792: Address:  How often: 4000 msec Sensitivity: 4 Configuration Instance: 55
01/04 09:46:19.792: Received a SEND MSG command. Dst: .
01/04 09:46:26.447: Received a SEND MSG command. Dst: .
01/04 09:46:33.556: Received a SEND MSG command. Dst: .
01/04 09:46:41.475: Received a SEND MSG command. Dst: .
Node 2:
================================
root:/var/ha/log>cat nim.topsvcs.rhdisk5.cluster52
01/04 09:46:38.455: send_thread_main() starting.
01/04 09:46:38.456: receive_thread_main() starting.
01/04 09:46:38.457: NIM_comm creation succeeded. UD Socket name: /var/ha/soc/topsvcs/hats_diskhb_nim.18848
01/04 09:46:38.458: Current stack location: 0x2ff22790.
01/04 09:46:38.458: Current stack limit: 262144 Max 2147483646
01/04 09:46:38.472: Pinning kernel extension already loaded.
01/04 09:46:38.487: Shared libraries pinned: total 4355179 bytes.
01/04 09:46:38.487: Allocating and warming 128000 bytes of pinned memory.
01/04 09:46:39.298: Received an OPEN command
01/04 09:46:39.298: Minimum packet size:    76
01/04 09:46:39.298: Maximum packet size:    16536
01/04 09:46:39.298: Number of send retries: 4
01/04 09:46:39.298: Local address: (device) /dev/rhdisk5
01/04 09:46:39.313: open(): Wrote initial handshake
01/04 09:46:39.313: Response successfully sent.
01/04 09:46:39.314: Received HA_NIM_DMS_START command.
01/04 09:46:39.573: Local adapter is now up.
01/04 09:46:39.573: Adapter status successfully sent.
01/04 09:46:43.347: dhb_hs_probe_timer_fct(): version 02  link 1  lr1 00000  lr2 00000
                ID1 000E16814C00-1      ID2 000654DF4C00-1
01/04 09:46:43.358: dhb_hs_probe_timer_fct(): Wrote full handshake
01/04 09:46:45.284: Received a SEND MSG command. Dst: .
01/04 09:46:47.094: Received a SEND MSG command. Dst: .
01/04 09:46:48.152: Received a SEND MSG command. Dst: .
01/04 09:46:49.216: Received a START HB command. Destination: .
01/04 09:46:49.216: set_dhb_polling_rate(): Default poll speed 40
01/04 09:46:49.219: Received a SEND MSG command. Dst: .
01/04 09:46:49.219: Received a START MONITOR command.
01/04 09:46:49.219: Address:  How often: 4000 msec Sensitivity: 4 Configuration Instance: 55
01/04 09:46:49.220: Received a SEND MSG command. Dst: .
01/04 09:46:49.220: Received a SEND MSG command. Dst: .
01/04 09:49:41.731: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:08:17.113: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:32:13.423: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:33:29.687: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:39:38.833: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:55:01.563: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
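
To put numbers on how often node 2 misses a heartbeat, here is a rough Python sketch that extracts the "Heartbeat was NOT received" entries from the NIM log above and measures the gaps between them. The log lines carry no year, so the `year=2009` default is an assumption taken from the post date, and `missed_heartbeats` is a hypothetical helper name, not an RSCT utility.

```python
import re
from datetime import datetime

# Tail of the node 2 NIM log quoted in the post above.
LOG = """\
01/04 09:46:49.220: Received a SEND MSG command. Dst: .
01/04 09:49:41.731: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:08:17.113: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:32:13.423: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:33:29.687: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:39:38.833: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
01/04 10:55:01.563: Heartbeat was NOT received. Missed HBs: 1. Limit: 4
"""

MISS_RE = re.compile(
    r"^(\d\d/\d\d \d\d:\d\d:\d\d\.\d+): Heartbeat was NOT received\. "
    r"Missed HBs: (\d+)\. Limit: (\d+)"
)


def missed_heartbeats(text, year=2009):
    """Return a list of (timestamp, missed_count, limit) tuples."""
    events = []
    for line in text.splitlines():
        m = MISS_RE.match(line)
        if m:
            # Prefix the assumed year so the timestamp parses unambiguously.
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %m/%d %H:%M:%S.%f")
            events.append((when, int(m.group(2)), int(m.group(3))))
    return events


events = missed_heartbeats(LOG)
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]

print(len(events), "missed-heartbeat entries")
print("gaps between entries (minutes):", [round(g / 60, 1) for g in gaps])
print("highest consecutive miss count:", max(e[1] for e in events))
```

The useful comparison is the highest "Missed HBs" value against the "Limit": as long as the count never approaches the limit, the entries are isolated single misses rather than a dead link.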

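Posted 2009-01-04 14:10
"Missed HBs: 1. Limit: 4" means they were not all lost, so it's a minor matter.
Losing one every ten-odd minutes isn't a big problem either. You can change the HA network attributes; there should be one for the heartbeat rate. Set it to a slower value and keep watching.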
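
As a back-of-the-envelope illustration of why a slower heartbeat setting is more forgiving, the sketch below uses the rule of thumb usually quoted for HACMP network modules: failure detection time is roughly heartbeat interval × failure cycle (the "Sensitivity" value) × 2. Treat that formula as an assumption to check against the documentation for your release. The inputs are the values from the `lssrc -ls topsvcs` output earlier in the thread, and `detection_time` is only an illustrative name.

```python
def detection_time(interval_s, sensitivity):
    """Approximate failure detection time in seconds.

    Assumed rule of thumb: interval * sensitivity * 2.
    """
    return interval_s * sensitivity * 2


# Values reported by `lssrc -ls topsvcs` in the first post.
ether = detection_time(interval_s=1.0, sensitivity=10)
diskhb = detection_time(interval_s=2.0, sensitivity=4)

print(f"net_ether networks: ~{ether:.0f} s to declare a failure")
print(f"diskhb_0:           ~{diskhb:.0f} s to declare a failure")
```

A longer interval or a larger failure cycle widens this window, so an occasional late heartbeat is less likely to be logged as missed, at the cost of slower failure detection.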

Posted 2009-01-04 14:10
With HA stopped, does the disk heartbeat test work properly?

Posted 2009-01-04 14:41
Originally posted by meilixueshan on 2009-1-4 14:10
With HA stopped, does the disk heartbeat test work properly?

=================================
aix170-root:/var/ha/log>/usr/sbin/rsct/bin/dhb_read -p /dev/hdisk6 -r
Receive Mode:
Waiting for response . . .
Magic number = -2023406815
Magic number = -2023406815
Magic number = -2023406815
Magic number = -2023406815
Magic number = -2023406815
No response on link.
That is the state it's in, yet the log contains messages saying the link is fine. I'll change the parameter and see. oslevel 5200-10

[ Last edited by jat_15 on 2009-1-4 14:42 ]

Posted 2009-01-04 15:33
Run /usr/sbin/rsct/bin/dhb_read -p /dev/hdiskN -r on both nodes and check takeover. If that works, then change the network parameters in HA.

Posted 2009-01-04 15:43
Originally posted by meilixueshan on 2009-1-4 15:33
Run /usr/sbin/rsct/bin/dhb_read -p /dev/hdiskN -r on both nodes and check takeover. If that works, then change the network parameters in HA.

=======================
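Both sides behave the same: takeover works and the tests on both sides are normal. By the way, where do I change the "slower" setting you mentioned? All I can find are the debug level, the refresh rate, and the disk "water" settings.

[ Last edited by jat_15 on 2009-1-4 15:52 ]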

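Posted 2009-01-04 16:50
I don't have an environment at hand and can't remember the exact path.
It's roughly under the "network module properties" section, which offers three values: Fast, Normal, and Slow. Have a look for it.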

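Posted 2009-01-04 17:11
Originally posted by meilixueshan on 2009-1-4 16:50
I don't have an environment at hand and can't remember the exact path.
It's roughly under the "network module properties" section, which offers three values: Fast, Normal, and Slow. Have a look for it.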

================================
Thanks, I changed it to Slow.
The "Heartbeat was NOT received. Missed HBs: 1. Limit: 4" problem is solved.

[ Last edited by jat_15 on 2009-1-4 17:27 ]