Chinaunix
Solaris two-node cluster: one node has recently kept shutting down by itself. How do I deal with it? I'm a newbie

#1 Posted on 2015-04-17 08:47
I have a Solaris two-node cluster, and recently one of the nodes keeps shutting down by itself. Should I be looking at the messages in /var/adm/messages?

#2 Posted on 2015-04-17 08:49
Apr 16 06:33:53 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Apr 16 06:33:53 Ipaddress00 in.routed[648]: [ID 238047 daemon.warning] interface bge3 to 172.16.1.1 turned off
Apr 16 06:33:53 Ipaddress00 cf_drv: [ID 973785 kern.info]  LOG3.014291372331080024   1007 5    0    1.0         cf:eventlog     CF: (TRACE): Link is DOWN for device: /dev/bge3. (#0000 0x000cc00f)
Apr 16 06:33:53 Ipaddress00 cf_drv: [ID 951747 kern.notice]  LOG3.014291372331080024   1019 4    0    1.0         cf:eventlog     CF: Problem detected on cluster interconnect /dev/bge3 to node pw450: missing heartbeat replies. (#0000 0 2 1 1)
Apr 16 06:33:55 Ipaddress00 cf_drv: [ID 687926 kern.info]  LOG3.014291372351080024   1007 5    0    1.0         cf:eventlog     CF: (TRACE): pw450: detected as a questionable node.
Apr 16 06:34:00 Ipaddress00 cf_drv: [ID 951652 kern.notice]  LOG3.014291372401080024   1019 4    0    1.0         cf:eventlog     CF: Problem detected on cluster interconnect /dev/bge3 to node pw450: missing heartbeat replies. (#0000 0 2 1 1)
Apr 16 06:34:00 Ipaddress00 cf_drv: [ID 167041 kern.info]  LOG3.014291372401080024   1007 5    0    1.0         cf:eventlog     CF: (TRACE): CFSF failure detected: no SF open: passed to ENS: pw450. (#0000 2)
Apr 16 06:34:00 Ipaddress00 cf_drv: [ID 226980 kern.notice]  LOG3.014291372401080024   1015 5    0    1.0         cf:eventlog     CF: Node pw450 Left Cluster ZSD. (#0000 2)
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.error] LOG3.014291372411080023   11   3    0    4.2         RMS              (BM, 113): ERROR: Base monitor has reported 'Faulted' for host <pw450RMS>.
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291372411080023   11   5    0    4.2         RMS              (US, 12): NOTICE: Cluster host pw450RMS has become Faulted. A shut down request will be sent immediately!
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291372411080023   11   5    0    4.2         RMS              (SYS, 9): NOTICE: Attempting to shut down the cluster host pw450RMS by invoking a Shutdown Facility via (sdtool -k pw450).
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.error] LOG3.014291372411080023   4    3    0    4.2         RMS              (SCR, 20): ERROR: The attempt to shut down the cluster host pw450RMS has failed: Exited with a non-zero code: 60
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.error] LOG3.014291372411080023   4    3    0    4.2         RMS              .
Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.error] LOG3.014291372411080023   11   3    0    4.2         RMS              (SYS, : ERROR: RMS failed to shut down the host pw450RMS via a Shutdown Facility, no further kill functionality is available. The cluster is now hung.
Apr 16 06:37:34 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 07:42:31 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 08:47:28 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 09:52:24 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 10:57:21 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 12:02:16 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 13:07:13 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 14:12:10 Ipaddress00 in.routed[648]: [ID 464608 daemon.error] route 172.16.0.0 --> 172.16.1.1 nexthop is not directly connected
Apr 16 15:16:02 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link up 1000Mbps Full-Duplex
Apr 16 15:16:08 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Apr 16 15:16:08 Ipaddress00 in.routed[648]: [ID 238047 daemon.warning] interface bge3 to 172.16.1.1 turned off
Apr 16 15:16:11 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link up 1000Mbps Full-Duplex
Apr 16 15:16:11 Ipaddress00 in.routed[648]: [ID 300549 daemon.warning] interface bge3 to 172.16.1.1 restored
Apr 16 15:19:29 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down
Apr 16 15:19:29 Ipaddress00 in.routed[648]: [ID 238047 daemon.warning] interface bge3 to 172.16.1.1 turned off
Apr 16 15:19:31 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link up 1000Mbps Full-Duplex
Apr 16 15:19:32 Ipaddress00 in.routed[648]: [ID 300549 daemon.warning] interface bge3 to 172.16.1.1 restored
Apr 16 15:20:06 Ipaddress00 cf_drv: [ID 993657 kern.notice]  LOG3.014291688061080024   1015 5    0    1.0         cf:eventlog     CF: Node pw450 Left Cluster ZSD. (#0000 2)
Apr 16 15:20:06 Ipaddress00 cf_drv: [ID 442616 kern.notice]  LOG3.014291688061080024   1005 5    0    1.0         cf:eventlog     CF: ZSD: pw450 is Down. (#0000 2)
Apr 16 15:20:06 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688061080024   100014    0    1.0         cf:elmlog       !rebuild starting due to node failure
Apr 16 15:20:06 Ipaddress00
Apr 16 15:20:06 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688061080024   100014    0    1.0         cf:elmlog       !rebuild complete in 0 lbolt.
Apr 16 15:20:06 Ipaddress00
Apr 16 15:20:06 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688061080024   100014    0    1.0         cf:elmlog       ELM:0001: cluster size 1
Apr 16 15:20:06 Ipaddress00
Apr 16 15:20:07 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688071080023   11   5    0    4.2         RMS              (US, : NOTICE: Cluster host pw450RMS has been successfully killed.
Apr 16 15:20:17 Ipaddress00 cf_drv: [ID 770392 kern.info]  LOG3.014291688171080024   1007 5    0    1.0         cf:eventlog     CF: (TRACE): Link is UP for device: /dev/bge3. (#0000 0)
Apr 16 15:20:17 Ipaddress00 cf_drv: [ID 793437 kern.notice]  LOG3.014291688171080024   1004 5    0    1.0         cf:eventlog     CF: Node pw450 Joined Cluster ZSD. (#0000 2)
Apr 16 15:20:17 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688171080024   100014    0    1.0         cf:elmlog       !rebuild starting due to node joining configuration
Apr 16 15:20:17 Ipaddress00
Apr 16 15:20:17 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688171080024   100014    0    1.0         cf:elmlog       !rebuild complete in 1 lbolt.
Apr 16 15:20:17 Ipaddress00
Apr 16 15:20:17 Ipaddress00 cf_drv: [ID 748609 kern.notice]  LOG3.014291688171080024   100014    0    1.0         cf:elmlog       ELM:0001: cluster size 2
Apr 16 15:20:17 Ipaddress00
Apr 16 15:21:03 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688631080023   0    5    0    4.2         RMS              (WRP, 37): NOTICE: The package parameters of the package <SMAWRrms> on the remote host <pw450RMS> are: Version = <4.2A00>, Load = <51>.
Apr 16 15:21:03 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688631080023   0    5    0    4.2         RMS              (WRP, 3: NOTICE: The Process Id (pid) and the startup time of the RMS monitor on the remote host <pw450RMS> are <3884> and <2015-04-16_15:21:00>.
Apr 16 15:21:03 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688631080023   0    5    0    4.2         RMS              (WRP, 37): NOTICE: The package parameters of the package <SMAWRhvto> on the remote host <pw450RMS> are: Version = <4.2A00>, Load = <09>.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   11   5    0    4.2         RMS              (WRP, 63): NOTICE: The ELM heartbeat started for the cluster host <pw450RMS>.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   25   5    0    4.2         RMS              (BM, 64): NOTICE: Checksum request has been sent to host <pw450RMS>.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   11   5    0    4.2         RMS              (BM, 61): NOTICE: A checksum verification request has arrived from host <pw450RMS>, that host's checksum is <32065>.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   25   5    0    4.2         RMS              (BM, 62): NOTICE: The local checksum <32065> has been replied back to host <pw450RMS>.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   11   5    0    4.2         RMS              (BM, 63): NOTICE: Host <pw450RMS> has replied the checksum <32065> equal to the local checksum. That host should become online now.
Apr 16 15:21:08 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688681080023   11   5    0    4.2         RMS              (US, 9): NOTICE: Cluster host pw450RMS has become online.
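
A log this long is easier to read once the bge3 link notices are reduced to outage intervals. The following is only a rough sketch, not a vendor tool: it assumes the "NOTICE: <interface>: link up/down" wording seen above, and because syslog timestamps carry no year, it assumes 2015 (the year that appears in the RMS startup-time line). The function name link_outages is made up for illustration.

```python
import re
from datetime import datetime

# Matches driver notices such as:
# "Apr 16 06:33:53 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down"
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*NOTICE:\s+(\w+):\s+link\s+(up|down)"
)


def parse_ts(stamp, year=2015):
    # Assumption: the year is supplied by the caller, syslog does not record it.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def link_outages(lines, year=2015):
    """Return (interface, down_time, up_time, seconds) for each down/up pair."""
    down_since = {}
    outages = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue
        stamp, iface, state = m.groups()
        ts = parse_ts(stamp, year)
        if state == "down":
            # Keep the first "down" if several arrive before an "up".
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            outages.append((iface, start, ts, (ts - start).total_seconds()))
    return outages


SAMPLE = [
    "Apr 16 06:33:53 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down",
    "Apr 16 15:16:02 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link up 1000Mbps Full-Duplex",
]

for iface, start, end, seconds in link_outages(SAMPLE):
    print(iface, start.time(), end.time(), seconds)
```

Applied to the excerpt above, this would show one long outage on bge3 starting at 06:33:53, followed by two short flaps at 15:16 and 15:19, just before "Node pw450 Left Cluster" at 15:20:06.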

#3 Posted on 2015-04-17 10:01
scstat

My guess is that the heartbeat network went down and the standby node was then rebooted.
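
One way to check this guess against the log is to list only the RMS messages about the shutdown request. In the excerpt, RMS on the surviving node tries "sdtool -k pw450" at 06:34:01 and fails with exit code 60, and only at 15:20:07 reports that pw450RMS was "successfully killed". The sketch below is an illustration under assumptions: the regular expression and the keyword list are derived from the wording in the pasted log only, and the function name rms_shutdown_events is hypothetical.

```python
import re

# RMS lines in the log look like:
# "... RMS      (SYS, 9): NOTICE: Attempting to shut down the cluster host pw450RMS ..."
# Some message numbers are damaged in the paste, e.g. "(SYS, : ERROR:", so the
# part between the comma and the colon is allowed to be anything.
RMS_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*\bRMS\s+\((\w+),[^:]*:\s+(ERROR|NOTICE):\s+(.*)$"
)

# Phrases copied from the pasted log; not an official message catalogue.
KEYWORDS = ("shut down", "killed", "Faulted")


def rms_shutdown_events(lines):
    """Return (timestamp, component, severity, text) for shutdown-related RMS lines."""
    events = []
    for line in lines:
        m = RMS_RE.match(line)
        if m and any(k in m.group(4) for k in KEYWORDS):
            stamp, component, severity, text = m.groups()
            events.append((stamp, component, severity, text.strip()))
    return events


SAMPLE = [
    "Apr 16 06:34:01 Ipaddress00  : [ID 748625 daemon.error] LOG3.014291372411080023   4    3    0    4.2         RMS              (SCR, 20): ERROR: The attempt to shut down the cluster host pw450RMS has failed: Exited with a non-zero code: 60",
    "Apr 16 15:20:07 Ipaddress00  : [ID 748625 daemon.notice] LOG3.014291688071080023   11   5    0    4.2         RMS              (US, : NOTICE: Cluster host pw450RMS has been successfully killed.",
]

for event in rms_shutdown_events(SAMPLE):
    print(event)
```

If the kill messages line up with the times the node went down, the shutdown was requested by the partner node's cluster software rather than being a spontaneous power-off.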

#4 Posted on 2015-04-18 18:02
That is probably it...

#5 Posted on 2015-09-07 17:09
Apr 16 06:34:00 Ipaddress00 cf_drv: [ID 951652 kern.notice]  LOG3.014291372401080024   1019 4    0    1.0         cf:eventlog     CF: Problem detected on cluster interconnect /dev/bge3 to node pw450: missing heartbeat replies. (#0000 0 2 1 1)
It is reporting lost heartbeats on bge3. The reboot comes from the HA protection mechanism.
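
To tell a physical link problem (cable, NIC, switch port) from a heartbeat configuration problem, it can help to check whether each "missing heartbeat replies" message follows a link-down notice on the same interface. The sketch below is only a rough heuristic: the 30-second window, the assumed year 2015, and the function name classify_heartbeat_loss are all assumptions for illustration, and the patterns are taken from the wording in this thread's log.

```python
import re
from datetime import datetime, timedelta

STAMP = r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})"
LINK_RE = re.compile(STAMP + r".*NOTICE:\s+(\w+):\s+link\s+(up|down)")
HB_RE = re.compile(
    STAMP + r".*cluster interconnect /dev/(\w+) to node (\w+): missing heartbeat replies"
)


def _ts(stamp, year):
    # Assumption: the year is passed in, since syslog timestamps omit it.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def classify_heartbeat_loss(lines, window_seconds=30, year=2015):
    """Return (timestamp, interface, node, link_was_down) per heartbeat-loss line.

    link_was_down is True when the same interface logged "link down" no more
    than window_seconds earlier and has not logged "link up" since.
    """
    last_down = {}
    results = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            stamp, iface, state = m.groups()
            if state == "down":
                last_down[iface] = _ts(stamp, year)
            else:
                last_down.pop(iface, None)
            continue
        m = HB_RE.match(line)
        if m:
            stamp, iface, node = m.groups()
            down = last_down.get(iface)
            gap = _ts(stamp, year) - down if down is not None else None
            physical = gap is not None and timedelta(0) <= gap <= timedelta(seconds=window_seconds)
            results.append((stamp, iface, node, physical))
    return results


SAMPLE = [
    "Apr 16 06:33:53 Ipaddress00 bge: [ID 801593 kern.notice] NOTICE: bge3: link down",
    "Apr 16 06:34:00 Ipaddress00 cf_drv: [ID 951652 kern.notice]  LOG3.014291372401080024   1019 4    0    1.0         cf:eventlog     CF: Problem detected on cluster interconnect /dev/bge3 to node pw450: missing heartbeat replies. (#0000 0 2 1 1)",
]

for row in classify_heartbeat_loss(SAMPLE):
    print(row)
```

In the log posted in this thread, the heartbeat losses at 06:33:53 and 06:34:00 come right after "bge3: link down" at 06:33:53, which points at the physical link on bge3 (or whatever it connects to) rather than at heartbeat timing settings.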

#6 Posted on 2015-09-22 13:49
Heartbeat lost... please check the heartbeat configuration...