The first RAC failure I handled after Spring Festival (part 1)

Posted on 2011-12-19 13:56
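    During the Spring Festival holiday I got a call from a customer: after the hosts were rebooted, not a single RAC node would come up (a 4-node RAC: two fully configured 570s plus two half-configured 570s). One host was booting extremely slowly, another was reporting errors, and two of the four nodes were actually reporting hardware errors! Luckily I was spending this Spring Festival in Shanghai, so after getting a quick picture of the situation I rushed to the site. On the way I tried to reach a server hardware engineer. Damn it, there was only one engineer in Shanghai, and he had already moved over to sales, so he probably wasn't coming. I phoned the company to have a hardware engineer dispatched and nobody picked up. I called the company's 800 number and, damn it, still nobody answered. So much for 7×24 and the 800 hotline; it's all hot air. They promise the moon before they win the contract, nobody can be found when something breaks, and when you finally reach someone they send a rookie. I might as well handle it myself as an amateur... If I didn't know the customer well, they would have blown up long ago... OK, rant over, on to the problem.
    I don't know the hardware well, so I first looked into why RAC would not start, beginning with the crsd log: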
2011-02-07 15:03:03.869: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

2011-02-07 15:03:05.254: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

2011-02-07 15:03:05.256: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:06.590: [ COMMCRS][353]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

2011-02-07 15:03:06.590: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

2011-02-07 15:03:06.590: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:07.973: [ COMMCRS][355]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))
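
The excerpt above keeps cycling through the same three messages, which is easier to see once they are tallied. Below is a minimal Python sketch that counts crsd log lines per component and collects the IPC keys reported as having no listener. The line format is assumed from the lines quoted in this post only, and the function name `summarize_crsd_log` is made up for illustration; it is not an Oracle tool.

```python
import re
from collections import Counter

# Line layout assumed from the excerpt in this post:
#   2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: ...
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: \[\s*(\w+)\]\[(\d+)\](.*)$"
)
# IPC endpoint that clsc_connect could not reach.
KEY_RE = re.compile(r"no listener at \(ADDRESS=\(PROTOCOL=ipc\)\(KEY=([^)]+)\)\)")


def summarize_crsd_log(text):
    """Return (lines per component, 'no listener' hits per IPC key)."""
    components = Counter()
    missing_listeners = Counter()
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank lines or lines in another format
        components[match.group(1)] += 1
        key = KEY_RE.search(match.group(3))
        if key:
            missing_listeners[key.group(1)] += 1
    return components, missing_listeners


SAMPLE = """\
2011-02-07 15:03:03.869: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

2011-02-07 15:03:05.254: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
"""

if __name__ == "__main__":
    components, missing = summarize_crsd_log(SAMPLE)
    print(dict(components))
    print(dict(missing))
```

If the only key that shows up is an OCSSD_* one, as here, crsd is simply waiting for the CSS daemon, so the next log to read is the cssd one.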

So cssd had not started. I went on to check the cssd log and found the following:
 [    CSSD]2011-02-07 15:13:08.415 >node3:    Copyright 2011, Oracle version 10.2.0.4.0
[    CSSD]2011-02-07 15:13:08.415 >node3:    CSS daemon log for node node1, number 1, in cluster crs
[    CSSD]2011-02-07 15:13:08.421 [1] >TRACE:   clssscmain: local-only set to false
[    CSSD]2011-02-07 15:13:08.427 [1] >TRACE:   clssnmReadNodeInfo: added node 1 (node1) to cluster
[    CSSD]2011-02-07 15:13:08.431 [1] >TRACE:   clssnmReadNodeInfo: added node 2 (node2) to cluster
[    CSSD]2011-02-07 15:13:08.436 [1] >TRACE:   clssnmReadNodeInfo: added node 3 (node3) to cluster
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CSSD))
[    CSSD]2011-02-07 15:13:08.441 [1] >TRACE:   clssnmReadNodeInfo: added node 4 (node4) to cluster
[    CSSD]2011-02-07 15:13:08.444 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds
[    CSSD]2011-02-07 15:13:08.446 [1029] >TRACE:   clssnm_skgxninit: Compatible vendor clusterware not in use
[    CSSD]2011-02-07 15:13:08.446 [1029] >TRACE:   clssnm_skgxnmon: skgxn init failed
[    CSSD]2011-02-07 15:13:08.447 [1] >TRACE:   clssnmNMInitialize: misscount set to (30)
[    CSSD]2011-02-07 15:13:08.448 [1] >TRACE:   clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
[    CSSD]2011-02-07 15:13:08.451 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/voting1)
[    CSSD]2011-02-07 15:13:08.452 [1030] >TRACE:   clssnmvDPT: spawned for disk 0 (/dev/voting1)
[    CSSD]2011-02-07 15:13:08.453 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (1//dev/voting2)
[    CSSD]2011-02-07 15:13:08.453 [1287] >TRACE:   clssnmvDPT: spawned for disk 1 (/dev/voting2)
[    CSSD]2011-02-07 15:13:08.455 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (2//dev/voting3)
[    CSSD]2011-02-07 15:13:08.455 [1544] >TRACE:   clssnmvDPT: spawned for disk 2 (/dev/voting3)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/voting1)
[    CSSD]2011-02-07 15:13:10.464 [1801] >TRACE:   clssnmvKillBlockThread: spawned for disk 0 (/dev/voting1) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.464 [1030] >TRACE:   clssnmReadDskHeartbeat: node(4) is down. rcfg(14) wrtcnt(3085) LATS(4844712) Disk lastSeqNo(3085)
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (2//dev/voting3)
[    CSSD]2011-02-07 15:13:10.481 [2058] >TRACE:   clssnmvKillBlockThread: spawned for disk 2 (/dev/voting3) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844729) Disk lastSeqNo(604)
[    CSSD]2011-02-07 15:13:10.481 [1544] >TRACE:   clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(605) LATS(4844729) Disk lastSeqNo(605)
[    CSSD]2011-02-07 15:13:10.487 [1287] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (1//dev/voting2)
[    CSSD]2011-02-07 15:13:10.487 [2315] >TRACE:   clssnmvKillBlockThread: spawned for disk 1 (/dev/voting2) initial sleep interval (1000)ms
[    CSSD]2011-02-07 15:13:10.488 [1] >TRACE:   clssnmFatalInit: fatal mode enabled
[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=node1-priv)(PORT=49895))

[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Probing node node2 (2), probcon(1113fa5d0)
[    CSSD]2011-02-07 15:13:10.500 [2829] >TRACE:   clssnmClusterListener: Probing node node3 (3), probcon(11156db50)
[    CSSD]2011-02-07 15:13:10.501 [2829] >TRACE:   clssnmClusterListener: Probing node node4 (4), probcon(111570730)
[    CSSD]2011-02-07 15:13:10.501 [2829] >TRACE:   clssnmDiscHelper: node2, node(2) connection failed, con (1113fa5d0), probe(1113fa5d0)

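The only error I could see was "clssnm_skgxnmon: skgxn init failed". I searched Metalink and found nothing useful. In fact I had overlooked an important piece of information in this log: [    CSSD]2011-02-07 17:29:53.412 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds. Because of that I spent a lot of time checking logs and rebooting hosts, and when I ran ps -ef|grep d.bin I also overlooked the parameter values of the oprocd process.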
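
A value like that is easy to miss by eye in a long trace, so here is a rough Python sketch that pulls the "Wait for remote node termination" value and the misscount out of a cssd log and flags the pair when the first is far larger than the second. The two message patterns are copied from the log lines quoted in this post. The function name `check_cssd_timeouts` and the `max_ratio` threshold are my own assumptions for illustration, not anything Oracle documents.

```python
import re

# Patterns taken from the cssd log lines quoted in this post.
TERMINATION_RE = re.compile(
    r"clssgmInitCMInfo: Wait for remote node termination set to (\d+) seconds"
)
MISSCOUNT_RE = re.compile(r"clssnmNMInitialize: misscount set to \((\d+)\)")


def check_cssd_timeouts(log_text, max_ratio=10):
    """Return (termination_wait, misscount, suspicious).

    max_ratio is an arbitrary, hypothetical threshold: the wait is flagged
    when it exceeds max_ratio times the misscount. Tune it to your own site.
    """
    t = TERMINATION_RE.search(log_text)
    m = MISSCOUNT_RE.search(log_text)
    termination = int(t.group(1)) if t else None
    misscount = int(m.group(1)) if m else None
    suspicious = (
        termination is not None
        and misscount is not None
        and termination > max_ratio * misscount
    )
    return termination, misscount, suspicious


CSSD_SAMPLE = """\
[    CSSD]2011-02-07 15:13:08.444 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds
[    CSSD]2011-02-07 15:13:08.447 [1] >TRACE:   clssnmNMInitialize: misscount set to (30)
"""

if __name__ == "__main__":
    print(check_cssd_timeouts(CSSD_SAMPLE))  # (805306368, 30, True)
```

This only highlights the odd value. It says nothing about where the value comes from.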