CRS problem causing a host crash, please help take a look

#1 | Posted 2011-07-13 10:22
Two physical E6900 hosts, Oracle database on ASM, version 11.1.0.7.0.
At 13:31, db2 went down. From the system messages and the CRS logs it looks like the crash was triggered by CRS. Please help take a look; many thanks to all.
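For reference, the state and log excerpts quoted below can be gathered roughly as follows (a sketch only; the /oracle/crs home and the /var/adm/messages location are assumptions based on the paths that appear in the logs):

# sketch of how the evidence below can be collected; paths assumed from the log output
crs_stat -t                                      # cluster resource state, as seen from db1
tail /oracle/crs/log/db1/alertdb1.log            # CRS alert log on db1
tail -100 /oracle/crs/log/db1/cssd/ocssd.log     # CSSD details referenced by CRS-1607
tail -100 /var/adm/messages                      # Solaris system log (the "system error log" below)
last | head                                      # reboot history on db2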

DB1% crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    db1         
ora....B1.lsnr application    ONLINE    ONLINE    db1         
ora.db1.gsd    application    ONLINE    ONLINE    db1         
ora.db1.ons    application    ONLINE    ONLINE    db1         
ora.db1.vip    application    ONLINE    ONLINE    db1         
ora....SM2.asm application    ONLINE    OFFLINE               
ora....B2.lsnr application    ONLINE    OFFLINE               
ora.db2.gsd    application    ONLINE    OFFLINE               
ora.db2.ons    application    ONLINE    OFFLINE               
ora.db2.vip    application    ONLINE    ONLINE    db1         
ora.ora11g.db  application    ONLINE    ONLINE    db1         
ora....g1.inst application    ONLINE    ONLINE    db1         
ora....g2.inst application    ONLINE    OFFLINE   

         
DB1% tail alertdb1.log
2010-11-06 01:13:09.750
[crsd(1966)]CRS-1204:Recovering CRS resources for node db2.
2010-11-06 02:12:42.112
[cssd(218]CRS-1601:CSSD Reconfiguration complete. Active nodes are db1 db2 .
2011-07-12 15:31:24.651
[cssd(218]CRS-1607:CSSD evicting node db2. Details in /oracle/crs/log/db1/cssd/ocssd.log.
2011-07-12 15:31:24.675
[cssd(218]CRS-1601:CSSD Reconfiguration complete. Active nodes are db1 .
2011-07-12 15:31:41.841
[crsd(1966)]CRS-1204:Recovering CRS resources for node db2.

2011-07-12 15:31:41.732: [    RACG][1] [142][1][ora.db2.vip]: IP:172.18.3.44 is already up in the network (host=DB1)
Waiting for IP to come Down : Waited 1 of 30 seconds
Waiting for IP to come Down : Waited 2 of 30 seconds
Waiting for IP to come Down : Waited 3 of 30 seconds

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 4 of 30 seconds
Waiting for IP to come Down : Waited 5 of 30 seconds
Waiting for IP to come Down : Waited 6 of 30 seconds
Waiting for IP to come Down : Waited 7 of 30 seconds

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 8 of 30 seconds
Waiting for IP to come Down : Waited 9 of 30 seconds
Waiting for IP to come Down : Waited 10 of 30 seconds
Waiting for IP to come Down : Waited 11 of 30 seconds
IP:172.18.3.44 now down in the network

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: ifconfig: SIOCSLIFNAME for ip: ce0: already exists
Created new logical interface ce0:3

[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetMinMaxVersion: node 1  product/protocol (11.1/1.4)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,13
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetMinMaxVersion: min product/protocol (11.1/1.4)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetMinMaxVersion: max product/protocol (11.1/1.4)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmNeedConfReq: No configuration to change
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, db2, misstime(866) state(5)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmDoSyncUpdate: Wait for 0 vote ack(s)
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmEvict: Start
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmEvict: Evicting node 2, db2, birth 120736764, death 120736765, impendingrcfg
1, stateflags 0x8001
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmEvict: Not fencing node 2, db2, with SAGE, supported flag 0
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSendShutdown: req to node 2, kill time 1165733520
[    CSSD]2011-07-12 15:31:24.651 [12] >ERROR:   clssnmSendShutdown: Send to node 2 failed, rc 5
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmWaitOnEvictions: node 2, undead 1, killreqid  0, SAGE handle 0
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmCheckKillStatus: Node 2, db2, down, LATS(1504073642),timeout(-338340122)
[    CSSD]2011-07-12 15:31:24.651 [1] >TRACE:   clssgmQueueGrockEvent: groupName(DG_DISKGROUP10) count(2) master(1) event(2), incarn
13, mbrc 2, to member 0, events 0x4, state 0x0
[    CSSD]2011-07-12 15:31:24.651 [12] >TRACE:   clssnmSetupAckWait: Ack message type (15)


Tue Jul 12 15:32:04 2011
ERROR: Instance termination initiated by instance 1 with reason 1.
ERROR: This comes as a down reconfig event from the cluster manager.
ERROR: Please check instance 1's alert and LMON trace file for detail.
ERROR: Please also examine the CSS log files.
LMON (ospid: 8880): terminating the instance due to error 481


Tue Jul 12 15:32:04 2011
ORA-1092 : opitsk aborting process
System State dumped to trace file /oracle/db/diag/rdbms/ora11g/ora11g2/trace/ora11g2_diag_8852.trc

Tue Jul 12 19:49:12 2011
Starting ORACLE instance (normal)
Specified value of sga_max_size is too small, bumping to 10733223936



ora11g1:
Reconfiguration started (old inc 32, new inc 34)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Tue Jul 12 15:31:25 2011
Trace dumping is performing id=[cdmp_20110712153204]
Non-local Process blocks cleaned out
Tue Jul 12 15:31:25 2011
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Jul 12 15:31:25 2011
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Jul 12 15:31:25 2011
LMS 2: 1 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Tue Jul 12 15:31:26 2011
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
LMS 1: 157443 GCS shadows traversed, 0 replayed
LMS 0: 157093 GCS shadows traversed, 0 replayed
LMS 2: 156564 GCS shadows traversed, 0 replayed
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 16 processes
Started redo scan
Tue Jul 12 15:31:36 2011
Completed redo scan
67952 redo blocks read, 6898 data blocks need recovery
Started redo application at
Thread 2: logseq 43526, block 195965
Recovery of Online Redo Log: Thread 2 Group 11 Seq 43526 Reading mem 0
  Mem# 0: +DISKGROUP4/ora11g/onlinelog/group_11_logfile1
  Mem# 1: +DISKGROUP2/ora11g/onlinelog/group_11_logfile2
Recovery of Online Redo Log: Thread 2 Group 4 Seq 43527 Reading mem 0
  Mem# 0: +DISKGROUP2/ora11g/onlinelog/group_4.269.669385297
  Mem# 1: +DISKGROUP4/ora11g/onlinelog/group_4.263.669385299
Completed redo application of 18.80MB
Completed instance recovery at
Thread 2: logseq 43527, block 5026, scn 12257375052531
6022 data blocks read, 8623 data blocks written, 67952 redo blocks read
Thread 2 advanced to log sequence 43528 (thread recovery)
Redo thread 2 internally disabled at seq 43528 (SMON)
Tue Jul 12 15:31:42 2011
Thread 1 advanced to log sequence 58240 (LGWR switch)
  Current log# 5 seq# 58240 mem# 0: +DISKGROUP2/ora11g/onlinelog/group_5.259.669383465
  Current log# 5 seq# 58240 mem# 1: +DISKGROUP4/ora11g/onlinelog/group_5.259.669383467
Tue Jul 12 15:32:16 2011
Archived Log entry 190375 added for thread 2 sequence 43527 ID 0xfffffffff2532fdd dest 1:
Archived Log entry 190376 added for thread 2 sequence 43527 ID 0xfffffffff2532fdd dest 2:
Tue Jul 12 15:32:40 2011
Errors in file /oracle/db/diag/rdbms/ora11g/ora11g1/trace/ora11g1_ping_3446.trc

System error log on DB2:

Jul 12 15:32:04 DB2 last message repeated 4 times
Jul 12 15:32:05 DB2 root: [ID 702911 user.error] Oracle clsomon failed with fatal status 12.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CRS failure.  Rebooting for cluster integrity.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CSSD failure 134.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CRS failure.  Rebooting for cluster integrity.
Jul 12 15:32:07 DB2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.


Boot log (output of last):
monitor   pts/1        132.96.63.75     Tue Jul 12 19:47 - 19:47  (00:00)
reboot    system boot                   Tue Jul 12 19:46
reboot    system down                   Tue Jul 12 15:32
monitor   pts/2        132.96.63.75     Tue Jul 12 15:30 - 15:30  (00:00)

Alert log on DB2:
[    CSSD]2011-07-12 15:32:04.631 [8] >TRACE:   clssgmDiscOmonReady: omon was posted for member 2
[    CSSD]2011-07-12 15:32:04.631 [8] >ERROR:   clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 120736766, stamp 0
[    CSSD]2011-07-12 15:32:04.631 [15] >TRACE:   clssgmDestroyClient: client (100a00890) cleaned up con (100a00ad0), refcount 0, joi
nstate 6, deadstate 3
[    CSSD]2011-07-12 15:32:04.631 [8] >TRACE:   clssgmDiscOmonReady: omon was posted for member 2
[    CSSD]2011-07-12 15:32:04.631 [15] >TRACE:   clssgmclientlsnr: entering select
[    CSSD]2011-07-12 15:32:04.631 [8] >ERROR:   ###################################
[    CSSD]2011-07-12 15:32:04.631 [8] >ERROR:   clssscExit: CSSD aborting from thread clssnmvKillBlockThread0
[    CSSD]2011-07-12 15:32:04.631 [8] >ERROR:   ###################################

#2 | Posted 2011-07-13 10:49
A cluster problem. Let the experts weigh in.

#3 | Posted 2011-07-14 13:28
Does nobody know? Please help analyze this; I'd rather the thread didn't sink so fast.

#4 | Posted 2011-07-14 14:00
ora....SM2.asm application    ONLINE    OFFLINE               
ora....B2.lsnr application    ONLINE    OFFLINE               
ora.db2.gsd    application    ONLINE    OFFLINE               
ora.db2.ons    application    ONLINE    OFFLINE   

2011-07-12 15:31:41.732: [    RACG][1] [142][1][ora.db2.vip]: IP:172.18.3.44 is already up in the network (host=DB1)
Waiting for IP to come Down : Waited 1 of 30 seconds
Waiting for IP to come Down : Waited 2 of 30 seconds
Waiting for IP to come Down : Waited 3 of 30 seconds

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 4 of 30 seconds
Waiting for IP to come Down : Waited 5 of 30 seconds
Waiting for IP to come Down : Waited 6 of 30 seconds
Waiting for IP to come Down : Waited 7 of 30 seconds

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 8 of 30 seconds
Waiting for IP to come Down : Waited 9 of 30 seconds
Waiting for IP to come Down : Waited 10 of 30 seconds
Waiting for IP to come Down : Waited 11 of 30 seconds
IP:172.18.3.44 now down in the network

If this used to work fine, and now ASM, the listener and the rest are all down on that node, and the network waits are timing out, you should really go and check the physical connections.
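A rough way to sanity-check the physical links from db1, along the lines suggested above (a sketch only; the interface names and the db2 private hostname are assumptions, not taken from the thread):

# sketch: verify public and interconnect connectivity from db1 (names are assumptions)
ifconfig -a                      # are the public NIC and the ce0:* VIP aliases up on db1?
netstat -i                       # per-interface input/output error and collision counters
ping -s db2 1500 5               # Solaris ping: 5 probes of 1500 bytes over the public network
ping -s db2-priv 1500 5          # same check over the private interconnect (hostname assumed)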

#5 | Posted 2011-07-14 14:09
Last edited by tacsoft on 2011-07-14 14:15

2011-07-12 15:31:41.732: [    RACG][1] [142][1][ora.db2.vip]: IP:172.18.3.44 is already up in the network (host=DB1)
Waiting for IP to come Down : Waited 1 of 30 seconds
Waiting for IP to come Down : Waited 2 of 30 seconds
Waiting for IP to come Down : Waited 3 of 30 seconds

2011-07-12 15:31:41.744: [    RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 4 of 30 seconds

The address 172.18.3.44 is up (on DB1), but the ora.db2.vip interface on db2 never comes up and the wait times out.
Check the physical interfaces, the virtual (VIP) interface, and the cabling first; only then look at other possibilities.
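A minimal sketch of that first check, using the resource names from the crs_stat output above (the exact interface details are assumptions):

# sketch: where is the db2 VIP, and are the node applications for db2 alive?
srvctl status nodeapps -n db2          # VIP, GSD, ONS and listener status for node db2
crs_stat -t ora.db2.vip                # which host currently owns the db2 VIP resource
ifconfig -a | grep 172.18.3.44         # which interface alias is holding the VIP address right now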

Last time I answered a RAC question and even drew a network topology diagram, a crowd of people who "really know Oracle" showed up saying this forum is not the place to discuss networking, so I stopped bothering. I actually saw your post yesterday.

#6 | Posted 2011-07-14 14:29
ora....SM2.asm application    ONLINE    OFFLINE               
ora....B2.lsnr application    ONLINE    OFFLINE               
ora.db2.gsd    application    ONLINE    OFFLINE               
ora.db2.ons    application    ONLINE    OFFLINE                     
ora....g2.inst application    ONLINE    OFFLINE   

Looking at it carefully, every OFFLINE resource is on node 2, and the connections time out. In the milder case the ASM or listener processes on that side have simply died; in the worse case it may be a network problem.

Just for reference, haha!
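To tell those two cases apart once db2 is reachable again, something like the following could be run on db2 (a sketch; the private hostname is an assumption):

# sketch: is the local stack on db2 alive, or is it the network?
ps -ef | egrep 'asm_pmon|ora_pmon|tnslsnr|ocssd'   # ASM, database, listener and CSSD processes
srvctl status asm -n db2                           # ASM state as registered in CRS
lsnrctl status                                     # local listener status
ping -s db1-priv 1500 5                            # interconnect reachability toward db1 (hostname assumed)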

#7 | Posted 2011-09-26 10:03
Closing the thread: the root cause was a memory hardware fault. After the memory was replaced, the services now start normally. Thanks, everyone.
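For anyone who lands here later: on a Sun Fire E6900 running Solaris, a memory fault like this usually shows up in the fault manager and platform diagnostics, e.g. (a sketch; run as root on the affected node):

# sketch: standard Solaris checks for failed or degraded memory
fmadm faulty                      # faults currently diagnosed by the Solaris fault manager
fmdump -v                         # fault manager log, including DIMM faults
prtdiag -v                        # platform diagnostics: memory bank and FRU status
grep -i ecc /var/adm/messages     # correctable/uncorrectable memory error messages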

#8 | Posted 2011-10-01 16:26
very good thread....