Two physical E6900 servers, Oracle ASM database, version 11.1.0.7.0.
At 15:31, db2 went down. Judging from the system messages and the CRS logs, it looks as though CRS brought the node down. Could someone please take a look? Many thanks in advance.
DB1% crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE db1
ora....B1.lsnr application ONLINE ONLINE db1
ora.db1.gsd application ONLINE ONLINE db1
ora.db1.ons application ONLINE ONLINE db1
ora.db1.vip application ONLINE ONLINE db1
ora....SM2.asm application ONLINE OFFLINE
ora....B2.lsnr application ONLINE OFFLINE
ora.db2.gsd application ONLINE OFFLINE
ora.db2.ons application ONLINE OFFLINE
ora.db2.vip application ONLINE ONLINE db1
ora.ora11g.db application ONLINE ONLINE db1
ora....g1.inst application ONLINE ONLINE db1
ora....g2.inst application ONLINE OFFLINE
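To see at a glance which resources failed over or stayed down, the `crs_stat -t` table above can be filtered for rows whose State column is OFFLINE. A minimal sketch (the `offline_resources` helper name is mine, not an Oracle tool):

```shell
# Hypothetical helper: print the names of resources whose State column
# (4th field of `crs_stat -t` output) is OFFLINE. Reads the table on stdin.
offline_resources() {
  awk '$4 == "OFFLINE" { print $1 }'
}

# Intended usage on the cluster:
#   crs_stat -t | offline_resources
```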
DB1% tail alertdb1.log
2010-11-06 01:13:09.750
[crsd(1966)]CRS-1204:Recovering CRS resources for node db2.
2010-11-06 02:12:42.112
[cssd(218)]CRS-1601:CSSD Reconfiguration complete. Active nodes are db1 db2 .
2011-07-12 15:31:24.651
[cssd(218)]CRS-1607:CSSD evicting node db2. Details in /oracle/crs/log/db1/cssd/ocssd.log.
2011-07-12 15:31:24.675
[cssd(218)]CRS-1601:CSSD Reconfiguration complete. Active nodes are db1 .
2011-07-12 15:31:41.841
[crsd(1966)]CRS-1204:Recovering CRS resources for node db2.
2011-07-12 15:31:41.732: [ RACG][1] [142][1][ora.db2.vip]: IP:172.18.3.44 is already up in the network (host=DB1)
Waiting for IP to come Down : Waited 1 of 30 seconds
Waiting for IP to come Down : Waited 2 of 30 seconds
Waiting for IP to come Down : Waited 3 of 30 seconds
2011-07-12 15:31:41.744: [ RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 4 of 30 seconds
Waiting for IP to come Down : Waited 5 of 30 seconds
Waiting for IP to come Down : Waited 6 of 30 seconds
Waiting for IP to come Down : Waited 7 of 30 seconds
2011-07-12 15:31:41.744: [ RACG][1] [142][1][ora.db2.vip]: Waiting for IP to come Down : Waited 8 of 30 seconds
Waiting for IP to come Down : Waited 9 of 30 seconds
Waiting for IP to come Down : Waited 10 of 30 seconds
Waiting for IP to come Down : Waited 11 of 30 seconds
IP:172.18.3.44 now down in the network
2011-07-12 15:31:41.744: [ RACG][1] [142][1][ora.db2.vip]: ifconfig: SIOCSLIFNAME for ip: ce0: already exists
Created new logical interface ce0:3
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetMinMaxVersion: node 1 product/protocol (11.1/1.4)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetMinMaxVersion: properties common to all nodes: 1,2,3,4,5,6,7,8,9,10,13
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetMinMaxVersion: min product/protocol (11.1/1.4)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetMinMaxVersion: max product/protocol (11.1/1.4)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmNeedConfReq: No configuration to change
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmDoSyncUpdate: Terminating node 2, db2, misstime(866) state(5)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmDoSyncUpdate: Wait for 0 vote ack(s)
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmEvict: Start
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmEvict: Evicting node 2, db2, birth 120736764, death 120736765, impendingrcfg 1, stateflags 0x8001
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmEvict: Not fencing node 2, db2, with SAGE, supported flag 0
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSendShutdown: req to node 2, kill time 1165733520
[ CSSD]2011-07-12 15:31:24.651 [12] >ERROR: clssnmSendShutdown: Send to node 2 failed, rc 5
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmWaitOnEvictions: node 2, undead 1, killreqid 0, SAGE handle 0
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmCheckKillStatus: Node 2, db2, down, LATS(1504073642),timeout(-338340122)
[ CSSD]2011-07-12 15:31:24.651 [1] >TRACE: clssgmQueueGrockEvent: groupName(DG_DISKGROUP10) count(2) master(1) event(2), incarn 13, mbrc 2, to member 0, events 0x4, state 0x0
[ CSSD]2011-07-12 15:31:24.651 [12] >TRACE: clssnmSetupAckWait: Ack message type (15)
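To make the sequence above easier to follow, the ocssd.log excerpt can be condensed into a time-ordered list of CSSD function calls, so the eviction steps (sync update, evict, shutdown request, kill-status check) stand out. A rough sketch (the `timeline` helper is my own assumption, not part of clusterware):

```shell
# Hypothetical helper: reduce each "[ CSSD]<timestamp> [tid] >LEVEL: func: ..."
# trace line to "<timestamp> func", de-duplicated and sorted by time.
timeline() {
  sed -n 's/^\[ *CSSD\]\([0-9-]* [0-9:.]*\) \[[0-9]*\] >[A-Z]*: \([A-Za-z]*\):.*/\1 \2/p' "$1" | sort -u
}

# Intended usage:
#   timeline /oracle/crs/log/db1/cssd/ocssd.log
```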
Tue Jul 12 15:32:04 2011
ERROR: Instance termination initiated by instance 1 with reason 1.
ERROR: This comes as a down reconfig event from the cluster manager.
ERROR: Please check instance 1's alert and LMON trace file for detail.
ERROR: Please also examine the CSS log files.
LMON (ospid: 8880): terminating the instance due to error 481
Tue Jul 12 15:32:04 2011
ORA-1092 : opitsk aborting process
System State dumped to trace file /oracle/db/diag/rdbms/ora11g/ora11g2/trace/ora11g2_diag_8852.trc
Tue Jul 12 19:49:12 2011
Starting ORACLE instance (normal)
Specified value of sga_max_size is too small, bumping to 10733223936
ora11g1:
Reconfiguration started (old inc 32, new inc 34)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Tue Jul 12 15:31:25 2011
Trace dumping is performing id=[cdmp_20110712153204]
Non-local Process blocks cleaned out
Tue Jul 12 15:31:25 2011
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Jul 12 15:31:25 2011
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Tue Jul 12 15:31:25 2011
LMS 2: 1 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Tue Jul 12 15:31:26 2011
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
LMS 1: 157443 GCS shadows traversed, 0 replayed
LMS 0: 157093 GCS shadows traversed, 0 replayed
LMS 2: 156564 GCS shadows traversed, 0 replayed
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 16 processes
Started redo scan
Tue Jul 12 15:31:36 2011
Completed redo scan
67952 redo blocks read, 6898 data blocks need recovery
Started redo application at
Thread 2: logseq 43526, block 195965
Recovery of Online Redo Log: Thread 2 Group 11 Seq 43526 Reading mem 0
Mem# 0: +DISKGROUP4/ora11g/onlinelog/group_11_logfile1
Mem# 1: +DISKGROUP2/ora11g/onlinelog/group_11_logfile2
Recovery of Online Redo Log: Thread 2 Group 4 Seq 43527 Reading mem 0
Mem# 0: +DISKGROUP2/ora11g/onlinelog/group_4.269.669385297
Mem# 1: +DISKGROUP4/ora11g/onlinelog/group_4.263.669385299
Completed redo application of 18.80MB
Completed instance recovery at
Thread 2: logseq 43527, block 5026, scn 12257375052531
6022 data blocks read, 8623 data blocks written, 67952 redo blocks read
Thread 2 advanced to log sequence 43528 (thread recovery)
Redo thread 2 internally disabled at seq 43528 (SMON)
Tue Jul 12 15:31:42 2011
Thread 1 advanced to log sequence 58240 (LGWR switch)
Current log# 5 seq# 58240 mem# 0: +DISKGROUP2/ora11g/onlinelog/group_5.259.669383465
Current log# 5 seq# 58240 mem# 1: +DISKGROUP4/ora11g/onlinelog/group_5.259.669383467
Tue Jul 12 15:32:16 2011
Archived Log entry 190375 added for thread 2 sequence 43527 ID 0xfffffffff2532fdd dest 1:
Archived Log entry 190376 added for thread 2 sequence 43527 ID 0xfffffffff2532fdd dest 2:
Tue Jul 12 15:32:40 2011
Errors in file /oracle/db/diag/rdbms/ora11g/ora11g1/trace/ora11g1_ping_3446.trc
System error log:
Jul 12 15:32:04 DB2 last message repeated 4 times
Jul 12 15:32:05 DB2 root: [ID 702911 user.error] Oracle clsomon failed with fatal status 12.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CSSD failure 134.
Jul 12 15:32:05 DB2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.
Jul 12 15:32:07 DB2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Boot log (output of last):
monitor pts/1 132.96.63.75 Tue Jul 12 19:47 - 19:47 (00:00)
reboot system boot Tue Jul 12 19:46
reboot system down Tue Jul 12 15:32
monitor pts/2 132.96.63.75 Tue Jul 12 15:30 - 15:30 (00:00)
DB2's alert log:
[ CSSD]2011-07-12 15:32:04.631 [8] >TRACE: clssgmDiscOmonReady: omon was posted for member 2
[ CSSD]2011-07-12 15:32:04.631 [8] >ERROR: clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 120736766, stamp 0
[ CSSD]2011-07-12 15:32:04.631 [15] >TRACE: clssgmDestroyClient: client (100a00890) cleaned up con (100a00ad0), refcount 0, joinstate 6, deadstate 3
[ CSSD]2011-07-12 15:32:04.631 [8] >TRACE: clssgmDiscOmonReady: omon was posted for member 2
[ CSSD]2011-07-12 15:32:04.631 [15] >TRACE: clssgmclientlsnr: entering select
[ CSSD]2011-07-12 15:32:04.631 [8] >ERROR: ###################################
[ CSSD]2011-07-12 15:32:04.631 [8] >ERROR: clssscExit: CSSD aborting from thread clssnmvKillBlockThread0
[ CSSD]2011-07-12 15:32:04.631 [8] >ERROR: ###################################
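The syslog entries above say the reboot was ordered by clusterware for cluster integrity, not by a hardware or OS fault. A small sketch to pull exactly those markers out of the Solaris system log (the `reboot_reason` name and the grep patterns are assumptions based on the messages shown here):

```shell
# Hypothetical helper: grep the system log for the clusterware-initiated
# reboot markers seen above (clsomon failure, CSSD failure, integrity reboot).
reboot_reason() {
  grep -E 'clsomon failed|CSSD failure|Rebooting for cluster integrity' "$1"
}

# Intended usage on Solaris:
#   reboot_reason /var/adm/messages
```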