A Bug in 10.2.0.5 RAC on Linux
The symptom on the system: "the mmon process holds locks on some SYS objects, and then the CPU usage of that process climbs to 100%."<br>After debugging, the trace file contained the following:<br><br>*** ACTION NAME:(Remote-Flush Slave Action) 2011-10-25 20:00:08.996<br>*** MODULE NAME:(MMON_SLAVE) 2011-10-25 20:00:08.996<br>*** SERVICE NAME:(SYS$BACKGROUND) 2011-10-25 20:00:08.996<br>*** SESSION ID:(2553.18657) 2011-10-25 20:00:08.996<br><b>WARNING:io_submit failed due to kernel limitations MAXAIO for process=0 pending aio=0<br>WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=65483<br>WARNING:1 Oracle process running out of OS kernelI/O resources aiolimit=0 </b><br>ksfdgo()+1488<-ksfdaio1()+9848<-kfkUfsIO()+594<-kfkDoIO()+631<-kfkIOPriv()+616<-kfdIOPriv()+95<-kfioSubmitIO()+503<-kfioRequestPriv()+166<-kfioRequest()+689<-ksfd_osmgo()+1286<-ksfdgo()+1488<-ksfdaio1()+9848<-ksfqwr()+335<-kcflfi()+670<-kcvrsz()+1131<-ktfbfcsz()+657<br><-ktfbfxtnd()+237<-ktfbtgex1()+2461<-ktsxs_add()+1480<-ktspnr_next()+1206<-ktrsexec()+437<-ktspbmphwm()+1229<-ktspmvhwm()+49<-ktsp_bump_hwm()+191<-ktspgsp_cbk()+983<-kdisnew()+304<-kdisnewle()+125<-kdisle()+4556<-kdiins0()+26993<-kauxsin()+3965<-insidx()+2509<br><-insflush()+466<-insrow()+933<-insdrv()+589<-inscovexe()+399<-insExecStmtExecIniEngine()+85<-insexe()+384<-opiexe()+9334<-kpoal8()+2295<-opiodr()+1184<-kpoodrc()+38<-rpiswu2()+409<-kpoodr()+554<-upirtrc()+2101<-kpurcsc()+125<-kpuexecv8()+1705<-kpuexec()+2643<br><-OCIStmtExecute()+41ssd_unwind_bp: unhandled instruction at 0x14fdbdf instr=6a<br>ssd_unwind_bp: unhandled instruction at 0x14fc333 instr=68<br><-kewrose_oci_stmt_exec()+62<-kewrgwxf1_gwrsql_exft_1()+284<-kewrgwxf_gwrsql_exft()+451<-kewrews_execute_wr_sql()+52<-kewrftbs_flush_table_by_sql()+188<-kewrft_flush_table()+223<-kewrftec_flush_table_ehdlcx()+805<-kewrfat_flush_all_tables()+1243<-kewrfsr_flush_snapshot_r()+173<br><-kewrrfs_remote_flush_slave()+1002<-kebm_slave_main()+221<-ksvrdp()+1159<-opirip()+748<-opidrv()+583<-sou2o()+114<-opimai_real()+317<-main()+116<-__libc_start_main()+219<-_start()+42<br>*** 
2011-10-25 23:20:17.038<br>ssd_unwind_bp: unhandled instruction at 0x14fdbdf instr=6a<br>ssd_unwind_bp: unhandled instruction at 0x14fc333 instr=68<br>*** 2011-10-26 08:48:54.726<br>Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 1591, image: <br>*** 2011-10-26 08:48:54.726<br>ksedmp: internal or fatal error<br>Current SQL statement for this session:<br>insert into wrh$_sysstat (snap_id, dbid, instance_number, stat_id, value) select :snap_id, :dbid, :instance_number, stat_id, value from v$sysstat order by stat_id<br>----- Call Stack Trace -----<br>calling call entry argument values in hex <br>location type point (? means dubious value) <br>-------------------- -------- -------------------- ----------------------------<br>ksedst()+31 call ksedst1() 000000000 ? 000000001 ?<br> 7FBFFD6590 ? 7FBFFD65F0 ?<br> 7FBFFD6530 ? 000000000 ?<br>ksedmp()+610 call ksedst() 000000000 ? 000000001 ?<br> 7FBFFD6590 ? 7FBFFD65F0 ?<br> 7FBFFD6530 ? 000000000 ?<br>ksdxfdmp()+1153 call ksedmp() 000000003 ? 000000001 ?<br> 7FBFFD6590 ? 7FBFFD65F0 ?<br> 7FBFFD6530 ? 
000000000 ?<br><br>The bold lines above tell most of the story: the system has run out of AIO resources (AIO-NR=65483 is almost at the AIO-MAX-NR=65536 limit).<br>The session's wait state looks like this:<br>SO: 0x159d85068, type: 4, owner: 0x15f94e478, flag: INIT/-/-/0x00<br> (session) sid: 2553 trans: (nil), creator: 0x15f94e478, flag: (100051) USR/- BSY/-/-/-/-/-<br> DID: 0002-02E5-00000030, short-term DID: 0000-0000-00000000<br> txn branch: (nil)<br> oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS<br> service name: SYS$BACKGROUND<br> last wait for 'Data file init write' wait_time=0.000016 sec, seconds since wait started=46124<br> count=1, intr=100, timeout=ffffffff<br> blocking sess=0x(nil) seq=224<br> Dumping Session Wait History<br> for 'Data file init write' count=1 wait_time=0.000016 sec<br> count=1, intr=100, timeout=ffffffff<br> for 'Data file init write' count=1 wait_time=0.000016 sec<br> count=1, intr=100, timeout=ffffffff<br> for 'Data file init write' count=1 wait_time=0.000035 sec<br> count=1, intr=100, timeout=ffffffff<br> for 'Data file init write' count=1 wait_time=0.614215 sec<br> count=1, intr=100, timeout=ffffffff<br> for 'CSS operation: action' count=1 wait_time=0.000080 sec<br> function_id=41, =0, =0<br> for 'CSS initialization' count=1 wait_time=0.000004 sec<br>The fix is simple:<br>Increase the value of fs.aio-max-nr. In this case, raising it to fs.aio-max-nr = 1048576 resolved the problem.<br>See Metalink notes <font face="helvetica"><strong>1313555.1</strong></font> and <font face="helvetica"><strong>9949948.8</strong></font>.<br>This problem is tracked as <font face="helvetica"><strong>Bug: 9949948</strong></font>.<br>
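To spot this condition before MMON gets stuck, one option is to compare the kernel's current AIO reservation count with its limit. The following is only a minimal sketch, not part of the original diagnosis: it assumes a Linux host exposing `/proc/sys/fs/aio-nr` and `/proc/sys/fs/aio-max-nr`, and the helper names and the 90% warning threshold are hypothetical choices made for illustration.

```python
from pathlib import Path

# Standard Linux procfs locations for the system-wide AIO counters.
AIO_NR_PATH = Path("/proc/sys/fs/aio-nr")
AIO_MAX_NR_PATH = Path("/proc/sys/fs/aio-max-nr")


def aio_headroom(aio_nr: int, aio_max_nr: int) -> int:
    """Return how many AIO request slots are still available system-wide."""
    return aio_max_nr - aio_nr


def is_near_limit(aio_nr: int, aio_max_nr: int, threshold: float = 0.9) -> bool:
    """Return True when usage reaches the threshold (0.9 is an assumed value)."""
    return aio_nr / aio_max_nr >= threshold


def read_counter(path: Path) -> int:
    """Read a single integer counter from a procfs file."""
    return int(path.read_text().strip())


def main() -> None:
    try:
        aio_nr = read_counter(AIO_NR_PATH)
        aio_max_nr = read_counter(AIO_MAX_NR_PATH)
    except (OSError, ValueError) as exc:
        # Not a Linux host, or the counters are unreadable.
        print(f"cannot read AIO counters: {exc}")
        return
    print(f"aio-nr={aio_nr} aio-max-nr={aio_max_nr} "
          f"headroom={aio_headroom(aio_nr, aio_max_nr)}")
    if is_near_limit(aio_nr, aio_max_nr):
        print("WARNING: AIO usage is close to fs.aio-max-nr")


if __name__ == "__main__":
    main()
```

With the values from the trace above (AIO-NR=65483, AIO-MAX-NR=65536), only 53 slots remain and the check fires. With the limit raised to 1048576, the same usage is far below the threshold. To make the new limit survive a reboot, the setting is normally placed in `/etc/sysctl.conf` and loaded with `sysctl -p`.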