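The symptom on the system was: "the mmon process locks some SYS objects, and then that process's CPU usage climbs to 100%". After debugging, the trace file contained the following: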
*** ACTION NAME:(Remote-Flush Slave Action) 2011-10-25 20:00:08.996
*** MODULE NAME:(MMON_SLAVE) 2011-10-25 20:00:08.996
*** SERVICE NAME:(SYS$BACKGROUND) 2011-10-25 20:00:08.996
*** SESSION ID:(2553.18657) 2011-10-25 20:00:08.996
WARNING:io_submit failed due to kernel limitations MAXAIO for process=0 pending aio=0
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=65483
WARNING:1 Oracle process running out of OS kernelI/O resources aiolimit=0
ksfdgo()+1488<-ksfdaio1()+9848<-kfkUfsIO()+594<-kfkDoIO()+631<-kfkIOPriv()+616<-kfdIOPriv()+95<-kfioSubmitIO()+503<-kfioRequestPriv()+166<-kfioRequest()+689<-ksfd_osmgo()+1286<-ksfdgo()+1488<-ksfdaio1()+9848<-ksfqwr()+335<-kcflfi()+670<-kcvrsz()+1131<-ktfbfcsz()+657
<-ktfbfxtnd()+237<-ktfbtgex1()+2461<-ktsxs_add()+1480<-ktspnr_next()+1206<-ktr***ec()+437<-ktspbmphwm()+1229<-ktspmvhwm()+49<-ktsp_bump_hwm()+191<-ktspgsp_cbk()+983<-kdisnew()+304<-kdisnewle()+125<-kdisle()+4556<-kdiins0()+26993<-kauxsin()+3965<-insidx()+2509
<-insflush()+466<-insrow()+933<-insdrv()+589<-inscovexe()+399<-in***ecStmtExecIniEngine()+85<-in***e()+384<-opiexe()+9334<-kpoal8()+2295<-opiodr()+1184<-kpoodrc()+38<-rpiswu2()+409<-kpoodr()+554<-upirtrc()+2101<-kpurcsc()+125<-kpuexecv8()+1705<-kpuexec()+2643
<-OCIStmtExecute()+41
ssd_unwind_bp: unhandled instruction at 0x14fdbdf instr=6a
ssd_unwind_bp: unhandled instruction at 0x14fc333 instr=68
<-kewrose_oci_stmt_exec()+62<-kewrgwxf1_gwrsql_exft_1()+284<-kewrgwxf_gwrsql_exft()+451<-kewrews_execute_wr_sql()+52<-kewrftbs_flush_table_by_sql()+188<-kewrft_flush_table()+223<-kewrftec_flush_table_ehdlcx()+805<-kewrfat_flush_all_tables()+1243<-kewrfsr_flush_snapshot_r()+173
<-kewrrfs_remote_flush_slave()+1002<-kebm_slave_main()+221<-ksvrdp()+1159<-opirip()+748<-opidrv()+583<-sou2o()+114<-opimai_real()+317<-main()+116<-__libc_start_main()+219<-_start()+42
*** 2011-10-25 23:20:17.038
ssd_unwind_bp: unhandled instruction at 0x14fdbdf instr=6a
ssd_unwind_bp: unhandled instruction at 0x14fc333 instr=68
*** 2011-10-26 08:48:54.726
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 1591, image:
*** 2011-10-26 08:48:54.726
ksedmp: internal or fatal error
Current SQL statement for this session:
insert into wrh$_sysstat (snap_id, dbid, instance_number, stat_id, value) select :snap_id, :dbid, :instance_number, stat_id, value from v$sysstat order by stat_id
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FBFFD6590 ? 7FBFFD65F0 ?
                                                   7FBFFD6530 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FBFFD6590 ? 7FBFFD65F0 ?
                                                   7FBFFD6530 ? 000000000 ?
ksdxfdmp()+1153      call     ksedmp()             000000003 ? 000000001 ?
                                                   7FBFFD6590 ? 7FBFFD65F0 ?
                                                   7FBFFD6530 ? 000000000 ?
The highlighted part of the first trace above (the io_submit / AIO warnings) already tells most of the story: the system has run short of AIO resources. The session's wait state looks like this:

SO: 0x159d85068, type: 4, owner: 0x15f94e478, flag: INIT/-/-/0x00
(session) sid: 2553 trans: (nil), creator: 0x15f94e478, flag: (100051) USR/- BSY/-/-/-/-/-
DID: 0002-02E5-00000030, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'Data file init write' wait_time=0.000016 sec, seconds since wait started=46124
count=1, intr=100, timeout=ffffffff
blocking sess=0x(nil) seq=224
Dumping Session Wait History
for 'Data file init write' count=1 wait_time=0.000016 sec
count=1, intr=100, timeout=ffffffff
for 'Data file init write' count=1 wait_time=0.000016 sec
count=1, intr=100, timeout=ffffffff
for 'Data file init write' count=1 wait_time=0.000035 sec
count=1, intr=100, timeout=ffffffff
for 'Data file init write' count=1 wait_time=0.614215 sec
count=1, intr=100, timeout=ffffffff
for 'CSS operation: action' count=1 wait_time=0.000080 sec
function_id=41, =0, =0
for 'CSS initialization' count=1 wait_time=0.000004 sec

The fix is simple: raise the value of fs.aio-max-nr. In this case, increasing it to fs.aio-max-nr = 1048576 resolved the problem.
See Metalink notes 1313555.1 and 9949948.8. The issue is attributed to Bug 9949948.
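To see how close a Linux host is to the kernel AIO ceiling reported in the trace (AIO-MAX-NR=65536, AIO-NR=65483), something like the following minimal Python sketch could be used. It assumes the standard Linux files /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr are readable. The function names and the 95% warning threshold are illustrative choices, not anything taken from Oracle or the Metalink notes.

```python
from pathlib import Path

# Standard Linux location of the system-wide AIO counters (assumed readable).
PROC_FS = Path("/proc/sys/fs")


def aio_headroom(aio_nr: int, aio_max_nr: int) -> int:
    """Number of AIO request slots still available system-wide."""
    return aio_max_nr - aio_nr


def near_limit(aio_nr: int, aio_max_nr: int, threshold: float = 0.95) -> bool:
    """True when usage is at or above `threshold` of the limit.

    The 0.95 default is an arbitrary illustrative value.
    """
    return aio_nr >= threshold * aio_max_nr


def read_aio_counters(base: Path = PROC_FS):
    """Return (aio_nr, aio_max_nr), or None if the files cannot be read."""
    try:
        nr = int((base / "aio-nr").read_text())
        max_nr = int((base / "aio-max-nr").read_text())
    except (OSError, ValueError):
        return None
    return nr, max_nr


if __name__ == "__main__":
    # Values copied from the WARNING lines in the trace above.
    print("trace headroom:", aio_headroom(65483, 65536))
    print("trace near limit:", near_limit(65483, 65536))

    counters = read_aio_counters()
    if counters is None:
        print("AIO counters not available on this host")
    else:
        nr, max_nr = counters
        print(f"live: aio-nr={nr} aio-max-nr={max_nr} "
              f"headroom={aio_headroom(nr, max_nr)}")
```

With the numbers from the trace, only a few dozen slots were left out of 65536, which fits the io_submit failures. After raising the limit (on most Linux systems by setting fs.aio-max-nr = 1048576 in /etc/sysctl.conf and reloading with sysctl -p), the same check should show plenty of headroom.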