Chinaunix

Title: Help needed: DB2 instance goes down

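Author: xray26    Time: 2007-03-22 09:14
Lately, when I get to work in the morning, I often find that the instance went down overnight and that dump files were generated. The only job that runs at night is an automatic backup at 3 a.m. Any pointers from the experts would be much appreciated!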

DB2 V8.1 fix5
AIX  V5.1
HA   V4.5

The db2diag.log contents are attached below.

[ Last edited by xray26 on 2007-3-22 09:24 ]
Author: xray26    Time: 2007-03-22 09:14
2007-03-22-03.01.21.210565   Instance:db2inst1   Node:000
PID:48172(db2loggr (SHDATA) 0)   TID:1   Appid:none
data protection  sqlpghck Probe:1390

ExtNum 5288, state 401, baselsn 0000004BD3220000 nextlsn 0000004BD7249D15

2007-03-22-03.01.33.532974   Instance:db2inst1   Node:000
PID:43100(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BE74.0D8F91190126
base sys utilities  sqle_remap_errors Probe:100   Database:SHDATA

ZRC 0x80040003 remapped to SQLCODE -1044

2007-03-22-03.01.33.600498   Instance:db2inst1   Node:000
PID:43100(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BE74.0D8F91190126
base sys utilities  sqlesrsu Probe:140   Database:SHDATA

DIA8003C The interrupt  has been received.
ZRC=0x80040003
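
Editor's sketch: entries like the ones above are easier to reason about once they are split into fields and sorted by timestamp. The Python below is a minimal parser for the three header lines of each entry as they appear in this thread. The regular expressions and the names `HEADER`, `PROCESS`, `ORIGIN` and `parse_entries` are assumptions made for illustration, not an official description of the db2diag.log format.

```python
import re
from datetime import datetime

# Line 1 of an entry: timestamp, instance name, node number.
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})\s+"
    r"Instance:(?P<instance>\S+)\s+Node:(?P<node>\d+)\s*$"
)
# Line 2: process id, process name (may contain parentheses), thread id, application id.
PROCESS = re.compile(
    r"^PID:(?P<pid>\d+)\((?P<proc>.*)\)\s+TID:(?P<tid>\d+)\s+Appid:(?P<appid>\S+)\s*$"
)
# Line 3: component, function, probe number, optional database name.
ORIGIN = re.compile(
    r"^(?P<component>.+?)\s+(?P<function>\S+)\s+Probe:(?P<probe>\d+)"
    r"(?:\s+Database:(?P<database>\S+))?\s*$"
)


def parse_entries(text):
    """Yield one dict per entry whose three header lines all match."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        head = HEADER.match(line)
        if not head or i + 2 >= len(lines):
            continue
        proc = PROCESS.match(lines[i + 1])
        origin = ORIGIN.match(lines[i + 2])
        if not (proc and origin):
            continue
        # The message is every non-blank line up to the next entry header.
        body = []
        for nxt in lines[i + 3:]:
            if HEADER.match(nxt):
                break
            if nxt.strip():
                body.append(nxt.strip())
        entry = {**head.groupdict(), **proc.groupdict(), **origin.groupdict()}
        entry["when"] = datetime.strptime(entry["ts"], "%Y-%m-%d-%H.%M.%S.%f")
        entry["message"] = " ".join(body)
        yield entry


SAMPLE = """\
2007-03-22-03.01.33.532974   Instance:db2inst1   Node:000
PID:43100(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BE74.0D8F91190126
base sys utilities  sqle_remap_errors Probe:100   Database:SHDATA

ZRC 0x80040003 remapped to SQLCODE -1044

2007-03-22-03.01.21.210565   Instance:db2inst1   Node:000
PID:48172(db2loggr (SHDATA) 0)   TID:1   Appid:none
data protection  sqlpghck Probe:1390

ExtNum 5288, state 401, baselsn 0000004BD3220000 nextlsn 0000004BD7249D15
"""

# Sort by timestamp, since different processes do not always write in order.
entries = sorted(parse_entries(SAMPLE), key=lambda e: e["when"])
for e in entries:
    print(e["when"], e["pid"], e["function"], e["message"])
```

Sorting is worthwhile because entries from different processes are not always written in time order, as some of the 07:05 entries later in this thread show.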
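Author: xray26    Time: 2007-03-22 09:19
The log shows the 3 a.m. automatic backup ran normally, so why did the instance go down a little after 6:50?
ADM7009E  An error was encountered in the "TCPIP" protocol support.  A possible cause is that the maximum number of agents has been exceeded.
Nobody uses the system at night, so how could the maximum number of agents be exceeded?
Author: xray26    Time: 2007-03-22 09:33
At 7:05 in the morning the log shows dump files being generated: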
……
2007-03-22-07.05.15.692359   Instance:db2inst1   Node:000
PID:24450(db2stop2)   TID:1   Appid:none
base sys utilities  sqleKillNode Probe:90

Sending sigdump and waiting for 1 min to complete

2007-03-22-07.05.16.485541   Instance:db2inst1   Node:000
PID:55142(db2agent 0)   TID:1   Appid:none
DRDA Application Server  sqljsSignalHandler Probe:10

DIA0505I Execution of a component signal handling function has begun.

2007-03-22-07.05.16.183814   Instance:db2inst1   Node:000
PID:45256(db2agent 0)   TID:1   Appid:none
DRDA Application Server  sqljsSignalHandler Probe:10

DIA0505I Execution of a component signal handling function has begun.

PID:45256 TID:1 Node:000 Title: **** DRDA ASCB ****
Dump File:/home/db2inst1/sqllib/db2dump/452561.000

PID:45256 TID:1 Node:000 Title: **** DRDA CMNMGR CB ****
Dump File:/home/db2inst1/sqllib/db2dump/452561.000

2007-03-22-07.05.17.072541   Instance:db2inst1   Node:000
PID:30298(db2agent 0)   TID:1   Appid:none
DRDA Application Server  sqljsSignalHandler Probe:10

DIA0505I Execution of a component signal handling function has begun.

PID:30298 TID:1 Node:000 Title: **** DRDA ASCB ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

PID:30298 TID:1 Node:000 Title: **** DRDA CMNMGR CB ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

PID:30298 TID:1 Node:000 Title: **** RECEIVE BUFFER ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

PID:30298 TID:1 Node:000 Title: **** SEND BUFFERS ****


PID:30298 TID:1 Node:000 Title: **** CONNECTION HANDLE ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

PID:30298 TID:1 Node:000 Title: **** DRDA ATTRIBUTES ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

PID:30298 TID:1 Node:000 Title: **** AS UCINTERFACE ****
Dump File:/home/db2inst1/sqllib/db2dump/302981.000

2007-03-22-07.05.17.767869   Instance:db2inst1   Node:000
PID:30298(db2agent 0)   TID:1   Appid:none
DRDA Application Server  sqljsSignalHandler Probe:20

DIA0506I Execution of a component signal handling function is complete.
……
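Author: Jens    Time: 2007-03-24 01:50
Following this thread. Experts, please chime in.
Author: huyuhui001    Time: 2007-03-24 07:32
What about locks?
ZRC=0x80040003 means USER INTERRUPT DETECTED
Author: xray26    Time: 2007-03-26 09:16
Thanks for the reply above!
The log below shows lock escalation happening during the day.
Parameters: LOCKLIST 5000
            DLCHKTIME 10000
            MAXLOCKS 20
            LOCKTIMEOUT 60
Also, what does "errno: 0x2FF00E98 : 0x00000009", which shows up around midday in the log below, mean?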
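
Editor's sketch: assuming the second hex word in the sqloclose entries is the operating system errno (the entry is labelled "errno:") and the first word is just the address the value was dumped from, the value can be decoded with Python's standard `errno` module. The helper name `decode_errno` is made up for illustration. The numeric value 9 is EBADF on POSIX systems, which to my knowledge includes AIX, but the description text returned by `os.strerror` depends on the machine the snippet runs on.

```python
import errno
import os


def decode_errno(hex_value):
    """Return (number, symbolic name, OS description) for a hex errno value."""
    number = int(hex_value, 16)
    name = errno.errorcode.get(number, "unknown")
    return number, name, os.strerror(number)


print(decode_errno("0x00000009"))
```

If that reading is right, sqloclose was asked to close a file descriptor that was not open. The process name on those entries is db2 rather than db2agent, so they may be unrelated to the instance going down.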

2007-03-21-09.58.11.885248   Instance:db2inst1   Node:000
PID:51960(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.OD94.085D61000105
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "102402" locks on table "DB2INST1.CALL_LOG_LINE" to
lock intent "S" was successful.

2007-03-21-13.14.38.167198   Instance:db2inst1   Node:000
PID:50424(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.OD92.07DD21000051
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "102401" locks on table "DB2INST1.CALL_LOG_LINE" to
lock intent "S" was successful.

2007-03-21-13.52.11.583564   Instance:db2inst1   Node:000
PID:21360(db2)   TID:1   Appid:none
oper system services  sqloclose Probe:110

errno:
0x2FF00E98 : 0x00000009                                 ....

2007-03-21-13.57.13.019356   Instance:db2inst1   Node:000
PID:26818(db2)   TID:1   Appid:none
oper system services  sqloclose Probe:110

errno:
0x2FF20D98 : 0x00000009                                 ....

2007-03-21-14.43.19.733558   Instance:db2inst1   Node:000
PID:26840(db2)   TID:1   Appid:none
oper system services  sqloclose Probe:110

errno:
0x2FF20D88 : 0x00000009                                 ....

2007-03-21-19.21.29.464752   Instance:db2inst1   Node:000
PID:29814(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BCC2.0D7971105623
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "27770" locks on table "DB2INST1.ITEM_SLSMAN_MONTH"
to lock intent "X" was successful.

2007-03-21-19.21.29.565837   Instance:db2inst1   Node:000
PID:29814(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BCC2.0D7971105623
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "26576" locks on table
"DB2INST1.ITEM_DPT_SALE_MONTH" to lock intent "X" was successful.

2007-03-21-19.24.28.435010   Instance:db2inst1   Node:000
PID:29814(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BCC2.0D7971105623
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "54349" locks on table
"DB2INST1.ITEM_SALESMAN_MONTH" to lock intent "X" was successful.

2007-03-21-19.31.06.525802   Instance:db2inst1   Node:000
PID:39322(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0C7BD1000225
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "52070" locks on table "TSCD   
.CDITEM_SLSMAN_MONTH" to lock intent "X" was successful.

2007-03-21-19.31.19.781498   Instance:db2inst1   Node:000
PID:39322(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0C7BD1000225
data management  sqldEscalateLocks Probe:3   Database:SHDATA

ADM5502W  The escalation of "52070" locks on table "TSCD   
.CDITEM_SALESMAN_MONTH" to lock intent "X" was successful.
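
Editor's sketch: a back-of-the-envelope check on the escalation messages above. It assumes LOCKLIST is counted in 4 KB pages and MAXLOCKS is the percentage of the lock list one application may hold before escalation, which is how I understand DB2 documents these parameters. It also assumes the two CALL_LOG_LINE escalations were triggered by that MAXLOCKS threshold rather than by a full lock list. The bytes-per-lock figure is derived only from the numbers in this thread, not from IBM documentation, and the function name is made up for illustration.

```python
PAGE_BYTES = 4096  # assumption: one LOCKLIST unit is a 4 KB page


def per_app_budget_bytes(locklist_pages, maxlocks_percent):
    """Lock-list bytes one application may hold before escalation starts."""
    return locklist_pages * PAGE_BYTES * maxlocks_percent // 100


# Values posted in this thread: LOCKLIST 5000, MAXLOCKS 20.
budget = per_app_budget_bytes(5000, 20)
print("per-application budget in bytes:", budget)

# Lock counts reported by the two ADM5502W messages for CALL_LOG_LINE.
for escalated in (102402, 102401):
    print(escalated, "locks ->", round(budget / escalated, 2), "bytes per lock")
```

Under those assumptions both daytime escalations land at about 40 bytes per lock, which suggests one application simply hit the MAXLOCKS limit while reading CALL_LOG_LINE. The evening escalations involve far fewer locks per table. That would be consistent with one application holding locks on several tables at once, but the excerpt alone cannot confirm it.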
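Author: xray26    Time: 2007-03-26 09:22
Originally posted by huyuhui001 on 2007-3-24 07:32
What about locks?
ZRC=0x80040003 means USER INTERRUPT DETECTED


Could ZRC=0x80040003 be caused by the "db2 force applications all" command in the automatic backup script?

[ Last edited by xray26 on 2007-3-26 09:27 ]
Author: lizhuo    Time: 2007-03-26 16:13
db2 force applications all kills every application currently connected to the database.
Author: xray26    Time: 2007-03-27 09:55
Originally posted by lizhuo on 2007-3-26 16:13
db2 force applications all kills every application currently connected to the database.


I know that. What I am asking is whether the ZRC=0x80040003 message was produced because the force applications command was run.
Author: lizhuo    Time: 2007-03-27 17:13
Possibly, but that is not the whole story.
Author: xray26    Time: 2007-03-28 09:19
So is the instance going down at irregular intervals related to lock escalation? Do I need to adjust those lock-related parameters?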
Author: mymm    Time: 2007-03-28 11:00
I had a look at your DIAG; the sequence should be as follows:

ExtNum 5288, state 401, baselsn 0000004BD3220000 nextlsn 0000004BD7249D15

2007-03-22-03.01.33.532974   Instance:db2inst1   Node:000
PID:43100(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BE74.0D8F91190126
base sys utilities  sqle_remap_errors Probe:100   Database:SHDATA

ZRC 0x80040003 remapped to SQLCODE -1044

2007-03-22-03.01.33.600498   Instance:db2inst1   Node:000
PID:43100(db2agent (LOSHDATA) 0)   TID:1   Appid:GA398003.BE74.0D8F91190126
base sys utilities  sqlesrsu Probe:140   Database:SHDATA

DIA8003C The interrupt  has been received.
ZRC=0x80040003
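Author: mymm    Time: 2007-03-28 11:01
Originally posted by xray26 on 2007-3-22 09:19
The log shows the 3 a.m. automatic backup ran normally, so why did the instance go down a little after 6:50?
ADM7009E  An error was encountered in the "TCPIP" protocol support.  A possible cause is that the maximum number of agents ha ...


Isn't this just the normal message for a STOP or START?

Were any applications connecting to the DB shortly before 7 o'clock?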
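Author: xray26    Time: 2007-03-28 11:27
Re: post #13

The crontab schedules an offline database backup at 3 a.m. every day. I have checked the backup image file with db2ckbkp, and it is usable.

[ Last edited by xray26 on 2007-3-28 13:58 ]
Author: xray26    Time: 2007-03-28 11:32
Re: post #14

Business normally starts at around 7:30. That morning we found that we could not connect to the database and that neither start nor stop worked, so the administrator rebooted the server at around 7 o'clock.

[ Last edited by xray26 on 2007-3-28 13:58 ]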
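Author: xray26    Time: 2007-03-28 11:38
At 00:30 this morning an automatic TSM database backup job ran and succeeded, but the automatic offline backup at 3 a.m. that followed it failed. The log of the 3 a.m. offline backup reports: "SQL1032N  No start database manager command was issued.  SQLSTATE=57019
DB20000I  The TERMINATE command completed successfully.
SQL1032N  No start database manager command was issued."
Judging from db2diag, the instance went down at around 01:04, after the 00:30 TSM database backup and before the 3 a.m. automatic offline backup.

Below is the db2diag.log from early this morning: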

2007-03-28-00.30.11.465737   Instance:db2inst1   Node:000
PID:55938(db2loggr (SHDATA) 0)   TID:1   Appid:none
data protection  sqlpghck Probe:1390

ExtNum 5340, state 401, baselsn 0000004C91900000 nextlsn 0000004C9550BA0A

2007-03-28-00.30.24.355217   Instance:db2inst1   Node:000
PID:51096(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0FD137163009
database utilities  sqlubcka Probe:0   Database:SHDATA

Starting a full database backup.

2007-03-28-01.02.51.525154   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:1

Error in agent servicing application with coor_node:
0x2FF1B04C : 0x0000                                     ..

2007-03-28-01.02.51.732177   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:2

Error in agent servicing application with coor_agent_index:
0x2FF1B04E : 0x0001                                     ..

2007-03-28-01.02.51.828163   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:3

Error in agent servicing application with CLIENT PID:
0x2FF1B054 : 0x30                                       0

2007-03-28-01.02.51.924171   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:4

Error in agent servicing application with INBOUND APPLICATION ID:
0x20239763 : 6462 3268 6D6F 6E                          db2hmon

2007-03-28-01.02.52.032175   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:5

Error in agent servicing application with INBOUND SEQUENCE NUMBER:
0x20239784 : 0x0000                                     ..

2007-03-28-01.02.52.116224   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:8

Error in agent servicing application with AUTHORIZATION ID:
0x30116E39 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x30116E49 : 0000 0000 0000 0000 0000 0000 0000 00      ...............

2007-03-28-01.02.52.218166   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:9

Error in agent servicing application with PRODUCT SIGNATURE:

2007-03-28-01.02.52.286395   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
base sys utilities  sqleagnt_sigsegvh Probe:10

Error in agent servicing application with APPLICATION NAME:

2007-03-28-01.02.52.359640   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
oper system services  sqloEDUCodeTrapHandler Probe:10

ADM0503C  An unexpected internal processing error has occurred.  ALL DB2
PROCESSES ASSOCIATED WITH THIS INSTANCE HAVE BEEN SHUTDOWN.  Diagnostic
information has been recorded.  Contact IBM Support for further assistance.

2007-03-28-01.02.52.596376   Instance:db2inst1   Node:000
PID:25542(db2hmon 0)   TID:1   Appid:none
oper system services  sqloEDUCodeTrapHandler Probe:20

Signal number received

0x2FF1B100 : 0x0000000B                                 ....

PID:25542 TID:1 Node:000 Title: siginfo_t...

0x2FF1B3B0 : 0000 000B 0000 0000 0000 0033 0000 0000    ...........3....
0x2FF1B3C0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x2FF1B3D0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x2FF1B3E0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................

PID:25542 TID:1 Node:000 Title: SQLE_AGENTCB
Dump File:/home/db2inst1/sqllib/db2dump/255421.000

PID:25542 TID:1 Node:000 Title: SQLE_AGENT_PRIVATECB
Dump File:/home/db2inst1/sqllib/db2dump/255421.000

PID:25542 TID:1 Node:000 Title: SQLE_MASTER_APP_CB
Dump File:/home/db2inst1/sqllib/db2dump/255421.000

PID:25542 TID:1 Node:000 Title: SQLE_APP_CB
Dump File:/home/db2inst1/sqllib/db2dump/255421.000

PID:25542 TID:1 Node:000 Title: SQLE_COORDINATOR_CB
Dump File:/home/db2inst1/sqllib/db2dump/255421.000

2007-03-28-01.03.44.599230   Instance:db2inst1   Node:000
PID:16866(db2gds 0)   TID:1   Appid:none
oper system services  sqloEDUSIGCHLDHandler Probe:50

Detected the death of an EDU with process id 25542
The signal number that terminated this process was 11
Look for trap files (t25542.*) in the dump directory


2007-03-28-01.03.45.262033   Instance:db2inst1   Node:000
PID:16866(db2gds 0)   TID:1   Appid:none
oper system services  sqloEDUCodeTrapHandler Probe:10

ADM0503C  An unexpected internal processing error has occurred.  ALL DB2
PROCESSES ASSOCIATED WITH THIS INSTANCE HAVE BEEN SHUTDOWN.  Diagnostic
information has been recorded.  Contact IBM Support for further assistance.

2007-03-28-01.03.45.437825   Instance:db2inst1   Node:000
PID:16866(db2gds 0)   TID:1   Appid:none
oper system services  sqloEDUCodeTrapHandler Probe:20

Signal number received

0x2FF20E10 : 0x00000006                                 ....

PID:16866 TID:1 Node:000 Title: siginfo_t...

0x2FF210C0 : 0000 0006 0000 0000 0000 0009 0000 0000    ................
0x2FF210D0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x2FF210E0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x2FF210F0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................

2007-03-28-01.03.45.593677   Instance:db2inst1   Node:000
PID:51096(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0FD137163009
database utilities  sqlubcka Probe:128   Database:SHDATA

Estimated size of backup in bytes:

0x2FF173B8 : 0x00000004386F5000                         ....8oP.

2007-03-28-01.03.45.801191   Instance:db2inst1   Node:000
PID:51096(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0FD137163009
database utilities  sqlubcka Probe:128   Database:SHDATA

Actual size of backup in bytes:

0x2FF173C0 : 0x0000000439005000                         ....9.P.

2007-03-28-01.03.45.917927   Instance:db2inst1   Node:000
PID:51096(db2agent (LOSHDATA) 0)   TID:1   Appid:*LOCAL.db2inst1.0FD137163009
database utilities  sqlubcka Probe:130   Database:SHDATA

Backup Complete.

2007-03-28-01.03.46.341226   Instance:db2inst1   Node:000
PID:21890(db2sysc 0)   TID:1   Appid:none
base sys utilities  sqleChildCrashHandler Probe:15

DiagData
0x10008A4C : 416E 2045 4455 2063 7261 7368 6564 2E      An EDU crashed.

2007-03-28-01.03.46.563302   Instance:db2inst1   Node:000
PID:21890(db2sysc 0)   TID:1   Appid:none
base sys utilities  sqleChildCrashHandler Probe:16

DiagData
0x2FF212C4 : 0x000041E2                                 ..A?

2007-03-28-01.03.46.692058   Instance:db2inst1   Node:000
PID:21890(db2sysc 0)   TID:1   Appid:none
base sys utilities  sqleChildCrashHandler Probe:17

DiagData
0x2FF212C8 : 0x00000101                                 ....

2007-03-28-01.03.46.830061   Instance:db2inst1   Node:000
PID:21890(db2sysc 0)   TID:1   Appid:none
base sys utilities  sqleChildCrashHandler Probe:18

DiagData
0x2FF212CC : 0xFFFFFFFF                                 
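
Editor's sketch: the "Signal number received" values in the trap entries above can be mapped to names with Python's standard `signal` module. This assumes the snippet runs on a POSIX system where signals 6 and 11 carry the usual SIGABRT and SIGSEGV numbers, as they do on AIX and Linux to my knowledge. The helper name `describe_signal` is made up for illustration.

```python
import signal


def describe_signal(hex_value):
    """Return (number, name) for a hex signal value copied from the log."""
    number = int(hex_value, 16)
    try:
        name = signal.Signals(number).name
    except ValueError:
        # Not a signal number known on the machine running this snippet.
        name = "unknown"
    return number, name


# (PID, value shown under "Signal number received") from the entries above.
for pid, raw in [(25542, "0x0000000B"), (16866, "0x00000006")]:
    number, name = describe_signal(raw)
    print(f"PID {pid}: signal {number} ({name})")
```

Read this way, db2hmon (PID 25542) took a segmentation fault at about 01:02:52 and db2gds (PID 16866) then aborted, which matches the "Detected the death of an EDU" and ADM0503C lines in the log. The trap files named there (t25542.*) would be the place to look for the failing call stack.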
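Author: itubie    Time: 2007-03-28 19:26
That's not it.

Your locks are still a problem.

I suspect connections are a problem too.
This will probably take quite a while to sort out.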



