Question: what causes a stalled checkpoint?
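Yesterday our production database ran into a problem, and I would appreciate help finding the cause: the Informix server stayed in a checkpoint and all transactions were blocked. I checked the logical logs and they had been backed up. The server version is 10.0 with HDR set up, and HDR also looked normal. After restarting the database everything returned to normal, but I do not know the cause and worry the same incident will happen again. Any advice is welcome. During the restart the server displayed: WARNING: Checkpoint appears stalled and may not complete before the database server shuts down

You didn't even post a log~ Please post a section of the log.

Sorry about that. Here is the online.log: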
17:40:10Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d16dc
17:40:10Maximum server connections 48
17:41:25DR: Sending log 17748, size 5120 pages, 50.61 percent used
17:41:25Logical Log 17748 Complete, timestamp: 0x458d1f99.
17:41:27Logical Log 17748 - Backup Started
17:41:27Logical Log 17748 - Backup Completed
17:43:40Checkpoint Completed:duration was 0 seconds.
17:43:40Checkpoint loguniq 17749, logpos 0xc48d0, timestamp: 0x458d4d44
17:43:40Maximum server connections 48
17:47:11Checkpoint Completed:duration was 0 seconds.
17:47:11Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458d9ff8
17:47:11Maximum server connections 48
17:50:41Checkpoint Completed:duration was 0 seconds.
17:50:41Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458dfcbb
17:50:41Maximum server connections 48
17:54:11Checkpoint Completed:duration was 0 seconds.
17:54:11Checkpoint loguniq 17749, logpos 0x45ef6c, timestamp: 0x458e57b6
17:54:11Maximum server connections 48
17:57:41Checkpoint Completed:duration was 0 seconds.
17:57:41Checkpoint loguniq 17749, logpos 0x546018, timestamp: 0x458e8b70
17:57:41Maximum server connections 48
17:58:53DR: Sending log 17749 (current), size 5120 pages, 28.50 percent used
At this point the server was already blocked; the database restart follows.
18:09:30WARNING: Checkpoint appears stalled and may not complete
before the database server shuts down.
18:09:31IBM Informix Dynamic Server Stopped.
18:09:45IBM Informix Dynamic Server Started.
18:09:45WARNING: If you intend to use J/Foundation or GLS for Unicode feature(GLU) with this Server instance, please make sure that your SHMBASE value specifies in onconfig is 0x700000010000000 or above. Otherwise you will have problems while attaching or dynamimically adding virtual shared memory segments. Please refer to Server machine notes for more information.
Tue Mar2 18:09:47 2010
18:09:47Warning: ONCONFIG dump directory (DUMPDIR) '/tmp' has insecure permissions
18:09:47Event alarms enabled.ALARMPROG = '/usr/informix/etc/alarmprogram.sh'
18:09:47Dynamically allocated new virtual shared memory segment (size 512000KB)
18:09:47Memory sizes:resident:9288208 KB, virtual:528000 KB, no SHMTOTAL limit
18:09:47Booting Language <c> from module <>
18:09:47Loading Module <CNULL>
18:09:47Booting Language <builtin> from module <>
18:09:47Loading Module <BUILTINNULL>
18:09:52DR: DRAUTO is 0 (Off)
18:09:52AIX MP latch code enabled
18:09:52IBM Informix Dynamic Server Version 10.00.FC8 Software Serial Number AAA#B000000
18:09:53IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.
18:09:53DR: Reservation of the last logical log for log backup turned on
18:09:53DR: Trying to connect to secondary server = db_s
18:09:57(7) connection rejected - no calls allowed for sqlexec
18:09:57(13) connection rejected - no calls allowed for sqlexec
18:10:57DR: Cannot connect to secondary server
18:10:57DR: Turned off on primary server
18:10:57DR: Cannot connect to secondary server
18:10:58Physical Recovery Started at Page (3:222476).
18:10:59Physical Recovery Complete: 317 Pages Examined, 317 Pages Restored.
18:10:59Logical Recovery Started.
18:10:5910 recovery worker threads will be started.
18:11:04Logical Recovery has reached the transaction cleanup phase.
18:11:04Logical Recovery Complete.
141 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks
18:11:05Onconfig parameter TAPEDEV modified from pzh520:/backup/data/bakdat to /dev/rmt0.
18:11:05Dataskip is now OFF for all dbspaces
18:11:05Checkpoint Completed:duration was 0 seconds.
18:11:05Checkpoint loguniq 17749, logpos 0x5b4018, timestamp: 0x877a488e
18:11:05Maximum server connections 0
18:11:05On-Line Mode
18:11:05Affinitied VP 1 to phys proc 2
18:11:05Affinitied VP 5 to phys proc 5
18:11:05Affinitied VP 4 to phys proc 4
18:11:05Affinitied VP 8 to phys proc 8
18:11:05Affinitied VP 6 to phys proc 6
18:11:05Affinitied VP 3 to phys proc 3
18:11:05Affinitied VP 7 to phys proc 7
18:11:34Checkpoint Completed:duration was 0 seconds.
18:11:34Checkpoint loguniq 17749, logpos 0x5dc018, timestamp: 0x877a5881
18:11:34Maximum server connections 5
18:11:35DR: Primary server connected
18:11:35DR: Secondary server needs failure recovery
18:11:38DR: Sending log 17749, size 5120 pages, 29.47 percent used
18:11:38Logical Log 17749 Complete, timestamp: 0x877a5a1f.
18:14:22DR: Sending log 17750 (current), size 5120 pages, 6.93 percent used
18:14:26DR: Sending Logical Logs Completed
18:14:27DR: Primary server operational
18:14:27Checkpoint Completed:duration was 0 seconds.
18:14:27Checkpoint loguniq 17750, logpos 0x166018, timestamp: 0x877af03c
18:14:27Maximum server connections 6
18:18:04Checkpoint Completed:duration was 0 seconds.
18:18:04Checkpoint loguniq 17750, logpos 0x2ce018, timestamp: 0x877b42a1
18:18:04Maximum server connections 6
18:19:53Logical Log 17749 - Backup Started
18:19:53Logical Log 17749 - Backup Completed
18:21:35Checkpoint Completed:duration was 0 seconds.
18:21:35Checkpoint loguniq 17750, logpos 0x3db018, timestamp: 0x877b8cbd
18:21:35Maximum server connections 6
Last edited by liaosnet on 2010-03-04 00:22
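Reply to #5 yhrain3196
This is an HDR system, so please post the logs from both sides of the HDR pair.
From the log you posted, the cause is not obvious.
In an HDR system the basic flow is: LOGBUF (logical-log buffer) -> primary HDR send buffer -> secondary HDR receive buffer -> logical log -> logical recovery.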
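The chain above can be pictured as a series of bounded buffers. The following is only a toy model in Python, not Informix's actual implementation: the stage names come from the post, while `simulate`, `capacity`, and `apply_stalled_from` are made up for illustration. It sketches how, if the last stage stops draining, every buffer upstream fills and the writer at the front ends up blocked.

```python
from collections import deque

# Stage names taken from the flow described in the post.
STAGES = [
    "LOGBUF",
    "primary HDR send buffer",
    "secondary HDR receive buffer",
    "logical log",
]


def simulate(ticks, capacity, apply_stalled_from=None):
    """Return the number of ticks on which the writer found LOGBUF full.

    ticks: how many time steps to run (hypothetical unit).
    capacity: maximum records each stage can hold (assumed equal for all).
    apply_stalled_from: tick at which logical recovery stops consuming,
        or None if it never stalls.
    """
    queues = [deque() for _ in STAGES]
    blocked = 0
    for tick in range(ticks):
        stalled = apply_stalled_from is not None and tick >= apply_stalled_from
        # Logical recovery drains the last stage unless it is stalled.
        if queues[-1] and not stalled:
            queues[-1].popleft()
        # Move one record forward per stage, starting from the end of the chain.
        for i in range(len(queues) - 2, -1, -1):
            if queues[i] and len(queues[i + 1]) < capacity:
                queues[i + 1].append(queues[i].popleft())
        # A transaction writing a log record needs room in the first stage.
        if len(queues[0]) < capacity:
            queues[0].append(tick)
        else:
            blocked += 1
    return blocked
```

With nothing stalled, `simulate(50, 2)` reports no blocked ticks; with `apply_stalled_from=10` the writer starts being blocked once all four buffers have filled.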
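Problems most often show up in two places. The first is primary HDR send buffer -> secondary HDR receive buffer: the network may keep data from being sent accurately and promptly, or the buffer may be too small, so the system has to send many times and falls behind. The second is secondary HDR receive buffer -> logical log: if writes to the logical log cannot be handled in time, the data that follows cannot be processed either (for example, when the secondary's logical logs have not been backed up or are unavailable).

Thanks. I also suspect HDR is the cause, but I could not see a problem in the logs on either side. Here is the log from the secondary as well; let's see whether it reveals anything: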
17:51:06Maximum server connections 4
17:57:36Checkpoint Completed:duration was 1 seconds.
17:57:36Checkpoint loguniq 17748, logpos 0x717018, timestamp: 0x458ca083
17:57:36Maximum server connections 4
17:58:07Checkpoint Completed:duration was 0 seconds.
17:58:07Checkpoint loguniq 17748, logpos 0x873018, timestamp: 0x458cf881
17:58:07Maximum server connections 4
17:58:36Checkpoint Completed:duration was 0 seconds.
17:58:36Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d2d1b
17:58:36Maximum server connections 4
18:01:33Logical Log 17748 Complete, timestamp: 0x458d7cfa.
18:01:34Checkpoint Completed:duration was 0 seconds.
18:01:34Checkpoint loguniq 17749, logpos 0xc48d0, timestamp: 0x458d8d3c
18:01:34Maximum server connections 4
18:08:07Checkpoint Completed:duration was 1 seconds.
18:08:07Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458dfb6a
18:08:07Maximum server connections 4
18:09:06Checkpoint Completed:duration was 1 seconds.
18:09:06Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458e5fd2
18:09:06Maximum server connections 4
18:09:07Checkpoint Completed:duration was 0 seconds.
18:09:07Checkpoint loguniq 17749, logpos 0x45ef6c, timestamp: 0x458e7c7f
18:09:07Maximum server connections 4
At this point the blockage was noticed; the database restart follows.
18:09:28DR: Receive error
18:09:28ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr = : Network connection is broken.
System error = 4.
18:09:28DR: Failure recovery error (6)
18:09:28DR: Turned off on secondary server
18:11:15DR: Secondary server connected
18:11:17DR: Secondary server needs failure recovery
18:11:18DR: Failure recovery from disk in progress ...
18:13:58Logical Log 17749 Complete, timestamp: 0x458e8037.
18:14:05Checkpoint Completed:duration was 0 seconds.
18:14:05Checkpoint loguniq 17749, logpos 0x546018, timestamp: 0x877a66f0
18:14:05Maximum server connections 4
18:14:06Checkpoint Completed:duration was 1 seconds.
18:14:06Checkpoint loguniq 17749, logpos 0x5b4018, timestamp: 0x877a6e9e
18:14:06Maximum server connections 4
18:14:06Checkpoint Completed:duration was 0 seconds.
18:14:06Checkpoint loguniq 17749, logpos 0x5dc018, timestamp: 0x877a7285
18:14:06Maximum server connections 4
18:14:06Logical Log 17749 Complete, timestamp: 0x877aefce.
18:14:08DR: Secondary server operational
18:14:09Checkpoint Completed:duration was 0 seconds.
18:14:09Checkpoint loguniq 17750, logpos 0x166018, timestamp: 0x877b1240
18:14:09Maximum server connections 4
18:17:46Checkpoint Completed:duration was 0 seconds.
18:17:46Checkpoint loguniq 17750, logpos 0x2ce018, timestamp: 0x877b4300
18:17:46Maximum server connections 4
18:21:17Checkpoint Completed:duration was 1 seconds.
18:09:07Maximum server connections 4
At this point the blockage was noticed; the database restart follows.
18:09:28DR: Receive error
18:09:28ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr = : Network connection is broken.
System error = 4.
18:09:28DR: Failure recovery error (6)
18:09:28DR: Turned off on secondary server
18:11:15DR: Secondary server connected
18:11:17DR: Secondary server needs failure recovery
18:11:18DR: Failure recovery from disk in progress ...
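HDR had already disconnected~
Please pack up the logs from both machines and upload them, and not just a short excerpt.

Thanks for the reply! HDR did disconnect when the database was restarted, but once the database was back up, HDR reconnected automatically.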
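One way to compare the two online.log excerpts posted in this thread is to match checkpoints by their (loguniq, logpos) pair and see how much later the secondary logged each one. The Python sketch below assumes the line format seen in the excerpts (`HH:MM:SS` followed by `Checkpoint loguniq N, logpos 0x...`); the function names `checkpoints` and `replication_lag` are made up for illustration. It also assumes both machines' clocks are roughly in sync and that the excerpt does not cross midnight, neither of which the thread confirms.

```python
import re
from datetime import datetime

# Matches e.g. "17:40:10Checkpoint loguniq 17748, logpos 0x9d9300, ..."
# \s* tolerates the timestamp being fused to the message or separated by spaces.
CHECKPOINT = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s*Checkpoint loguniq (\d+), logpos (0x[0-9a-fA-F]+)"
)


def checkpoints(log_text):
    """Map (loguniq, logpos) to the time it first appears in the log."""
    found = {}
    for line in log_text.splitlines():
        match = CHECKPOINT.match(line.strip())
        if match:
            key = (int(match.group(2)), int(match.group(3), 16))
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            found.setdefault(key, when)  # keep the first occurrence only
    return found


def replication_lag(primary_text, secondary_text):
    """Seconds between primary and secondary logging the same checkpoint."""
    primary = checkpoints(primary_text)
    secondary = checkpoints(secondary_text)
    return {
        key: (secondary[key] - primary[key]).total_seconds()
        for key in sorted(primary)
        if key in secondary
    }


# Lines copied from the two excerpts in this thread.
primary_sample = """\
17:40:10Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d16dc
17:47:11Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458d9ff8
"""
secondary_sample = """\
17:58:36Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d2d1b
18:08:07Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458dfb6a
"""

for (loguniq, logpos), seconds in replication_lag(
    primary_sample, secondary_sample
).items():
    print(f"log {loguniq} pos {logpos:#x}: secondary behind by {seconds:.0f} s")
```

On these four lines the gaps come out at 1106 and 1256 seconds, roughly 18 and 21 minutes. If the clocks really are close, that would suggest the secondary was already well behind the primary before the stall, but this is only a reading of the excerpts, not a confirmed diagnosis.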