Chinaunix


Question: what causes a stalled checkpoint?

#1 | Posted 2010-03-03 09:19
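Yesterday our production database hit a problem, and I would like help finding the cause. The Informix database stayed in checkpoint and all transactions were blocked. I checked the logical logs and they had been backed up. The database version is 10.0 with HDR configured, and HDR also looked normal. After the database was restarted, everything returned to normal. I still do not know the cause and I am worried the same incident will happen again, so any advice is appreciated.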

#2 | Posted 2010-03-03 09:30
When I restarted the database, it displayed: WARNING: Checkpoint appears stalled and may not complete before the database server shuts down.

#3 | Posted 2010-03-03 09:54
You have not even posted a log~

#4 | Posted 2010-03-03 14:49

Please post a longer section of the log.

#5 | Posted 2010-03-03 18:44
Sorry about that. Here is the online.log:
17:40:10  Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d16dc

17:40:10  Maximum server connections 48
17:41:25  DR: Sending log 17748, size 5120 pages, 50.61 percent used
17:41:25  Logical Log 17748 Complete, timestamp: 0x458d1f99.
17:41:27  Logical Log 17748 - Backup Started
17:41:27  Logical Log 17748 - Backup Completed
17:43:40  Checkpoint Completed:  duration was 0 seconds.
17:43:40  Checkpoint loguniq 17749, logpos 0xc48d0, timestamp: 0x458d4d44

17:43:40  Maximum server connections 48
17:47:11  Checkpoint Completed:  duration was 0 seconds.
17:47:11  Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458d9ff8

17:47:11  Maximum server connections 48
17:50:41  Checkpoint Completed:  duration was 0 seconds.
17:50:41  Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458dfcbb

17:50:41  Maximum server connections 48
17:54:11  Checkpoint Completed:  duration was 0 seconds.
17:54:11  Checkpoint loguniq 17749, logpos 0x45ef6c, timestamp: 0x458e57b6

17:54:11  Maximum server connections 48
17:57:41  Checkpoint Completed:  duration was 0 seconds.
17:57:41  Checkpoint loguniq 17749, logpos 0x546018, timestamp: 0x458e8b70

17:57:41  Maximum server connections 48
17:58:53  DR: Sending log 17749 (current), size 5120 pages, 28.50 percent used
  (Transactions were already blocked at this point. The database restart follows.)
18:09:30  WARNING: Checkpoint appears stalled and may not complete
          before the database server shuts down.

18:09:31  IBM Informix Dynamic Server Stopped.

18:09:45  IBM Informix Dynamic Server Started.
18:09:45  WARNING: If you intend to use J/Foundation or GLS for Unicode feature(GLU) with this Server instance, please make sure that your SHMBASE value specifies in onconfig is 0x700000010000000 or above. Otherwise you will have problems while attaching or dynamimically adding virtual shared memory segments. Please refer to Server machine notes for more information.


Tue Mar  2 18:09:47 2010

18:09:47  Warning: ONCONFIG dump directory (DUMPDIR) '/tmp' has insecure permissions
18:09:47  Event alarms enabled.  ALARMPROG = '/usr/informix/etc/alarmprogram.sh'
18:09:47  Dynamically allocated new virtual shared memory segment (size 512000KB)
18:09:47  Memory sizes:resident:9288208 KB, virtual:528000 KB, no SHMTOTAL limit
18:09:47  Booting Language <c> from module <>
18:09:47  Loading Module <CNULL>
18:09:47  Booting Language <builtin> from module <>
18:09:47  Loading Module <BUILTINNULL>
18:09:52  DR: DRAUTO is 0 (Off)
18:09:52  AIX MP latch code enabled
18:09:52  IBM Informix Dynamic Server Version 10.00.FC8     Software Serial Number AAA#B000000
18:09:53  IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.

18:09:53  DR: Reservation of the last logical log for log backup turned on
18:09:53  DR: Trying to connect to secondary server = db_s
18:09:57  (7) connection rejected - no calls allowed for sqlexec
18:09:57  (13) connection rejected - no calls allowed for sqlexec
18:10:57  DR: Cannot connect to secondary server
18:10:57  DR: Turned off on primary server
18:10:57  DR: Cannot connect to secondary server
18:10:58  Physical Recovery Started at Page (3:222476).
18:10:59  Physical Recovery Complete: 317 Pages Examined, 317 Pages Restored.
18:10:59  Logical Recovery Started.
18:10:59  10 recovery worker threads will be started.
18:11:04  Logical Recovery has reached the transaction cleanup phase.
18:11:04  Logical Recovery Complete.
          141 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks

18:11:05  Onconfig parameter TAPEDEV modified from pzh520:/backup/data/bakdat to /dev/rmt0.
18:11:05  Dataskip is now OFF for all dbspaces
18:11:05  Checkpoint Completed:  duration was 0 seconds.
18:11:05  Checkpoint loguniq 17749, logpos 0x5b4018, timestamp: 0x877a488e

18:11:05  Maximum server connections 0
18:11:05  On-Line Mode
18:11:05  Affinitied VP 1 to phys proc 2
18:11:05  Affinitied VP 5 to phys proc 5
18:11:05  Affinitied VP 4 to phys proc 4
18:11:05  Affinitied VP 8 to phys proc 8
18:11:05  Affinitied VP 6 to phys proc 6
18:11:05  Affinitied VP 3 to phys proc 3
18:11:05  Affinitied VP 7 to phys proc 7
18:11:34  Checkpoint Completed:  duration was 0 seconds.
18:11:34  Checkpoint loguniq 17749, logpos 0x5dc018, timestamp: 0x877a5881

18:11:34  Maximum server connections 5
18:11:35  DR: Primary server connected
18:11:35  DR: Secondary server needs failure recovery

18:11:38  DR: Sending log 17749, size 5120 pages, 29.47 percent used
18:11:38  Logical Log 17749 Complete, timestamp: 0x877a5a1f.
18:14:22  DR: Sending log 17750 (current), size 5120 pages, 6.93 percent used
18:14:26  DR: Sending Logical Logs Completed
18:14:27  DR: Primary server operational
18:14:27  Checkpoint Completed:  duration was 0 seconds.
18:14:27  Checkpoint loguniq 17750, logpos 0x166018, timestamp: 0x877af03c

18:14:27  Maximum server connections 6
18:18:04  Checkpoint Completed:  duration was 0 seconds.
18:18:04  Checkpoint loguniq 17750, logpos 0x2ce018, timestamp: 0x877b42a1

18:18:04  Maximum server connections 6
18:19:53  Logical Log 17749 - Backup Started
18:19:53  Logical Log 17749 - Backup Completed
18:21:35  Checkpoint Completed:  duration was 0 seconds.
18:21:35  Checkpoint loguniq 17750, logpos 0x3db018, timestamp: 0x877b8cbd

18:21:35  Maximum server connections 6
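
In the excerpt above, checkpoints complete every three to four minutes until 17:57:41, and the next checkpoint-related line is the stalled warning at 18:09:30. A minimal sketch of how that gap could be found automatically in a longer online.log follows. It assumes the message formats shown in this thread, assumes all lines fall within one calendar day, and uses a hypothetical function name and a 5-minute threshold chosen only for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the two online.log messages of interest, as they appear in this thread.
EVENT = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2})\s+"
    r"(Checkpoint Completed:|WARNING: Checkpoint appears stalled)"
)


def find_checkpoint_gaps(lines, max_gap=timedelta(minutes=5)):
    """Return (last_completed, event_time, kind) for every event that arrives
    more than max_gap after the previous completed checkpoint.

    Assumption: no midnight rollover inside `lines` (times are HH:MM:SS only).
    """
    gaps = []
    last_done = None
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        now = datetime.strptime(m.group(1), "%H:%M:%S")
        kind = "completed" if m.group(2).startswith("Checkpoint") else "stalled warning"
        if last_done is not None and now - last_done > max_gap:
            gaps.append((last_done.strftime("%H:%M:%S"), m.group(1), kind))
        # A stalled warning precedes a shutdown here, so start over after it.
        last_done = now if kind == "completed" else None
    return gaps


if __name__ == "__main__":
    sample = [
        "17:54:11  Checkpoint Completed:  duration was 0 seconds.",
        "17:57:41  Checkpoint Completed:  duration was 0 seconds.",
        "18:09:30  WARNING: Checkpoint appears stalled and may not complete",
        "18:11:05  Checkpoint Completed:  duration was 0 seconds.",
    ]
    for gap in find_checkpoint_gaps(sample):
        print(gap)
```

On the four sample lines this reports a single gap, from 17:57:41 to the stalled warning at 18:09:30. It only shows when checkpoints stopped completing and says nothing about why.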

#6 | Posted 2010-03-04 00:10
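Last edited by liaosnet on 2010-03-04 00:22

Reply to #5 yhrain3196


    This is an HDR system, so please post the logs from both sides of the HDR pair~~

    The log you posted does not really show the cause.
    In an HDR system the basic flow is: LOGBUF (logical log buffer) -> primary HDR send buffer -> secondary HDR receive buffer -> logical log -> logical recovery.
    Problems most often arise at two stages. The first is primary HDR send buffer -> secondary HDR receive buffer, where network problems may prevent data from being sent accurately and on time, or where the buffer is too small and the system has to send many times, causing delay. The second is secondary HDR receive buffer -> logical log, where writes to the logical log that cannot be processed in time prevent the following data from being processed (for example, the secondary's logical logs are not backed up or are unavailable).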
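
The way a slow stage in that flow can hold up the primary can be illustrated with a toy model. This is only a sketch of bounded-buffer backpressure, not a description of Informix internals: the function name, the buffer capacities, and the one-record-per-tick rates are all assumptions made for illustration.

```python
from collections import deque


def simulate(ticks, send_cap, recv_cap, apply_every):
    """Toy model: primary -> send buffer -> receive buffer -> secondary apply.

    Each tick the primary tries to queue one log record. The secondary applies
    one record every `apply_every` ticks. Returns the number of ticks on which
    the primary could not queue its record because the send buffer was full.
    """
    send_buf, recv_buf = deque(), deque()
    blocked = 0
    for tick in range(1, ticks + 1):
        # Secondary applies one record from its receive buffer, if one is due.
        if tick % apply_every == 0 and recv_buf:
            recv_buf.popleft()
        # The network moves one record when the receive buffer has room.
        if send_buf and len(recv_buf) < recv_cap:
            recv_buf.append(send_buf.popleft())
        # The primary queues a new record, or stalls if the send buffer is full.
        if len(send_buf) < send_cap:
            send_buf.append(tick)
        else:
            blocked += 1
    return blocked


if __name__ == "__main__":
    print("fast secondary:", simulate(100, send_cap=4, recv_cap=4, apply_every=1))
    print("slow secondary:", simulate(100, send_cap=4, recv_cap=4, apply_every=3))
```

When the secondary keeps up, the primary is never blocked. When it applies only one record every three ticks, both buffers fill and the primary spends most ticks waiting, which is the kind of delay described above.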

#7 | Posted 2010-03-04 10:44
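Thank you. I also suspect HDR is the cause, but I could not see a problem in the logs on either side. Here is the log from the secondary side as well, in case you can spot something: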

17:51:06  Maximum server connections 4
17:57:36  Checkpoint Completed:  duration was 1 seconds.
17:57:36  Checkpoint loguniq 17748, logpos 0x717018, timestamp: 0x458ca083

17:57:36  Maximum server connections 4
17:58:07  Checkpoint Completed:  duration was 0 seconds.
17:58:07  Checkpoint loguniq 17748, logpos 0x873018, timestamp: 0x458cf881

17:58:07  Maximum server connections 4
17:58:36  Checkpoint Completed:  duration was 0 seconds.
17:58:36  Checkpoint loguniq 17748, logpos 0x9d9300, timestamp: 0x458d2d1b

17:58:36  Maximum server connections 4
18:01:33  Logical Log 17748 Complete, timestamp: 0x458d7cfa.
18:01:34  Checkpoint Completed:  duration was 0 seconds.
18:01:34  Checkpoint loguniq 17749, logpos 0xc48d0, timestamp: 0x458d8d3c

18:01:34  Maximum server connections 4
18:08:07  Checkpoint Completed:  duration was 1 seconds.
18:08:07  Checkpoint loguniq 17749, logpos 0x1ec018, timestamp: 0x458dfb6a

18:08:07  Maximum server connections 4
18:09:06  Checkpoint Completed:  duration was 1 seconds.
18:09:06  Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458e5fd2

18:09:06  Maximum server connections 4
18:09:07  Checkpoint Completed:  duration was 0 seconds.
18:09:07  Checkpoint loguniq 17749, logpos 0x45ef6c, timestamp: 0x458e7c7f

18:09:07  Maximum server connections 4
(The blockage was noticed at this point. The database restart follows.)
18:09:28  DR: Receive error
18:09:28  ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr = : Network connection is broken.
System error = 4.
18:09:28  DR: Failure recovery error (6)
18:09:28  DR: Turned off on secondary server
18:11:15  DR: Secondary server connected
18:11:17  DR: Secondary server needs failure recovery

18:11:18  DR: Failure recovery from disk in progress ...
18:13:58  Logical Log 17749 Complete, timestamp: 0x458e8037.
18:14:05  Checkpoint Completed:  duration was 0 seconds.
18:14:05  Checkpoint loguniq 17749, logpos 0x546018, timestamp: 0x877a66f0

18:14:05  Maximum server connections 4
18:14:06  Checkpoint Completed:  duration was 1 seconds.
18:14:06  Checkpoint loguniq 17749, logpos 0x5b4018, timestamp: 0x877a6e9e

18:14:06  Maximum server connections 4
18:14:06  Checkpoint Completed:  duration was 0 seconds.
18:14:06  Checkpoint loguniq 17749, logpos 0x5dc018, timestamp: 0x877a7285

18:14:06  Maximum server connections 4
18:14:06  Logical Log 17749 Complete, timestamp: 0x877aefce.
18:14:08  DR: Secondary server operational
18:14:09  Checkpoint Completed:  duration was 0 seconds.
18:14:09  Checkpoint loguniq 17750, logpos 0x166018, timestamp: 0x877b1240

18:14:09  Maximum server connections 4
18:17:46  Checkpoint Completed:  duration was 0 seconds.
18:17:46  Checkpoint loguniq 17750, logpos 0x2ce018, timestamp: 0x877b4300

18:17:46  Maximum server connections 4
18:21:17  Checkpoint Completed:  duration was 1 seconds.
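
The two excerpts can be compared by matching checkpoints on their (loguniq, logpos) pair. For example, the checkpoint at loguniq 17749, logpos 0x318dd0 is logged at 17:50:41 on the primary and at 18:09:06 on the secondary. A minimal sketch of that comparison follows. It assumes the line format shown in this thread and that both servers' clocks are roughly synchronized, which the thread does not confirm. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# "17:50:41  Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458dfcbb"
CKPT_POS = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2})\s+Checkpoint loguniq (\d+), logpos (0x[0-9a-fA-F]+)"
)


def checkpoint_times(lines):
    """Map (loguniq, logpos) to the first time that checkpoint was logged."""
    seen = {}
    for line in lines:
        m = CKPT_POS.match(line)
        if m:
            key = (int(m.group(2)), int(m.group(3), 16))
            seen.setdefault(key, datetime.strptime(m.group(1), "%H:%M:%S"))
    return seen


def secondary_lag_seconds(primary_lines, secondary_lines):
    """For checkpoints present in both logs, return how many seconds later
    the secondary logged them (assumes same day and synchronized clocks)."""
    primary = checkpoint_times(primary_lines)
    secondary = checkpoint_times(secondary_lines)
    return {
        key: (secondary[key] - primary[key]).total_seconds()
        for key in sorted(primary.keys() & secondary.keys())
    }


if __name__ == "__main__":
    primary = [
        "17:50:41  Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458dfcbb",
    ]
    secondary = [
        "18:09:06  Checkpoint loguniq 17749, logpos 0x318dd0, timestamp: 0x458e5fd2",
    ]
    for (loguniq, logpos), lag in secondary_lag_seconds(primary, secondary).items():
        print(f"log {loguniq} pos {logpos:#x}: secondary behind by {lag:.0f} s")
```

For the pair of lines above the difference is 1105 seconds, a little over 18 minutes. Whether that reflects real replication lag or just clock skew between the two machines cannot be determined from these excerpts alone.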

#8 | Posted 2010-03-04 12:12
18:09:07  Maximum server connections 4
(The blockage was noticed at this point. The database restart follows.)
18:09:28  DR: Receive error
18:09:28  ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr = : Network connection is broken.
System error = 4.
18:09:28  DR: Failure recovery error (6)
18:09:28  DR: Turned off on secondary server
18:11:15  DR: Secondary server connected
18:11:17  DR: Secondary server needs failure recovery

18:11:18  DR: Failure recovery from disk in progress ...


HDR had already disconnected~
Please package up the complete logs from both machines and upload them, not just a short excerpt.

#9 | Posted 2010-03-05 09:42
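Thanks for the reply! HDR was indeed disconnected when the database was restarted, but once the database had started, HDR reconnected automatically.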