Chinaunix

[Troubleshooting help] AIX error log, please help

1
Posted on 2013-02-18 10:54
While checking the error log on our minicomputer today, I found the errors below. I don't know how to resolve them; any help would be appreciated.
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3C81E43F   0218085213 P U topsvcs        Late in sending heartbeat
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3C81E43F   0218081213 P U topsvcs        Late in sending heartbeat
96CD8511   0218035213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218035213 P S topsvcs        Dead Man Switch being allowed to expire.
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3C81E43F   0218033213 P U topsvcs        Late in sending heartbeat
96CD8511   0218023213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218023213 P S topsvcs        Dead Man Switch being allowed to expire.
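
The summary above is easier to read once the rows are grouped by time. The following is only a sketch, not part of the original post: it assumes the errpt TIMESTAMP column uses the MMDDhhmmyy layout, and the function names are made up for illustration. The sample rows are copied from the listing above.

```python
from collections import Counter
from datetime import datetime

# A few rows copied from the errpt summary above.
ERRPT_SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3C81E43F   0218085213 P U topsvcs        Late in sending heartbeat
96CD8511   0218035213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218035213 P S topsvcs        Dead Man Switch being allowed to expire.
"""


def parse_errpt_summary(text):
    """Yield (identifier, datetime, description) for each summary row."""
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip the header row and anything that is not a full data row.
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, _type, _cls, _resource, description = fields
        # Assumed layout: MMDDhhmmyy (month, day, hour, minute, 2-digit year).
        when = datetime.strptime(stamp, "%m%d%H%M%y")
        yield ident, when, description.strip()


def count_by_minute(text):
    """Count how many entries of each identifier were logged per minute."""
    return Counter((when, ident) for ident, when, _ in parse_errpt_summary(text))


if __name__ == "__main__":
    for (when, ident), n in sorted(count_by_minute(ERRPT_SAMPLE).items()):
        print(when.strftime("%Y-%m-%d %H:%M"), ident, n)
```

Grouping this way makes it easier to see whether the bursts line up with a scheduled job or a regular load peak.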


The detailed error report is as follows:
LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Sun Feb 17 01:12:10 BEIST 2013
Sequence Number: 2222
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           S
Type:            PERM
Resource Name:   topsvcs         

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,7755            
ERROR ID
6BUfAx.erv5F/Uw0.62.e.1...................
REFERENCE CODE
                                          
Thread which was blocked
send thread
Interval in seconds during which process was blocked
          10
Interface name
en0

2
Posted on 2013-02-18 14:25
It looks like the system is under heavy load.
The cause may be busy CPU or memory,
or busy I/O, or network latency.

3C81E43F   0218033213 P U topsvcs        Late in sending heartbeat
96CD8511   0218023213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218023213 P S topsvcs        Dead Man Switch being allowed to expire.

Please also post the detailed reports for these three errors.

3
Posted on 2013-02-19 11:29
Reply to 2# InfoSVC

LABEL:          TS_LATEHB_PE
IDENTIFIER:     3C81E43F

Date/Time:       Sun Feb 17 10:52:15 BEIST 2013
Sequence Number: 2249
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           U
Type:            PERF
Resource Name:   topsvcs         
Resource Class:  NONE
Resource Type:   NONE
Location:        

Description
Late in sending heartbeat

Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities

Failure Causes
Daemon can not get required system resource

        Recommended Actions
        Reduce the system load

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.10,5366               
ERROR ID
6zESUw.TL26F//pI.62.e.1...................
REFERENCE CODE
                                          
A heartbeat is late by the following number of seconds
          14


-----------------------------------------------------

4
Posted on 2013-02-19 11:30
Reply to 2# InfoSVC

LABEL:          TS_DMS_RESTORED_TE
IDENTIFIER:     96CD8511

Date/Time:       Sun Feb 17 12:32:21 BEIST 2013
Sequence Number: 2265
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           S
Type:            TEMP
Resource Name:   topsvcs

Description
Dead Man Switch will once again be reset.
Depending on how long the DMS was allowed to expire before recovery
occurred, a TS_DMS_WARNING_ST error may also be seen at this time.

Probable Causes
The conditions which led to an earlier TS_DMS_EXPIRING_EM error
are no longer present on the system.

Failure Causes
The conditions which led to an earlier TS_DMS_EXPIRING_EM error
are no longer present on the system.

        Recommended Actions
        The previous TS_DMS_EXPIRING_EM error should be investigated to
        determine the cause of that problem.

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,4829             
ERROR ID 
6FIMnK0Jp36F/Nf..62.e.1...................
REFERENCE CODE
   

5
Posted on 2013-02-19 11:34
Reply to 2# InfoSVC

LABEL:          TS_DMS_EXPIRING_EM
IDENTIFIER:     90EDB0A5

Date/Time:       Sun Feb 17 12:32:16 BEIST 2013
Sequence Number: 2263
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           S
Type:            END
Resource Name:   topsvcs

Description
Dead Man Switch being allowed to expire.
If a TS_DMS_RESTORED_TE error appears after this, that will indicate this
condition has been recovered from.  Otherwise, a DMS-triggered node failure
should be expected to occur after the time indicated in the Detail Data.

Probable Causes
Topology Services has detected blockage that puts it in danger of suffering
a sundered network.  This is due to all viable NIM processes experiencing
blockage, or the daemon's main thread being hung for too long.

User Causes
Excessive I/O load is causing high I/O interrupt traffic
Excessive memory consumption is causing high memory contention

        Recommended Actions
        Reduce application load on the system
        Change (relax) Topology Services tunable parameters
        Call IBM Service if problem persists

Failure Causes
Problem in Operating System prevents processes from running
Excessive I/O interrupt traffic prevents processes from running
Excessive virtual memory activity prevents Topology Services from making progress

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Change (relax) Topology Services tunable parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,4890             
ERROR ID 
6Z0PvE0Ep36F/YS5/62.e.1...................
REFERENCE CODE
                                          
Time remaining until DMS triggers (in msec)
        5831
DMS trigger interval (in msec)
       20000
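
The two Detail Data values above can be combined with the timestamps of the TS_DMS_EXPIRING_EM and TS_DMS_RESTORED_TE entries to estimate how close the node came to a DMS-triggered failure. This is a rough sketch only. It assumes the timer counts down from the trigger interval since the last reset, and the Date/Time fields have one-second resolution, so the margin is approximate.

```python
from datetime import datetime

# Values copied from the Detail Data of the TS_DMS_EXPIRING_EM entry.
DMS_TRIGGER_INTERVAL_MS = 20000  # "DMS trigger interval (in msec)"
TIME_REMAINING_MS = 5831         # "Time remaining until DMS triggers (in msec)"

# Date/Time fields of the two entries (sequence numbers 2263 and 2265).
expiring_at = datetime(2013, 2, 17, 12, 32, 16)  # TS_DMS_EXPIRING_EM
restored_at = datetime(2013, 2, 17, 12, 32, 21)  # TS_DMS_RESTORED_TE

# Assumption: time already elapsed without a reset = interval - remaining.
elapsed_ms = DMS_TRIGGER_INTERVAL_MS - TIME_REMAINING_MS

# Time between the warning and the recovery, from the log timestamps.
recovery_ms = int((restored_at - expiring_at).total_seconds() * 1000)

# Approximate time left on the timer when the condition cleared.
margin_ms = TIME_REMAINING_MS - recovery_ms

print(f"no DMS reset for about {elapsed_ms / 1000:.1f} s when the warning was logged")
print(f"recovered about {recovery_ms / 1000:.0f} s later, "
      f"roughly {margin_ms} ms before the DMS would have triggered")
```

Under these assumptions the daemon had been stalled for about 14 seconds and recovered with less than a second to spare, which is why the load at that moment deserves a close look.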
   

6
Posted on 2013-02-19 12:57
It looks like the en0 network adapter was blocked.
Is en0 the interface carrying the heartbeat?

7
Posted on 2013-02-19 14:40
Reply to 6# InfoSVC

The heartbeat runs over an RS232 serial port.

8
Posted on 2013-02-19 16:33
Check the current number of processes on the system and its memory usage.

9
Posted on 2013-02-20 09:18
Run nmon
and see what the system resources look like at the time the errors are logged.
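
Once an nmon capture has been recorded to a file, the snapshots closest to an error can be looked up by timestamp. The sketch below is an illustration under assumptions: it assumes the capture contains `ZZZZ,Tnnnn,hh:mm:ss,DD-MON-YYYY` records that map snapshot tags to times, the sample records are made up, and `snapshots_near` is a hypothetical helper name. Check the layout against a real capture before relying on it.

```python
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Made-up excerpt of a capture file; a real file holds many other record types.
NMON_SAMPLE = """\
ZZZZ,T0001,12:31:30,17-FEB-2013
ZZZZ,T0002,12:32:00,17-FEB-2013
ZZZZ,T0003,12:32:30,17-FEB-2013
ZZZZ,T0004,12:40:00,17-FEB-2013
"""


def snapshots_near(nmon_text, incident, window=timedelta(minutes=1)):
    """Return (tag, time) pairs for snapshots within `window` of `incident`."""
    hits = []
    for line in nmon_text.splitlines():
        parts = line.strip().split(",")
        # Only the timestamp records are of interest here.
        if len(parts) < 4 or parts[0] != "ZZZZ":
            continue
        day, month, year = parts[3].split("-")
        hour, minute, second = (int(x) for x in parts[2].split(":"))
        when = datetime(int(year), MONTHS[month.upper()], int(day),
                        hour, minute, second)
        if abs(when - incident) <= window:
            hits.append((parts[1], when))
    return hits


if __name__ == "__main__":
    # Date/Time of the TS_DMS_EXPIRING_EM entry quoted earlier in the thread.
    incident = datetime(2013, 2, 17, 12, 32, 16)
    for tag, when in snapshots_near(NMON_SAMPLE, incident):
        print(tag, when.time())
```

The returned tags can then be used to pick out the CPU, memory, and disk records for the same snapshots in the capture.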