Chinaunix

Title: AIX error log, help needed

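Author: lianshi    Time: 2013-02-18 10:54
Title: AIX error log, help needed
I checked the error log on our AIX server today and found the errors below. I don't know how to fix them; any help would be appreciated.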
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3C81E43F   0218085213 P U topsvcs        Late in sending heartbeat
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3D32B80D   0218081213 P S topsvcs        NIM thread blocked
3C81E43F   0218081213 P U topsvcs        Late in sending heartbeat
96CD8511   0218035213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218035213 P S topsvcs        Dead Man Switch being allowed to expire.
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3D32B80D   0218033213 P S topsvcs        NIM thread blocked
3C81E43F   0218033213 P U topsvcs        Late in sending heartbeat
96CD8511   0218023213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218023213 P S topsvcs        Dead Man Switch being allowed to expire.
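The summary listing is easier to read once the TIMESTAMP column is decoded. errpt prints it as MMDDhhmmYY, so 0218085213 means 18 Feb 2013, 08:52. The Python sketch below is not from the thread, and its function names are hypothetical: it parses summary lines in the column layout shown above and counts entries per timestamp and description. That makes it easier to see when each burst of "NIM thread blocked" and "Late in sending heartbeat" errors happened, for example to compare against scheduled jobs or backups running at those times.

```python
from collections import Counter
from datetime import datetime

def parse_errpt_summary(lines):
    """Parse errpt summary rows into (identifier, time, type, class, resource, description).

    Assumes the default layout seen in the thread:
    IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
    """
    entries = []
    for line in lines:
        parts = line.split(None, 5)
        # Skip the header row and anything that is not a complete summary row.
        if len(parts) < 6 or parts[0] == "IDENTIFIER":
            continue
        ident, stamp, etype, eclass, resource, desc = parts
        # TIMESTAMP is MMDDhhmmYY, e.g. 0218085213 -> 2013-02-18 08:52
        when = datetime.strptime(stamp, "%m%d%H%M%y")
        entries.append((ident, when, etype, eclass, resource, desc.strip()))
    return entries

def summarize(entries):
    """Count how many entries share the same timestamp and description."""
    return Counter((when, desc) for _, when, _, _, _, desc in entries)

# A few rows copied from the errpt output above.
sample = """IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3D32B80D   0218085213 P S topsvcs        NIM thread blocked
3C81E43F   0218085213 P U topsvcs        Late in sending heartbeat
90EDB0A5   0218023213 P S topsvcs        Dead Man Switch being allowed to expire.
""".splitlines()

for (when, desc), n in sorted(summarize(parse_errpt_summary(sample)).items()):
    print(f"{when:%Y-%m-%d %H:%M}  {n:3d} x {desc}")
```

Feeding it the full `errpt` output instead of the sample gives one line per burst.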


The error details are as follows:
LABEL:          TS_NIM_ERROR_STUCK_
IDENTIFIER:     3D32B80D

Date/Time:       Sun Feb 17 01:12:10 BEIST 2013
Sequence Number: 2222
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           S
Type:            PERM
Resource Name:   topsvcs         

Description
NIM thread blocked

Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Tune virtual memory parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,7755            
ERROR ID
6BUfAx.erv5F/Uw0.62.e.1...................
REFERENCE CODE
                                          
Thread which was blocked
send thread
Interval in seconds during which process was blocked
          10
Interface name
en0
Author: InfoSVC    Time: 2013-02-18 14:25
It looks like the system is under fairly heavy load.
This could be caused by busy CPU or memory,
or by busy I/O or network latency.

3C81E43F   0218033213 P U topsvcs        Late in sending heartbeat
96CD8511   0218023213 T S topsvcs        Dead Man Switch will once again be reset
90EDB0A5   0218023213 P S topsvcs        Dead Man Switch being allowed to expire.

Can you also post the details of these three errors?
Author: lianshi    Time: 2013-02-19 11:29
Reply to #2 InfoSVC

LABEL:          TS_LATEHB_PE
IDENTIFIER:     3C81E43F

Date/Time:       Sun Feb 17 10:52:15 BEIST 2013
Sequence Number: 2249
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           U
Type:            PERF
Resource Name:   topsvcs         
Resource Class:  NONE
Resource Type:   NONE
Location:        

Description
Late in sending heartbeat

Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities

Failure Causes
Daemon can not get required system resource

        Recommended Actions
        Reduce the system load

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.215.1.10,5366               
ERROR ID
6zESUw.TL26F//pI.62.e.1...................
REFERENCE CODE
                                          
A heartbeat is late by the following number of seconds
          14


Author: lianshi    Time: 2013-02-19 11:34
Reply to #2 InfoSVC

LABEL:          TS_DMS_EXPIRING_EM
IDENTIFIER:     90EDB0A5

Date/Time:       Sun Feb 17 12:32:16 BEIST 2013
Sequence Number: 2263
Machine Id:      00CD7E124C00
Node Id:         metro1_1
Class:           S
Type:            END
Resource Name:   topsvcs         

Description
Dead Man Switch being allowed to expire.
If a TS_DMS_RESTORED_TE error appears after this, that will indicate this
condition has been recovered from.  Otherwise, a DMS-triggered node failure
should be expected to occur after the time indicated in the Detail Data.

Probable Causes
Topology Services has detected blockage that puts it in danger of suffering
a sundered network.  This is due to all viable NIM processes experiencing
blockage, or the daemon's main thread being hung for too long.

User Causes
Excessive I/O load is causing high I/O interrupt traffic
Excessive memory consumption is causing high memory contention

        Recommended Actions
        Reduce application load on the system
        Change (relax) Topology Services tunable parameters
        Call IBM Service if problem persists

Failure Causes
Problem in Operating System prevents processes from running
Excessive I/O interrupt traffic prevents processes from running
Excessive virtual memory activity prevents Topology Services from making progress

        Recommended Actions
        Examine I/O and memory activity on the system
        Reduce load on the system
        Change (relax) Topology Services tunable parameters
        Call IBM Service if problem persists

Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.34,4890             
ERROR ID 
6Z0PvE0Ep36F/YS5/62.e.1...................
REFERENCE CODE
                                          
Time remaining until DMS triggers (in msec)
        5831
DMS trigger interval (in msec)
       20000
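The two DMS values in the Detail Data can be read together. With a 20000 ms trigger interval and 5831 ms remaining, roughly 20000 - 5831 = 14169 ms (about 14 s) had passed without the Dead Man Switch being reset. That is of the same order as the 14-second late heartbeat in the TS_LATEHB_PE entry, although that entry was logged at a different time. This reading assumes the "time remaining" value counts down from the full trigger interval since the last reset. The small Python helper below is hypothetical (it is not an RSCT tool) and only does that arithmetic.

```python
def dms_elapsed_ms(trigger_interval_ms: int, remaining_ms: int) -> int:
    """Time since the Dead Man Switch was last reset.

    Assumes the 'time remaining' value counts down from the full
    trigger interval; both values are in milliseconds.
    """
    if not 0 <= remaining_ms <= trigger_interval_ms:
        raise ValueError("remaining time must lie within the trigger interval")
    return trigger_interval_ms - remaining_ms

# Values from the TS_DMS_EXPIRING_EM Detail Data above.
elapsed = dms_elapsed_ms(20000, 5831)
print(f"{elapsed} ms (~{elapsed / 1000:.1f} s) since the last DMS reset")
```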
Author: InfoSVC    Time: 2013-02-19 12:57
It looks like the en0 adapter is being blocked.
Is en0 carrying the heartbeat?
Author: lianshi    Time: 2013-02-19 14:40
Reply to #6 InfoSVC

The heartbeat runs over an RS232 serial link.
Author: hello_unix    Time: 2013-02-19 16:33
Check how many processes are currently running on the system, and how memory is being used.
Author: InfoSVC    Time: 2013-02-20 09:18
Run nmon
and see what the system resources look like when the errors are logged.
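To act on that advice, the key step is to line up each error timestamp with the monitoring sample taken closest to it. The Python sketch below assumes the samples have already been extracted from nmon (or vmstat/iostat) into hypothetical `(datetime, metrics_dict)` records sorted by time. It does not parse the nmon file format itself, and the function names are illustrative only.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest_sample(sample_times, when):
    """Return the index of the sample time closest to `when`.

    `sample_times` must be a non-empty list of datetimes sorted ascending.
    """
    i = bisect_left(sample_times, when)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    before, after = sample_times[i - 1], sample_times[i]
    return i - 1 if when - before <= after - when else i

def samples_around_errors(samples, error_times, max_gap=timedelta(minutes=5)):
    """Pair each error time with the closest monitoring sample.

    `samples` is a list of (datetime, metrics_dict) tuples sorted by time;
    this record layout is hypothetical, not nmon's native format.
    Errors with no sample within `max_gap` are paired with None.
    """
    times = [t for t, _ in samples]
    result = []
    for when in error_times:
        if not times:
            result.append((when, None))
            continue
        t, metrics = samples[nearest_sample(times, when)]
        result.append((when, metrics if abs(t - when) <= max_gap else None))
    return result
```

The error times can come straight from the decoded errpt TIMESTAMP column. A short sampling interval makes the match more meaningful, since a 14-second stall can disappear inside a long averaging window.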

Welcome to Chinaunix (http://bbs.chinaunix.net/) Powered by Discuz! X3.2