Original poster: freebug

[Help] An rx6600 keeps reporting "TEMPERATURE_HIGH_WARNING", please take a look

#1 Posted 2013-08-01 21:10
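One of our rx6600 servers keeps reporting "TEMPERATURE_HIGH_WARNING"; please take a look. It should not be a machine-room problem, because another RX6600 in the same location has never reported this error.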


#  Location|Alert| Encoded Field    |  Data Field    |   Keyword / Timestamp
-------------------------------------------------------------------------------
25    BMC      2  0x2051F9FC8F0201E0 625E5781DA010300 TEMPERATURE_NORMAL_FROM_WARNING
                                                      01 Aug 2013 06:13:35
24    BMC      2  0x2051F9E7FF0201D0 625E5781D9010300 TEMPERATURE_NORMAL_FROM_WARNING
                                                      01 Aug 2013 04:45:51
23    BMC     *3  0x2051F8DE210201C0 62625701DA010300 TEMPERATURE_HIGH_WARNING
                                                      31 Jul 2013 09:51:29
22    BMC     *3  0x2051F8DB4C0201B0 62625701D9010300 TEMPERATURE_HIGH_WARNING
                                                      31 Jul 2013 09:39:24
21    BMC      2  0x2051F8CB180201A0 625E5781DA010300 TEMPERATURE_NORMAL_FROM_WARNING
                                                      31 Jul 2013 08:30:16
20    BMC      2  0x2051F554E6020190 625E5781D9010300 TEMPERATURE_NORMAL_FROM_WARNING
                                                      28 Jul 2013 17:29:10
19    BMC     *3  0x2051F2B34F020180 62625701DA010300 TEMPERATURE_HIGH_WARNING
                                                      26 Jul 2013 17:35:11
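
For anyone who wants to sort or filter entries like these offline, here is a minimal Python sketch. It assumes the two-line layout shown above (fields on one line, timestamp on the next) and that a long keyword may wrap onto its own line. The names `parse_events`, `unwrap` and the record keys are hypothetical, not part of any HP tool.

```python
import re
from datetime import datetime

# Three entries copied from the log above; entry 21 keeps its wrapped keyword.
SAMPLE = """\
23    BMC     *3  0x2051F8DE210201C0 62625701DA010300 TEMPERATURE_HIGH_WARNING
                                                      31 Jul 2013 09:51:29
21    BMC      2  0x2051F8CB180201A0 625E5781DA010300 TEMPERATURE_NORMAL_FROM_WA
RNING
                                                      31 Jul 2013 08:30:16
19    BMC     *3  0x2051F2B34F020180 62625701DA010300 TEMPERATURE_HIGH_WARNING
                                                      26 Jul 2013 17:35:11
"""

ENTRY = re.compile(
    r"^(?P<num>\d+)\s+(?P<location>\S+)\s+(?P<alert>\*?\d+)\s+"
    r"(?P<encoded>0x[0-9A-Fa-f]{16})\s+(?P<data>[0-9A-Fa-f]{16})\s+"
    r"(?P<keyword>[A-Z_]+)\s+"
    r"(?P<ts>\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2})",
    re.MULTILINE,
)


def unwrap(text):
    """Rejoin a keyword whose tail wrapped onto a line of its own."""
    return re.sub(r"\n(?=[A-Z_]+[ \t]*$)", "", text, flags=re.MULTILINE)


def parse_events(text):
    """Return the log entries as dicts, oldest first."""
    events = []
    for m in ENTRY.finditer(unwrap(text)):
        events.append({
            "num": int(m["num"]),
            "location": m["location"],
            "alert": int(m["alert"].lstrip("*")),
            "flagged": m["alert"].startswith("*"),  # '*' marks alert level 3+
            "encoded": m["encoded"],
            "data": m["data"],
            "keyword": m["keyword"],
            # %b expects English month names (true in the default C locale).
            "time": datetime.strptime(m["ts"], "%d %b %Y %H:%M:%S"),
        })
    return sorted(events, key=lambda e: e["time"])


if __name__ == "__main__":
    for e in parse_events(SAMPLE):
        print(e["time"], e["num"], e["alert"], e["keyword"])
```

Sorting by time makes it easier to see how long each warning stayed active before the matching TEMPERATURE_NORMAL_FROM_WARNING entry.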

#2 Posted 2013-08-03 01:01
The PS output is normal:

For System Processor Status see the SS command.
System Power state: On
System Power usage: 658 Watts
Temperature       : Normal


Power supplies                State
-----------------------------------------------------------
Power Supply 0                Normal
Power Supply 1                Normal

Fans                          State
-----------------------------------------------------------
System Fan 1                  Normal
System Fan 2                  Normal
System Fan 3                  Normal
System Fan 4                  Normal
System Fan 5                  Normal
System Fan 6                  Normal


event.log contains these errors:

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Thu Aug  1 21:18:14 2013

xxdb sent Event Monitor notification information:

/system/events/ia64_corehw/core_hw is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Thu Aug  1 21:18:14 2013
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011              
System..............: xxdb

Summary:
     System temperature is out of normal range.


Description of Error:

     The system temperature is not within normal operating range. It is higher
     than required operating range.

Probable Cause / Recommended Action:

     Something may be blocking the cooling intakes of the fans. Check for
     obstruction.
     One or more fans may be operating at lower speed than normal. Check the
     fan performance.

     Check for problems with the room air conditioning.

     If the problem is not fixed, the operating temperature may become
     non-recoverable, in which case there are chances that the hardware may be
     damaged.  At that temperature level, on Integrity servers, the firmware
     will shutdown the system automatically. However on HP 9000 servers, the
     action specified in the envd config file will be taken - which may be to
     shutdown the system automatically.

     For information on the sensor that generated this event, refer to FRU ID
     in Event Details section.

Additional Event Data:
     System IP Address...: 137.1.100.12
     Event Id............: 0x51fa601600000000
     Monitor Version.....: B.01.00
     Event Class.........: System
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_ia64_corehw.clcfg
     Client Configuration File Version...: A.01.00
          Qualification criteria met.
               Number of events..: 1
     Associated OS error log entry id(s):
          None
     Additional System Data:
          System Model Number.............: ia64 hp server rx6600
          EMS Version.....................: A.04.20
          STM Version.....................: C.58.00
          System Serial Number............: SGH49016E9
     Latest information on this event:
          http://docs.hp.com/hpux/content/ ... 4_corehw.htm#101011

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

     Event Date .............: Thu Aug  1 21:18:13 2013
     Sensor Number ..........: 0xd9
     Sensor Type ............: Temperature
     Sensor Class ...........: Threshold based
     Sensor Reading/Offset...: 0x07 (Offset)
     Event  Type.............: Assertion
     Entity ID ..............: 3
     Generic Message.........:
       Temperature :  Upper non-critical - going high
     Entity FRU Id Info......:
       processor (Sensor ID: Processor 0)



>---------- End Event Monitoring Service Event Notification ----------<

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Thu Aug  1 23:11:43 2013

xxdb sent Event Monitor notification information:

/system/events/ia64_corehw/core_hw is >= 1.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Thu Aug  1 23:11:43 2013
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011              
System..............: xxdb

Summary:
     System temperature is out of normal range.


Description of Error:

     The system temperature is not within normal operating range. It is higher
     than required operating range.

Probable Cause / Recommended Action:

     Something may be blocking the cooling intakes of the fans. Check for
     obstruction.
     One or more fans may be operating at lower speed than normal. Check the
     fan performance.

     Check for problems with the room air conditioning.

     If the problem is not fixed, the operating temperature may become
     non-recoverable, in which case there are chances that the hardware may be
     damaged.  At that temperature level, on Integrity servers, the firmware
     will shutdown the system automatically. However on HP 9000 servers, the
     action specified in the envd config file will be taken - which may be to
     shutdown the system automatically.

     For information on the sensor that generated this event, refer to FRU ID
     in Event Details section.

Additional Event Data:
     System IP Address...: 137.1.100.12
     Event Id............: 0x51fa7aaf00000000
     Monitor Version.....: B.01.00
     Event Class.........: System
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_ia64_corehw.clcfg
     Client Configuration File Version...: A.01.00
          Qualification criteria met.
               Number of events..: 1
     Associated OS error log entry id(s):
          None
     Additional System Data:
          System Model Number.............: ia64 hp server rx6600
          EMS Version.....................: A.04.20
          STM Version.....................: C.58.00
          System Serial Number............: SGH49016E9
     Latest information on this event:
          http://docs.hp.com/hpux/content/ ... 4_corehw.htm#101011

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v


Event Details :

     Event Date .............: Thu Aug  1 23:11:34 2013
     Sensor Number ..........: 0xda
     Sensor Type ............: Temperature
     Sensor Class ...........: Threshold based
     Sensor Reading/Offset...: 0x07 (Offset)
     Event  Type.............: Assertion
     Entity ID ..............: 3
     Generic Message.........:
       Temperature :  Upper non-critical - going high
     Entity FRU Id Info......:
       processor (Sensor ID: Processor 1)



>---------- End Event Monitoring Service Event Notification ----------<
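
When event.log holds many notifications like the two above, a short script can reduce each one to the fields that matter here (time, severity, sensor number, FRU). The following is a rough Python sketch that assumes every notification is delimited by the same start and end marker lines; `summarize` and the record keys are hypothetical names, and the sample is an abbreviated copy of the two notifications.

```python
import re

# Abbreviated copies of the two notifications above.
SAMPLE = """\
>------------ Event Monitoring Service Event Notification ------------<
Event Time..........: Thu Aug  1 21:18:14 2013
Severity............: MAJORWARNING
     Sensor Number ..........: 0xd9
       processor (Sensor ID: Processor 0)
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Event Time..........: Thu Aug  1 23:11:43 2013
Severity............: MAJORWARNING
     Sensor Number ..........: 0xda
       processor (Sensor ID: Processor 1)
>---------- End Event Monitoring Service Event Notification ----------<
"""

# One notification: everything between the start marker and the end marker.
BLOCK = re.compile(
    r">-+ Event Monitoring Service Event Notification -+<"
    r"(.*?)"
    r">-+ End Event Monitoring Service Event Notification -+<",
    re.DOTALL,
)

FIELDS = {
    "time": re.compile(r"Event Time\.+: *(.+)"),
    "severity": re.compile(r"Severity\.+: *(\S+)"),
    "sensor": re.compile(r"Sensor Number \.+: *(\S+)"),
    "fru": re.compile(r"\(Sensor ID: ([^)]+)\)"),
}


def summarize(text):
    """Return one dict per notification; missing fields become None."""
    records = []
    for block in BLOCK.findall(text):
        record = {}
        for name, pattern in FIELDS.items():
            m = pattern.search(block)
            # Collapse runs of spaces, e.g. "Aug  1" becomes "Aug 1".
            record[name] = " ".join(m.group(1).split()) if m else None
        records.append(record)
    return records


if __name__ == "__main__":
    for r in summarize(SAMPLE):
        print(r["time"], r["severity"], r["sensor"], r["fru"])
```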

#3 Posted 2013-08-04 12:51
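lbseraph posted on 2013-08-04 12:43
I already said this earlier: that warning is being reported by the CPUs.

http://bbs.chinaunix.net/forum.p ... to=findpost&pti ...

lbseraph posted on 2013-07-27 10:12
Reply to 1# freebug

That is a CPU temperature warning; it went above 98 degrees (100 degrees). How many physical CPUs are there? The warning points to Processor 0 and ...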



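Thanks, moderator, but it is still strange. I swapped the CPU drawer of the other, healthy RX6600 directly with the one from this problem machine; the healthy machine stayed healthy and this one still raises the alarm.

The MP card shows all fans running normally, yet it still reports high temperature from time to time. I checked inside the CPU drawer and there is not much dust either. I really cannot figure it out.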

#4 Posted 2013-08-06 00:13
Reply to 7# lbseraph


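    I redirected one air intake so that it blows straight at this RX6600 and watched for three days; it has not reported the error again. One more question for the moderator: how did you work out that the CPU temperature had exceeded 100 degrees? Is there a dedicated decoding tool?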
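
On the decoding question, one possible reading (an assumption, not taken from HP documentation) is that the 8-byte Data Field carries the IPMI threshold event fields in reverse order: trigger threshold, trigger reading, event data 1, event direction/type, sensor number, sensor type. That reading agrees with the EMS details in this thread (sensor 0xd9/0xda, offset 0x07, temperature sensor), and 0x62 is 98 decimal, which matches the "98 degrees" in the reply. Treating the raw byte as degrees Celsius is a further assumption; in general IPMI raw readings need the sensor's conversion factors. `decode_data_field` is a hypothetical helper:

```python
def decode_data_field(hex_str):
    """Split an MP event-log Data Field under the ASSUMED byte layout:
    [0] trigger threshold, [1] trigger reading, [2] IPMI event data 1,
    [3] event direction/type, [4] sensor number, [5] sensor type.
    Bytes [6] and [7] are left uninterpreted.
    """
    raw = bytes.fromhex(hex_str)
    if len(raw) != 8:
        raise ValueError("expected 16 hex digits")
    threshold, reading, data1, dir_type, sensor, sensor_type = raw[:6]
    return {
        "threshold_raw": threshold,             # assumed to be degrees C here
        "reading_raw": reading,                 # assumed to be degrees C here
        "offset": data1 & 0x0F,                 # 7 = upper non-critical going high
        "deassertion": bool(dir_type & 0x80),   # bit 7 set when the event clears
        "sensor_number": sensor,
        "sensor_type": sensor_type,             # 0x01 = temperature in IPMI
    }


if __name__ == "__main__":
    for field in ("62625701DA010300", "625E5781D9010300"):
        print(field, decode_data_field(field))
```

Under this reading, the TEMPERATURE_HIGH_WARNING entries show a reading of 98 against a threshold of 98, and the TEMPERATURE_NORMAL_FROM_WARNING entries show the reading back at 94 (0x5E).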

#5 Posted 2013-08-10 09:56
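It was temperature-related after all. Since I adjusted the air-conditioning intake inside the cabinet, there have been no more alarms so far.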