One of our rx6600 servers keeps reporting "TEMPERATURE_HIGH_WARNING"; can someone take a look?
It probably isn't a machine-room problem, because another RX6600 in the same location has never reported this error.
#   Location  Alert  Encoded Field       Data Field        Keyword / Timestamp
-------------------------------------------------------------------------------
25  BMC       2      0x2051F9FC8F0201E0  625E5781DA010300  TEMPERATURE_NORMAL_FROM_WARNING
                                                           01 Aug 2013 06:13:35
24  BMC       2      0x2051F9E7FF0201D0  625E5781D9010300  TEMPERATURE_NORMAL_FROM_WARNING
                                                           01 Aug 2013 04:45:51
23  BMC       *3     0x2051F8DE210201C0  62625701DA010300  TEMPERATURE_HIGH_WARNING
                                                           31 Jul 2013 09:51:29
22  BMC       *3     0x2051F8DB4C0201B0  62625701D9010300  TEMPERATURE_HIGH_WARNING
                                                           31 Jul 2013 09:39:24
21  BMC       2      0x2051F8CB180201A0  625E5781DA010300  TEMPERATURE_NORMAL_FROM_WARNING
                                                           31 Jul 2013 08:30:16
20  BMC       2      0x2051F554E6020190  625E5781D9010300  TEMPERATURE_NORMAL_FROM_WARNING
                                                           28 Jul 2013 17:29:10
19  BMC       *3     0x2051F2B34F020180  62625701DA010300  TEMPERATURE_HIGH_WARNING
                                                           26 Jul 2013 17:35:11
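To see how long each warning lasted, the SEL entries above can be paired up, each HIGH_WARNING with the next NORMAL_FROM_WARNING. This is a hypothetical helper, not an HP tool. It uses the keyword and timestamp columns plus one ASSUMPTION: that the fifth byte of the Data Field (D9 or DA) identifies the sensor. The only basis for that is that those bytes match the sensor numbers 0xd9 and 0xda in event.log; nothing in this thread confirms the encoding.

```python
from datetime import datetime

# Records copied from the SEL listing above: (entry #, data field, keyword, timestamp).
RECORDS = [
    (19, "62625701DA010300", "TEMPERATURE_HIGH_WARNING", "26 Jul 2013 17:35:11"),
    (20, "625E5781D9010300", "TEMPERATURE_NORMAL_FROM_WARNING", "28 Jul 2013 17:29:10"),
    (21, "625E5781DA010300", "TEMPERATURE_NORMAL_FROM_WARNING", "31 Jul 2013 08:30:16"),
    (22, "62625701D9010300", "TEMPERATURE_HIGH_WARNING", "31 Jul 2013 09:39:24"),
    (23, "62625701DA010300", "TEMPERATURE_HIGH_WARNING", "31 Jul 2013 09:51:29"),
    (24, "625E5781D9010300", "TEMPERATURE_NORMAL_FROM_WARNING", "01 Aug 2013 04:45:51"),
    (25, "625E5781DA010300", "TEMPERATURE_NORMAL_FROM_WARNING", "01 Aug 2013 06:13:35"),
]


def sensor_byte(data_field: str) -> str:
    # ASSUMPTION: the 5th byte (hex characters 8-9) identifies the sensor.
    return data_field[8:10]


def warning_windows(records):
    """Pair each HIGH_WARNING with the next NORMAL_FROM_WARNING for the same sensor byte."""
    open_since = {}  # sensor byte -> time the warning was raised
    windows = []
    # Entry numbers increase with time, so sorting by them gives chronological order.
    for _, data, keyword, stamp in sorted(records):
        when = datetime.strptime(stamp, "%d %b %Y %H:%M:%S")
        sensor = sensor_byte(data)
        if keyword == "TEMPERATURE_HIGH_WARNING":
            open_since[sensor] = when
        elif keyword == "TEMPERATURE_NORMAL_FROM_WARNING" and sensor in open_since:
            windows.append((sensor, open_since.pop(sensor), when))
    return windows


for sensor, start, end in warning_windows(RECORDS):
    print(f"sensor 0x{sensor}: {start} -> {end} (lasted {end - start})")
```

A NORMAL_FROM_WARNING with no earlier HIGH_WARNING in the listing (entry 20 here) is simply skipped, because its matching warning lies before the excerpt.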
Check the fans and the airflow.
MP LOG:
============
cm > ps
SYSTEM LOG:
============
event.log
Poor ventilation; check the air filters.
The PS output is normal:
For System Processor Status see the SS command.
System Power state: On
System Power usage: 658 Watts
Temperature : Normal
Power supplies State
-----------------------------------------------------------
Power Supply 0 Normal
Power Supply 1 Normal
Fans State
-----------------------------------------------------------
System Fan 1 Normal
System Fan 2 Normal
System Fan 3 Normal
System Fan 4 Normal
System Fan 5 Normal
System Fan 6 Normal
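If you re-run ps on the MP repeatedly while watching for the problem, a small filter can flag any row whose state is not Normal. This is a hypothetical helper that assumes the row layout shown above (component name followed by a one-word state); it does not connect to the MP, so you paste or capture the output yourself.

```python
# Rows copied from the MP "ps" output above.
PS_OUTPUT = """\
Power Supply 0 Normal
Power Supply 1 Normal
System Fan 1 Normal
System Fan 2 Normal
System Fan 3 Normal
System Fan 4 Normal
System Fan 5 Normal
System Fan 6 Normal
"""


def abnormal_components(text: str) -> list[str]:
    """Return 'name: state' for every power-supply or fan row that is not Normal."""
    bad = []
    for line in text.splitlines():
        parts = line.split()
        # ASSUMPTION: a component row starts with one of these prefixes and
        # ends with a single-word state such as "Normal".
        if len(parts) >= 2 and line.strip().startswith(("Power Supply", "System Fan")):
            name, state = " ".join(parts[:-1]), parts[-1]
            if state != "Normal":
                bad.append(f"{name}: {state}")
    return bad


problems = abnormal_components(PS_OUTPUT)
print(problems if problems else "all power supplies and fans report Normal")
```

As this thread shows, every fan can read Normal while the CPU warnings continue, so a clean result here does not rule out an airflow problem.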
event.log contains these errors:
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Thu Aug 1 21:18:14 2013
xxdb sent Event Monitor notification information:
/system/events/ia64_corehw/core_hw is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Thu Aug 1 21:18:14 2013
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011
System..............: xxdb
Summary:
System temperature is out of normal range.
Description of Error:
The system temperature is not within normal operating range. It is higher
than required operating range.
Probable Cause / Recommended Action:
Something may be blocking the cooling intakes of the fans. Check for
obstruction.
One or more fans may be operating at lower speed than normal. Check the
fan performance.
Check for problems with the room air conditioning.
If the problem is not fixed, the operating temperature may become
non-recoverable, in which case there are chances that the hardware may be
damaged.At that temperature level, on Integrity servers, the firmware
will shutdown the system automatically. However on HP 9000 servers, the
action specified in the envd config file will be taken - which may be to
shutdown the system automatically.
For information on the sensor that generated this event, refer to FRU ID
in Event Details section.
Additional Event Data:
System IP Address...: 137.1.100.12
Event Id............: 0x51fa601600000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_ia64_corehw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: ia64 hp server rx6600
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: SGH49016E9
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#101011
v-v-v-v-v-v-v-v-v-v-v-v-v DETAILS v-v-v-v-v-v-v-v-v-v-v-v-v
Event Details :
Event Date .............: Thu Aug 1 21:18:13 2013
Sensor Number ..........: 0xd9
Sensor Type ............: Temperature
Sensor Class ...........: Threshold based
Sensor Reading/Offset...: 0x07 (Offset)
EventType.............: Assertion
Entity ID ..............: 3
Generic Message.........:
Temperature :Upper non-critical - going high
Entity FRU Id Info......:
processor (Sensor ID: Processor 0)
>---------- End Event Monitoring Service Event Notification ----------<
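The Event Details use standard IPMI terminology: a threshold-based sensor reports an event offset, and offset 0x07 comes with the Generic Message "Upper non-critical - going high". The sketch below assumes EMS passes through the generic threshold offsets (event/reading type 01h) from the IPMI v2.0 specification, which is consistent with the message shown. Upper non-critical is the lowest of the three upper thresholds (non-critical, critical, non-recoverable), so this event marks the warning stage, not the shutdown stage.

```python
# Generic threshold event offsets for IPMI event/reading type 01h.
THRESHOLD_OFFSETS = {
    0x00: "Lower non-critical - going low",
    0x01: "Lower non-critical - going high",
    0x02: "Lower critical - going low",
    0x03: "Lower critical - going high",
    0x04: "Lower non-recoverable - going low",
    0x05: "Lower non-recoverable - going high",
    0x06: "Upper non-critical - going low",
    0x07: "Upper non-critical - going high",
    0x08: "Upper critical - going low",
    0x09: "Upper critical - going high",
    0x0A: "Upper non-recoverable - going low",
    0x0B: "Upper non-recoverable - going high",
}


def decode_threshold_offset(offset: int) -> str:
    """Translate a threshold-sensor event offset into its generic IPMI message."""
    return THRESHOLD_OFFSETS.get(offset, f"unknown offset 0x{offset:02x}")


# "Sensor Reading/Offset...: 0x07 (Offset)" from the Event Details above.
print(decode_threshold_offset(int("0x07", 16)))  # prints: Upper non-critical - going high
```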
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Thu Aug 1 23:11:43 2013
xxdb sent Event Monitor notification information:
/system/events/ia64_corehw/core_hw is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Thu Aug 1 23:11:43 2013
Severity............: MAJORWARNING
Monitor.............: ia64_corehw
Event #.............: 101011
System..............: xxdb
Summary:
System temperature is out of normal range.
Description of Error:
The system temperature is not within normal operating range. It is higher
than required operating range.
Probable Cause / Recommended Action:
Something may be blocking the cooling intakes of the fans. Check for
obstruction.
One or more fans may be operating at lower speed than normal. Check the
fan performance.
Check for problems with the room air conditioning.
If the problem is not fixed, the operating temperature may become
non-recoverable, in which case there are chances that the hardware may be
damaged.At that temperature level, on Integrity servers, the firmware
will shutdown the system automatically. However on HP 9000 servers, the
action specified in the envd config file will be taken - which may be to
shutdown the system automatically.
For information on the sensor that generated this event, refer to FRU ID
in Event Details section.
Additional Event Data:
System IP Address...: 137.1.100.12
Event Id............: 0x51fa7aaf00000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_ia64_corehw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: ia64 hp server rx6600
EMS Version.....................: A.04.20
STM Version.....................: C.58.00
System Serial Number............: SGH49016E9
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/ia64_corehw.htm#101011
v-v-v-v-v-v-v-v-v-v-v-v-v DETAILS v-v-v-v-v-v-v-v-v-v-v-v-v
Event Details :
Event Date .............: Thu Aug 1 23:11:34 2013
Sensor Number ..........: 0xda
Sensor Type ............: Temperature
Sensor Class ...........: Threshold based
Sensor Reading/Offset...: 0x07 (Offset)
EventType.............: Assertion
Entity ID ..............: 3
Generic Message.........:
Temperature :Upper non-critical - going high
Entity FRU Id Info......:
processor (Sensor ID: Processor 1)
>---------- End Event Monitoring Service Event Notification ----------<
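Because event.log can grow long, a quick way to see which sensors fired, and when, is to extract three fields from each notification block. This is a hypothetical sketch that relies only on the field labels visible in the notifications above. On HP-UX the EMS event log is typically /var/opt/resmon/log/event.log, but check the path on your own system. Run against the two notifications above, it shows that sensor 0xd9 (Processor 0) and sensor 0xda (Processor 1) both raised the warning.

```python
import re

# Trimmed copy of the two notifications above (only the lines the parser needs).
SAMPLE = """\
>------------ Event Monitoring Service Event Notification ------------<
Event Date .............: Thu Aug 1 21:18:13 2013
Sensor Number ..........: 0xd9
Entity FRU Id Info......:
processor (Sensor ID: Processor 0)
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Event Date .............: Thu Aug 1 23:11:34 2013
Sensor Number ..........: 0xda
Entity FRU Id Info......:
processor (Sensor ID: Processor 1)
>---------- End Event Monitoring Service Event Notification ----------<
"""

BLOCK_END = ">---------- End Event Monitoring Service Event Notification ----------<"


def summarize(log_text: str) -> list[tuple[str, int, str]]:
    """Return (event date, sensor number, sensor ID) for each notification block."""
    rows = []
    for block in log_text.split(BLOCK_END):
        date = re.search(r"Event Date \.+: (.+)", block)
        sensor = re.search(r"Sensor Number \.+: (0x[0-9a-fA-F]+)", block)
        fru = re.search(r"\(Sensor ID: ([^)]+)\)", block)
        # Skip fragments (such as the text after the last block) missing any field.
        if date and sensor and fru:
            rows.append((date.group(1).strip(), int(sensor.group(1), 16), fru.group(1)))
    return rows


for when, sensor, sensor_id in summarize(SAMPLE):
    print(f"{when}  sensor 0x{sensor:02x}  {sensor_id}")
```

To scan a real log, pass it the file contents, for example summarize(open(path).read()) with the event.log path on your system.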
As I said before, that alarm is raised by the CPU.
http://bbs.chinaunix.net/forum.php?mod=redirect&goto=findpost&ptid=4092553&pid=23932415&fromuid=21562621
lbseraph posted on 2013-08-04 12:43
lbseraph posted on 2013-07-27 10:12
Reply to #1 freebug
That is a CPU temperature alarm; the reading went above 98 °C (100 °C). How many physical CPUs are there? The alarm points to Processor 0 and ...
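Thanks, moderator, but it's still strange. I swapped the CPU drawer of the other, healthy RX6600 with the one in this machine. The healthy machine is still fine, and this one still raises the alarm.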
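In the MP card all the fans show as running normally; it just reports high temperature from time to time. I also checked the CPU drawer and there isn't much dust in it. I really can't figure this out.
If possible, first swap the MP card (the board the BMC sits on) and see whether the problem follows the board.
Reply to #7 lbseraph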
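I adjusted one of the air-supply vents to point directly at this RX6600 and watched it for three days, and it hasn't reported the error again. I'd also like to ask, moderator: how could you tell that the CPU temperature went above 100 °C? Is there a dedicated decoding tool?
Same question here.
Reply to #8 freebug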
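That was decoded with an HP internal tool. What you can do yourself is view the brief description with the sl command in text mode; it should show that the alarm was raised by the CPU, and that is enough. Don't worry too much about exactly how many degrees over the limit it went. Knowing which component's internal over-temperature alarm was triggered is enough, because different components have different over-temperature thresholds. If I remember correctly, the system board forces a shutdown at around 42 °C (to keep the heat from damaging the components).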