T5440 reporting errors
# prtdiag -v
System Configuration: Sun Microsystems sun4v T5440
Memory size: 65248 Megabytes
======================= Physical Memory Configuration ========================
Segment Table:
--------------------------------------------------------------
Base Segment Interleave Bank Contains
Address Size Factor Size Modules
--------------------------------------------------------------
0x0 64 GB 8 8 GB MB/CPU0/CMP0/BR0/CH0/D0
MB/CPU0/CMP0/BR0/CH1/D0
8 GB MB/CPU0/CMP0/BR1/CH0/D0
MB/CPU0/CMP0/BR1/CH1/D0
8 GB MB/MEM0/CMP0/BR0/CH0/D1
MB/MEM0/CMP0/BR0/CH1/D1
8 GB MB/MEM0/CMP0/BR1/CH0/D1
MB/MEM0/CMP0/BR1/CH1/D1
8 GB MB/CPU1/CMP1/BR0/CH0/D0
MB/CPU1/CMP1/BR0/CH1/D0
8 GB MB/CPU1/CMP1/BR1/CH0/D0
MB/CPU1/CMP1/BR1/CH1/D0
8 GB MB/MEM1/CMP1/BR0/CH0/D1
MB/MEM1/CMP1/BR0/CH1/D1
8 GB MB/MEM1/CMP1/BR1/CH0/D1
MB/MEM1/CMP1/BR1/CH1/D1
================================ IO Devices ================================
Slot + Bus Name + Model
Status Type Path
----------------------------------------------------------------------------
MB/HBA PCIE scsi-pciex1000,58 LSI,1068E
/pci@400/pci@0/pci@1/scsi@0
MB/PCIE1 PCIE SUNW,qlc-pciex1077,2432 QLE2460
/pci@400/pci@0/pci@c/SUNW,qlc@0
MB/PCIE0 PCIE SUNW,XVR-300 SUNW,375-3545
/pci@400/pci@0/pci@d/SUNW,XVR-300@0
MB PCIE network-pciex108e,abcd SUNW,pcie-neptune
/pci@500/pci@0/pci@c/network@0
MB PCIE network-pciex108e,abcd SUNW,pcie-neptune
/pci@500/pci@0/pci@c/network@0,1
MB PCIE network-pciex108e,abcd SUNW,pcie-neptune
/pci@500/pci@0/pci@c/network@0,2
MB PCIE network-pciex108e,abcd SUNW,pcie-neptune
/pci@500/pci@0/pci@c/network@0,3
MB/PCIE4 PCIE SUNW,qlc-pciex1077,2432 QLE2460
/pci@500/pci@0/pci@d/SUNW,qlc@0
MB PCIX usb-pciclass,0c0310
/pci@400/pci@0/pci@9/pci@0/usb@0
MB PCIX usb-pciclass,0c0310
/pci@400/pci@0/pci@9/pci@0/usb@0,1
MB PCIX usb-pciclass,0c0320
/pci@400/pci@0/pci@9/pci@0/usb@0,2
============================ Environmental Status ============================
Fan sensors:
----------------------------------------------------------------
Location Sensor Status
----------------------------------------------------------------
SYS/MB/FT0 TACH ok
SYS/MB/FT1 TACH ok
SYS/MB/FT2 TACH ok
SYS/MB/FT3 TACH ok
Fan indicators:
----------------------------------------------------------------
Location Sensor Condition
----------------------------------------------------------------
SYS/PS0 FAN_FAULT ok
SYS/PS1 FAN_FAULT ok
SYS/PS2 FAN_FAULT ok
SYS/PS3 FAN_FAULT ok
Temperature sensors:
----------------------------------------------------------------
Location Sensor Status
----------------------------------------------------------------
SYS/MB T_MB2 ok
SYS/MB/CPU0 T_CPU_INLET ok
SYS/MB/CPU0 T_VF_TOP ok
SYS/MB/CPU1 T_CPU_INLET ok
SYS/MB/CPU1 T_VF_TOP ok
SYS T_AMB ok
Temperature indicators:
----------------------------------------------------------------
Location Indicator Condition
----------------------------------------------------------------
SYS/PS0 TEMP_FAULT ok
SYS/PS1 TEMP_FAULT ok
SYS/PS2 TEMP_FAULT ok
SYS/PS3 TEMP_FAULT ok
Current sensors:
----------------------------------------------------------------
Location Sensor Status
----------------------------------------------------------------
SYS/MB/CPU0/DVRM_CORE I_+1V1 disabled
SYS/MB/CPU1/DVRM_CORE I_+1V1 ok
SYS/PS0 I_+12V ok
SYS/PS0 I_AC ok
SYS/PS0 I_AC_LIMIT ok
SYS/PS0 I_DC_LIMIT ok
SYS/PS1 I_+12V ok
SYS/PS1 I_AC ok
SYS/PS1 I_AC_LIMIT ok
SYS/PS1 I_DC_LIMIT ok
SYS/PS2 I_+12V ok
SYS/PS2 I_AC ok
SYS/PS2 I_AC_LIMIT ok
SYS/PS2 I_DC_LIMIT ok
SYS/PS3 I_+12V ok
SYS/PS3 I_AC ok
SYS/PS3 I_AC_LIMIT ok
SYS/PS3 I_DC_LIMIT ok
Current indicators:
----------------------------------------------------------------
Location Indicator Condition
----------------------------------------------------------------
SYS/PS0 CUR_FAULT ok
SYS/PS1 CUR_FAULT ok
SYS/PS2 CUR_FAULT ok
SYS/PS3 CUR_FAULT ok
Voltage sensors:
----------------------------------------------------------------
Location Sensor Status
----------------------------------------------------------------
SYS/MB/SP V_+3V3STBY ok
SYS/MB/SP V_+3V3REG ok
SYS/MB/DVRM_PLX V_+1V0 ok
SYS/MB/DVRM_ZAM23 V_+1V1 ok
SYS/MB/DVRM_ZAM01 V_+1V1 ok
SYS/MB/DVRM_ZAMP V_+1V5 ok
SYS/MB/DVRM_33 V_+3V3 ok
SYS/MB V_+12V ok
SYS/MB V_PHY+2V5 ok
SYS/MB V_NEP+1V2 ok
SYS/MB V_MAIN+3V3 ok
SYS/MB/CPU0/DVRM_CORE V_+1V1 disabled
SYS/MB/CPU0/DVRM_FBD V_+1V8 ok
SYS/MB/CPU0/DVRM_15 V_+1V5 ok
SYS/MB/CPU1/DVRM_CORE V_+1V1 ok
SYS/MB/CPU1/DVRM_FBD V_+1V8 ok
SYS/MB/CPU1/DVRM_15 V_+1V5 ok
SYS/MB/MEM0/DVRM_15 V_+1V5 ok
SYS/MB/MEM0/DVRM_FBD V_+1V8 ok
SYS/MB/MEM1/DVRM_15 V_+1V5 ok
SYS/MB/MEM1/DVRM_FBD V_+1V8 ok
SYS/PS0 V_+12V ok
SYS/PS0 V_AC ok
SYS/PS1 V_+12V ok
SYS/PS1 V_AC ok
SYS/PS2 V_+12V ok
SYS/PS2 V_AC ok
SYS/PS3 V_+12V ok
SYS/PS3 V_AC ok
Voltage indicators:
----------------------------------------------------------------
Location Indicator Condition
----------------------------------------------------------------
SYS/MB/SP V_+3VBATT ok
SYS/PS0 AC_POK fail
SYS/PS0 DC_POK fail
SYS/PS0 VOLT_FAULT fail
SYS/PS1 AC_POK fail
SYS/PS1 DC_POK fail
SYS/PS1 VOLT_FAULT fail
SYS/PS2 AC_POK ok
SYS/PS2 DC_POK ok
SYS/PS2 VOLT_FAULT ok
SYS/PS3 AC_POK ok
SYS/PS3 DC_POK ok
SYS/PS3 VOLT_FAULT ok
LEDs:
----------------------------------------------------------------
Location LED State
----------------------------------------------------------------
SYS SERVICE steady
SYS LOCATE off
SYS ACT steady
SYS PS_FAULT steady
SYS TEMP_FAULT steady
SYS FAN_FAULT off
SYS/MB/CPU0/CMP0/BR0/CH0/D0 SERVICE off
SYS/MB/CPU0/CMP0/BR0/CH1/D0 SERVICE off
SYS/MB/CPU0/CMP0/BR1/CH0/D0 SERVICE off
SYS/MB/CPU0/CMP0/BR1/CH1/D0 SERVICE off
SYS/MB/CPU0 SERVICE off
SYS/MB/CPU1/CMP1/BR0/CH0/D0 SERVICE off
SYS/MB/CPU1/CMP1/BR0/CH1/D0 SERVICE off
SYS/MB/CPU1/CMP1/BR1/CH0/D0 SERVICE off
SYS/MB/CPU1/CMP1/BR1/CH1/D0 SERVICE off
SYS/MB/CPU1 SERVICE steady
SYS/MB/MEM0/CMP0/BR0/CH0/D1 SERVICE off
SYS/MB/MEM0/CMP0/BR0/CH1/D1 SERVICE off
SYS/MB/MEM0/CMP0/BR1/CH0/D1 SERVICE off
SYS/MB/MEM0/CMP0/BR1/CH1/D1 SERVICE off
SYS/MB/MEM0 SERVICE off
SYS/MB/MEM1/CMP1/BR0/CH0/D1 SERVICE off
SYS/MB/MEM1/CMP1/BR0/CH1/D1 SERVICE off
SYS/MB/MEM1/CMP1/BR1/CH0/D1 SERVICE off
SYS/MB/MEM1/CMP1/BR1/CH1/D1 SERVICE off
SYS/MB/MEM1 SERVICE off
SYS/MB/FT0 SERVICE off
SYS/MB/FT1 SERVICE off
SYS/MB/FT2 SERVICE off
SYS/MB/FT3 SERVICE off
SYS/HDD0 SERVICE off
SYS/HDD0 OK2RM off
SYS/HDD1 SERVICE off
SYS/HDD1 OK2RM off
SYS/HDD2 SERVICE off
SYS/HDD2 OK2RM off
SYS/HDD3 SERVICE off
SYS/HDD3 OK2RM off
============================ FRU Status ============================
Location Name Status
------------------------------------------------------
SYS MB enabled
SYS/MB SP enabled
SYS/MB SCC_NVRAM enabled
SYS/MB CPU0 enabled
SYS/MB/CPU0/CMP0/BR0/CH0 D0 enabled
SYS/MB/CPU0/CMP0/BR0/CH1 D0 enabled
SYS/MB/CPU0/CMP0/BR1/CH0 D0 enabled
SYS/MB/CPU0/CMP0/BR1/CH1 D0 enabled
SYS/MB CPU1 enabled
SYS/MB/CPU1/CMP1/BR0/CH0 D0 enabled
SYS/MB/CPU1/CMP1/BR0/CH1 D0 enabled
SYS/MB/CPU1/CMP1/BR1/CH0 D0 enabled
SYS/MB/CPU1/CMP1/BR1/CH1 D0 enabled
SYS/MB MEM0 enabled
SYS/MB/MEM0/CMP0/BR0/CH0 D1 enabled
SYS/MB/MEM0/CMP0/BR0/CH1 D1 enabled
SYS/MB/MEM0/CMP0/BR1/CH0 D1 enabled
SYS/MB/MEM0/CMP0/BR1/CH1 D1 enabled
SYS/MB MEM1 enabled
SYS/MB/MEM1/CMP1/BR0/CH0 D1 enabled
SYS/MB/MEM1/CMP1/BR0/CH1 D1 enabled
SYS/MB/MEM1/CMP1/BR1/CH0 D1 enabled
SYS/MB/MEM1/CMP1/BR1/CH1 D1 enabled
SYS/MB FT0 enabled
SYS/MB FT1 enabled
SYS/MB FT2 enabled
SYS/MB FT3 enabled
SYS HDD0 enabled
SYS HDD1 enabled
SYS HDD2 enabled
SYS HDD3 enabled
SYS DVD enabled
SYS USBBD enabled
SYS PS0 enabled
SYS PS1 enabled
SYS PS2 enabled
SYS PS3 enabled
============================ FW Version ============================
Version
------------------------------------------------------------
Sun System Firmware 7.2.10.a 2010/10/08 16:40
====================== System PROM revisions =======================
Version
------------------------------------------------------------
OBP 4.30.9 2010/07/16 09:06
Chassis Serial Number
---------------------
BDL1123020
# fmdump
TIME UUID SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty
#
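Everyone, does this mean one of the cores on the CPU is faulty?
I will collect the output from the SC tomorrow; fmdump shows nothing.
I only have part of the dmesg output, shown below: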
1/scsi@0/sd@3,0
May 28 10:02:18 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 10:02:18 stest2 Corrupt label; wrong magic number
May 28 10:02:18 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 10:02:18 stest2 Corrupt label; wrong magic number
May 28 10:02:26 stest2 pseudo: pseudo-device: ramdisk1024
May 28 10:02:26 stest2 genunix: ramdisk1024 is /pseudo/ramdisk@1024
May 28 10:02:26 stest2 ebus: su0 at ebus0: offset 0,ca0000
May 28 10:02:26 stest2 genunix: su0 is /ebus@c0/serial@0,ca0000
May 28 10:02:26 stest2 pseudo: pseudo-device: lockstat0
May 28 10:02:26 stest2 genunix: lockstat0 is /pseudo/lockstat@0
May 28 10:02:26 stest2 pseudo: pseudo-device: fcsm0
May 28 10:02:26 stest2 genunix: fcsm0 is /pseudo/fcsm@0
May 28 10:02:26 stest2 pseudo: pseudo-device: llc10
May 28 10:02:26 stest2 genunix: llc10 is /pseudo/llc1@0
May 28 10:02:26 stest2 pseudo: pseudo-device: lofi0
May 28 10:02:26 stest2 genunix: lofi0 is /pseudo/lofi@0
May 28 10:02:26 stest2 pseudo: pseudo-device: trapstat0
May 28 10:02:26 stest2 genunix: trapstat0 is /pseudo/trapstat@0
May 28 10:02:26 stest2 pseudo: pseudo-device: fbt0
May 28 10:02:26 stest2 genunix: fbt0 is /pseudo/fbt@0
May 28 10:02:26 stest2 pseudo: pseudo-device: profile0
May 28 10:02:26 stest2 genunix: profile0 is /pseudo/profile@0
May 28 10:02:26 stest2 pseudo: pseudo-device: systrace0
May 28 10:02:26 stest2 genunix: systrace0 is /pseudo/systrace@0
May 28 10:02:26 stest2 pseudo: pseudo-device: sdt0
May 28 10:02:26 stest2 genunix: sdt0 is /pseudo/sdt@0
May 28 10:02:26 stest2 pseudo: pseudo-device: ntwdt0
May 28 10:02:26 stest2 genunix: ntwdt0 is /pseudo/ntwdt@0
May 28 10:02:26 stest2 pseudo: pseudo-device: mdesc0
May 28 10:02:26 stest2 genunix: mdesc0 is /pseudo/mdesc@0
May 28 10:02:26 stest2 pseudo: pseudo-device: ds_snmp0
May 28 10:02:26 stest2 genunix: ds_snmp0 is /pseudo/ds_snmp@0
May 28 10:02:26 stest2 pseudo: pseudo-device: fssnap0
May 28 10:02:26 stest2 genunix: fssnap0 is /pseudo/fssnap@0
May 28 10:02:26 stest2 pseudo: pseudo-device: winlock0
May 28 10:02:26 stest2 genunix: winlock0 is /pseudo/winlock@0
May 28 10:02:26 stest2 fp: NOTICE: fp(0): PLOGI to 2 failed state=Packet Transport error, reason=No Connection
May 28 10:02:26 stest2 fctl: WARNING: fp(0)::PLOGI to 2 failed. state=e reason=5.
May 28 10:02:26 stest2 fp: NOTICE: fp(0): PLOGI to 1 failed state=Packet Transport error, reason=No Connection
May 28 10:02:26 stest2 fctl: WARNING: fp(0)::PLOGI to 1 failed. state=e reason=5.
May 28 10:02:26 stest2 pseudo: pseudo-device: rsm0
May 28 10:02:26 stest2 genunix: rsm0 is /pseudo/rsm@0
May 28 10:02:27 stest2 mac: NOTICE: nxge1 registered
May 28 10:02:27 stest2 mac: NOTICE: nxge2 registered
May 28 10:02:27 stest2 mac: NOTICE: nxge3 registered
May 28 10:26:50 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 10:26:50 stest2 Corrupt label; wrong magic number
May 28 10:26:50 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 10:26:50 stest2 Corrupt label; wrong magic number
May 28 11:14:27 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 11:14:27 stest2 Corrupt label; wrong magic number
May 28 11:14:27 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 11:14:27 stest2 Corrupt label; wrong magic number
May 28 12:01:04 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 12:01:04 stest2 Corrupt label; wrong magic number
May 28 12:01:04 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 12:01:04 stest2 Corrupt label; wrong magic number
May 28 14:12:14 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 14:12:14 stest2 Corrupt label; wrong magic number
May 28 14:12:14 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 14:12:14 stest2 Corrupt label; wrong magic number
May 28 14:16:23 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 14:16:23 stest2 Corrupt label; wrong magic number
May 28 14:16:23 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 14:16:23 stest2 Corrupt label; wrong magic number
May 28 14:35:16 stest2 nxge: WARNING: nxge0 :nxge_receive_packet: channel 0 RCR L4_CSUM_ERROR
May 28 15:10:24 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 15:10:24 stest2 Corrupt label; wrong magic number
May 28 15:10:24 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 15:10:24 stest2 Corrupt label; wrong magic number
May 28 15:12:18 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 15:12:18 stest2 Corrupt label; wrong magic number
May 28 15:12:18 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
May 28 15:12:18 stest2 Corrupt label; wrong magic number
May 28 16:21:48 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@2,0 (sd3):
May 28 16:21:48 stest2 Corrupt label; wrong magic number
May 28 16:21:48 stest2 scsi: WARNING: /pci@400/pci@0/pci@1/scsi@0/sd@3,0 (sd4):
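May 28 16:21:48 stest2 Corrupt label; wrong magic number
Two of the power supplies are not healthy. Check their input power; if the input is fine, PS0/PS1 have probably failed.
Here is the output from the SC as well: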
sc> showlogs
Log entries since May 02 00:56:15
----------------------------------
May 02 00:56:15: Chassis |major : "Host is running"
May 08 02:18:49: Chassis |major : "System shutdown has been requested via power button."
May 08 02:18:49: Chassis |critical: "Host has been powered off"
May 08 02:19:00: Chassis |major : "System power on has been requested via power button."
May 08 02:19:01: Chassis |major : "Host has been powered on"
May 08 02:21:29: Chassis |major : "System shutdown has been requested via power button."
May 08 02:23:33: Chassis |critical: "Host has been powered off"
May 08 02:26:04: Chassis |major : "System power on has been requested via power button."
May 08 02:26:06: Chassis |major : "Host has been powered on"
May 08 02:32:33: Chassis |major : "Host is running"
May 21 16:58:38: IPMI |minor : "ID = 3b : 05/21/2012 : 16:58:38 : Voltage : /MB/V_+3V3STBY : Lower Non-critical going low: reading 1.84 <= threshold 3.10 Volts"
May 21 16:58:44: IPMI |minor : "ID = 3c : 05/21/2012 : 16:58:44 : Voltage : /MB/V_+3V3STBY : Lower Non-critical going high : reading 3.30 >= threshold 3.10 Volts"
May 29 02:56:23: Chassis |critical: "Host has been powered off"
May 29 02:58:41: Audit |minor : "root : Open Session : object = "/SP/session/type" : value = "console" : success"
May 29 03:00:39: Audit |minor : "root : Create : object = "/SP/users/admin" : value = "N/A" : success"
May 29 03:00:39: Audit |minor : "root : Set : object = "/SP/users/admin/role" : value = "Administrator" : success"
May 29 03:00:39: Audit |minor : "root : Set : object = "/SP/users/admin/cli_mode" : value = "alom" : success"
May 29 03:00:48: Audit |minor : "root : Set : object = "/SP/users/admin/password" : value = "*****" : success"
May 29 03:01:10: Audit |minor : "root : Close Session : object = "/SP/session/type" : value = "console" : success"
May 29 03:01:17: Audit |minor : "admin : Open Session : object = "/SP/session/type" : value = "console" : success"
sc> showfaults
Last POST Run: Tue May 8 02:31:10 2012
Post Status: Passed all devices
ID FRU Fault
1 /SYS/MB/CPU1 SP detected fault: /SYS/MB/CPU1/CMP1/P0 CPU causing reset errors
2 /SYS/PS1 SP detected fault: Voltage fault at PS1 asserted
3 /SYS/PS0 SP detected fault: Voltage fault at PS0 asserted
4 /SYS SP detected fault: cooling zone 1 sensors are not responding
5 /SYS SP detected fault: Input power unavailable for PSU at PS0
6 /SYS SP detected fault: Input power unavailable for PSU at PS1
sc> showfaults -v
Last POST Run: Tue May 8 02:31:10 2012
Post Status: Passed all devices
ID Time FRU Class Fault
1 Jan 29 01:22:21 /SYS/MB/CPU1 SP detected fault: /SYS/MB/CPU1/CMP1/P0 CPU causing reset errors
2 Jan 29 01:24:15 /SYS/PS1 SP detected fault: Voltage fault at PS1 asserted
3 Jan 29 01:24:20 /SYS/PS0 SP detected fault: Voltage fault at PS0 asserted
4 Jan 29 01:26:14 /SYS SP detected fault: cooling zone 1 sensors are not responding
5 Jan 29 01:24:25 /SYS SP detected fault: Input power unavailable for PSU at PS0
6 Jan 29 01:24:25 /SYS SP detected fault: Input power unavailable for PSU at PS1
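The PS0/PS1 entries in this fault list can be cross-checked against the "Voltage indicators" table in the prtdiag output posted earlier. The following is only a rough sketch in Python: it assumes each indicator row has exactly three whitespace-separated fields (location, indicator, condition) as in that paste, and the function name failed_indicators is made up for illustration.

```python
def failed_indicators(text):
    """Return (location, indicator, condition) for rows whose condition is 'fail'."""
    failed = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: indicator rows have three fields and a location under SYS.
        if len(fields) != 3 or not fields[0].startswith("SYS"):
            continue
        location, indicator, condition = fields
        if condition == "fail":
            failed.append((location, indicator, condition))
    return failed


# A few rows copied from the prtdiag paste in this thread.
sample = """\
SYS/MB/SP V_+3VBATT ok
SYS/PS0 AC_POK fail
SYS/PS0 DC_POK fail
SYS/PS0 VOLT_FAULT fail
SYS/PS2 AC_POK ok
"""

for location, indicator, condition in failed_indicators(sample):
    print(location, indicator, condition)
```

Run against the full prtdiag paste, this would list only the PS0 and PS1 rows, which lines up with fault IDs 2, 3, 5 and 6 above.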
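This error message appears on every reboot:
ERROR: The following devices are disabled:
MB/CPU1/CMP1/P0
The CPU is probably fine. Replace the power supplies first and then see... The fault messages can be cleared.
Reply to 5# qianxia0_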
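A power supply problem? Surely not... What I do know is that the customer has only two power supplies connected; the other two have no power cabled to them.
Right, it shows two power supplies as abnormal. The CPU should be fine; just clear the fault messages.
Reply to 7# qianxia0_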
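Besides the Service LED, the power supply fault LED (REAR PS) is also lit, and so is the temperature LED.
Also, doesn't "SP detected fault: /SYS/MB/CPU1/CMP1/P0 CPU causing reset errors" refer to the CPU?
Yes, but the status you see from the OS is OK. Once you clear the fault, the amber LED will go out.
Reply to 9# qianxia0_
To clear it from the SC, isn't the command clearfault UUID? But showfaults -v does not list any UUID, so how do I clear it?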