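SUN V245: two very strange problems
A customer has several SUN V245 servers, and one of them went down this afternoon. When I got to the machine room, the alarm LED was lit solid and the OK LED was blinking green. The serial console produced no output at all, so the only option was to restart the machine. After the restart, everything looks normal in the SC: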
sc> showenvironment
=============== Environmental Status ===============
--------------------------------------------------------------------------------
System Temperatures (Temperatures in Celsius):
--------------------------------------------------------------------------------
Sensor Status Temp LowHard LowSoft LowWarn HighWarn HighSoft HighHard
--------------------------------------------------------------------------------
MB.P0.T_CORE OK 73 -15 -10 0 100 105 110
MB.P1.T_CORE OK 77 -15 -10 0 100 105 110
MB.T_REMOTE OK 38 -- -- -- -- -- --
MB.T_1064 OK 68 -15 -10 0 105 110 115
MB.T_FIRE OK 47 -15 -10 0 95 105 108
MB.T_AMB OK 42 -15 -10 0 65 75 85
FIOB.T_AMB OK 22 -15 -10 0 45 47 50
PDB.T_DISK OK 30 -15 -10 0 55 65 70
PDB.T_PS0 OK 27 -15 -10 0 48 50 53
PDB.T_PS1 OK 28 -15 -10 0 48 50 53
--------------------------------------
Keyswitch:
--------------------------------------
Keyswitch position: NORMAL
--------------------------------------------------------
System Indicator Status:
--------------------------------------------------------
SYS.LOCATE SYS.SERVICE SYS.ACT
--------------------------------------------------------
OFF OFF ON
--------------------------------------------------------
SYS.PSFAIL SYS.OVERTEMP SYS.FANFAIL
--------------------------------------------------------
OFF OFF OFF
--------------------------------------------
System Disks:
--------------------------------------------
Disk Status Service OK2RM
--------------------------------------------
HDD0 OK OFF OFF
HDD1 OK OFF OFF
HDD2 OK OFF OFF
HDD3 OK OFF OFF
----------------------------------------------------------
Fans (Speeds Revolution Per Minute):
----------------------------------------------------------
Sensor Status Speed Warn Low
----------------------------------------------------------
PDB.HDDFB.FT6.F0 OK 10505 -- 8000
PDB.HDDFB.FT6.F1 OK 10714 -- 8000
FT0.F0 OK 3879 -- 2022
FT1.F0 OK 3924 -- 2022
FT2.F0 OK 3879 -- 2022
FT3.F0 OK 4066 -- 2022
FT4.F0 OK 3970 -- 2022
FT5.F0 OK 3924 -- 2022
--------------------------------------------------------------------------------
Voltage sensors (in Volts):
--------------------------------------------------------------------------------
Sensor Status Voltage LowSoft LowWarn HighWarn HighSoft
--------------------------------------------------------------------------------
MB.P0.V_CORE OK 1.45 1.21 1.23 1.57 1.60
MB.P1.V_CORE OK 1.48 1.21 1.23 1.57 1.60
MB.V_+3V3 OK 3.31 2.48 2.48 3.49 3.59
MB.V_+12V OK 12.10 9.04 9.04 12.96 13.56
MB.BAT.V_BAT OK 3.21 2.26 2.26 3.51 3.60
--------------------------------------------
Power Supply Indicators:
--------------------------------------------
Supply DC-OK AC-OK Service
--------------------------------------------
PS0 ON ON OFF
PS1 ON ON OFF
------------------------------------------------------------------------------
Power Supplies:
------------------------------------------------------------------------------
Supply Status Underspeed Overtemp Overvolt Undervolt Overcurrent
------------------------------------------------------------------------------
PS0 OK OFF OFF OFF OFF OFF
PS1 OK OFF OFF OFF OFF OFF
sc> showlogs
Log entries since FEB 14 03:25:31
----------------------------------
FEB 14 03:25:31 Portal2: 00060000: "SC Login: User admin Logged on."
FEB 14 03:25:31 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:01:57 Portal2: 00060002: "SC Login: User admin Logged out."
FEB 14 04:01:57 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:02:14 Portal2: 00060000: "SC Login: User admin Logged on."
FEB 14 04:02:14 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:08:29 Portal2: 00040001: "SC Request to Power On Host."
FEB 14 04:08:29 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:08:30 Portal2: 00040002: "Host System has Reset"
FEB 14 04:08:30 Portal2: 0004000b: "Host System has read and cleared bootmode."
FEB 14 04:08:33 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:08:33 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:09:18 Portal2: 0004004f: "Indicator PS0.DC_OK is now ON"
FEB 14 04:09:18 Portal2: 0004004f: "Indicator PS1.DC_OK is now ON"
FEB 14 04:09:18 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:09:18 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:10:46 Portal2: 00040002: "Host System has Reset"
FEB 14 04:10:46 Portal2: 00060007: "Failed to send email alert for recent event."
FEB 14 04:11:38 Portal2: 0004000b: "Host System has read and cleared bootmode."
FEB 14 04:11:39 Portal2: 00060007: "Failed to send email alert for recent event."
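Most of the showlogs entries above are the repeated "Failed to send email alert" message (code 00060007), which makes the power-on and reset events hard to spot. A minimal sketch for filtering them out, assuming the log has been saved as text in the format shown above; the function name and the choice of which code counts as noise are my own assumptions, not part of ALOM:

```python
import re

# One SC log entry as it appears in the showlogs output above, e.g.
#   FEB 14 04:08:29 Portal2: 00040001: "SC Request to Power On Host."
ENTRY = re.compile(
    r'^(?P<stamp>[A-Z]{3} \d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+): (?P<code>[0-9a-f]{8}): "(?P<msg>.*)"$'
)

# Assumption: only the email-alert failures are treated as noise here.
NOISE_CODES = {"00060007"}


def significant_events(log_text):
    """Return (timestamp, code, message) for every entry that is not noise."""
    events = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group("code") not in NOISE_CODES:
            events.append(
                (match.group("stamp"), match.group("code"), match.group("msg"))
            )
    return events


if __name__ == "__main__":
    sample = (
        'Log entries since FEB 14 03:25:31\n'
        'FEB 14 04:08:29 Portal2: 00040001: "SC Request to Power On Host."\n'
        'FEB 14 04:08:29 Portal2: 00060007: "Failed to send email alert for recent event."\n'
        'FEB 14 04:08:30 Portal2: 00040002: "Host System has Reset"\n'
    )
    for stamp, code, msg in significant_events(sample):
        print(stamp, code, msg)
```

Note that the log shown here only starts at the login before the restart, so filtering it makes the sequence easier to read but does not reveal anything about the moment of the crash itself.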
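After the power-on, the system booted into the operating system normally. /var/adm/messages contains only the log entries from this boot, and there is no dump under /var/crash/Portal either.
Only last reboot shows the time at which the system went down:
reboot system down Sun Aug 5 01:15
Question 1: Why is there no record of the crash? Could the experts here help analyze this, or suggest a direction or method for troubleshooting?
I wanted to collect Explorer data, but the collection stayed stuck at "disks running".
Eventually I found that the raidctl output never ends: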
root@Portal2 # format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136>
/pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
1. c1t2d0 <LSILOGIC-LogicalVolume-3000 cyl 65533 alt 2 hd 16 sec 136>
/pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@2,0
Specify disk (enter its number): ^C
root@Portal2 #
root@Portal2 # raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  68.3G   N/A     OPTIMAL  N/A    RAID1
                0.3.0   68.3G           GOOD
                0.2.0   68.3G           GOOD
root@Portal2 # raidctl -l c1t2d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t2d0                  68.3G   N/A     OPTIMAL  N/A    RAID1
                0.3.0   68.3G           GOOD
                0.2.0   68.3G           GOOD
root@Portal2 #
root@Portal2 # raidctl -l
Controller: 1
Volume:c1t0d0
Volume:c1t2d0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
Disk: 0.2.0
^Croot@Portal2 #
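I waited a good 10 minutes and it just kept going like this. I really can't make sense of it. What on earth is going on?
Question 2: The system has 4 hard disks, configured as two hardware RAID 1 volumes. The raidctl output never ends, whereas it should simply list the four physical disks under the volumes and stop. On top of that, raidctl -l c1t0d0 and raidctl -l c1t2d0 show the same disks. The more I look at it, the less I understand.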
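To make a runaway listing like the raidctl -l output above easier to read and compare, the repeated lines can be collapsed into counts. This is only a sketch for post-processing captured text, assuming the output was saved to a file or pasted into a list of lines; collapse_repeats is a hypothetical helper name, and it does nothing to explain why raidctl loops:

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


if __name__ == "__main__":
    # The raidctl -l listing from this post: three header lines followed
    # by the same "Disk: 0.2.0" line until the command was interrupted.
    captured = ["Controller: 1", "Volume:c1t0d0", "Volume:c1t2d0"]
    captured += ["Disk: 0.2.0"] * 41
    for line, count in collapse_repeats(captured):
        suffix = "" if count == 1 else "  (x{})".format(count)
        print(line + suffix)
```

A single line repeated many times, rather than a cycle through several disks, suggests the listing is stuck on one entry, which is worth mentioning when reporting the problem.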
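Any pointers from the experts on the two problems above would be much appreciated!
Probably a sign that the motherboard is about to fail. I have motherboards here, by the way...
Check iostat -En. It is either the hard disks or the onboard SCSI controller acting up.
This post was last edited by hzg1103 on 2012-08-08 17:46
Reply to 4# znnnz
Today I retried collecting Explorer data remotely over the network, and it finished quickly. raidctl.out is empty, and raidctl.err contains the following: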
raidctl:enter_filelock:filelock is owned by 'process 24308'
This is probably because I was running raidctl -l in another window at the time, so Explorer simply moved on past it.
Below is the content of iostat_-E.out:
sd1 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: LSILOGIC Product: Logical Volume Revision: 3000 Serial No:
Size: 73.01GB <73012215808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
sd2 Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: MATSHITA Product: DVD-RAM UJ-85JS Revision: F100 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd4 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: LSILOGIC Product: Logical Volume Revision: 3000 Serial No:
Size: 73.01GB <73012215808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
Good grief.
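For comparing the counters across devices, the iostat_-E.out content above can be reduced to the non-zero values only. A rough sketch, assuming the text layout shown in this post; the counter list is taken from that output, and the function names are hypothetical:

```python
import re

# Counter labels exactly as they appear in the iostat -E output above.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device",
    "Recoverable", "Illegal Request", "Predictive Failure Analysis",
)


def parse_iostat_errors(text):
    """Map each device name to a dict of its error counters."""
    devices = {}
    current = None
    for line in text.splitlines():
        # A new device block starts with "<name> Soft Errors: ..."
        head = re.match(r"^(\S+)\s+Soft Errors:", line)
        if head:
            current = devices.setdefault(head.group(1), {})
        if current is None:
            continue
        for name in COUNTERS:
            found = re.search(re.escape(name) + r":\s*(\d+)", line)
            if found:
                current[name] = int(found.group(1))
    return devices


def nonzero_counters(devices):
    """Keep only the counters that are greater than zero."""
    return {
        dev: {name: value for name, value in counters.items() if value > 0}
        for dev, counters in devices.items()
    }


if __name__ == "__main__":
    sample = (
        "sd1 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0\n"
        "Vendor: LSILOGIC Product: Logical Volume Revision: 3000 Serial No:\n"
        "Size: 73.01GB <73012215808 bytes>\n"
        "Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0\n"
        "Illegal Request: 2 Predictive Failure Analysis: 0\n"
    )
    print(nonzero_counters(parse_iostat_errors(sample)))
```

In the output posted here, every device's Soft Errors count equals its Illegal Request count, and the Hard, Transport, and Media error counters are all zero.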