Chinaunix

Blade 150 problem

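#1 | Posted 2007-01-24 16:10
I have a Blade 150 that keeps rebooting at irregular intervals, sometimes every four or five days, sometimes every day or two. Could an expert help me work out whether this is a memory problem? I ran POST and it shows nothing wrong, but there are errors in the messages file.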

The messages are as follows:
Jan 21 14:58:27 BLADE100 SUNW,UltraSPARC-IIe: [ID 254898 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x0000278f.f2298fe3
Jan 21 14:58:27 BLADE100     AFSR 0x00000000.80300000<RIV,UE,CE> AFAR 0x00000000.00104f40
Jan 21 14:58:27 BLADE100     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10023d3c
Jan 21 14:58:27 BLADE100     UDBH 0x031a<UE,CE> UDBH.ESYND 0x1a UDBL 0x0000 UDBL.ESYND 0x00
Jan 21 14:58:27 BLADE100     UDBH Syndrome 0x1a Memory Module DIMM0
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 440034 kern.info] [AFT2] errID 0x0000278f.f2298fe3 E$tag != PA from AFAR; E$line was victimized
Jan 21 14:58:28 BLADE100     dumping memory from PA 0x00000000.00104f40 instead
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x81480000.05000000
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x07000486.8410a000
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x8610e354.8528b020
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x9de3bf50.b210c002
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x05000000.0700048d
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x8410a000.8528b020
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x8610e000.b010c002
Jan 21 14:58:28 BLADE100 SUNW,UltraSPARC-IIe: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x05000000.07000486
Jan 21 14:58:28 BLADE100 unix: [ID 836849 kern.notice]
Jan 21 14:58:28 BLADE100 ^Mpanic[cpu0]/thread=2a100017d40:
Jan 21 14:58:28 BLADE100 unix: [ID 863129 kern.notice] [AFT1] errID 0x0000278f.f2298fe3 UE Error(s)
Jan 21 14:58:28 BLADE100     See previous message(s) for details
Jan 21 14:58:28 BLADE100 unix: [ID 100000 kern.notice]
Jan 21 14:58:28 BLADE100 genunix: [ID 723222 kern.notice] 000002a1000173b0 SUNW,UltraSPARC-IIe:cpu_aflt_log+4e0 (2a10001746e, 1, 101449b0, 2a1000175f8, 2a1000174bb, 101449d8)
Jan 21 14:58:29 BLADE100 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 000002a1000176d0 0000000000000003 0000000000000010
Jan 21 14:58:29 BLADE100   %l4-7: 000000001041b180 000000000002c73c 0000000000000000 0000030000163ea0
Jan 21 14:58:29 BLADE100 genunix: [ID 723222 kern.notice] 000002a100017600 SUNW,UltraSPARC-IIe:cpu_async_error+7dc (2a100017718, 104f40, 8, 1040daec, 0, 31a)
Jan 21 14:58:29 BLADE100 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000070 000002a100017788 0000000000000000 0000000080300000
Jan 21 14:58:29 BLADE100   %l4-7: 000002a1000176d0 0000000000104f80 0000000000020000 0000000000000004
Jan 21 14:58:30 BLADE100 genunix: [ID 723222 kern.notice] 000002a1000177e0 unix:prom_rtt+0 (2a000104f80, 2fb1, 0, 21, 80000, 1)
Jan 21 14:58:30 BLADE100 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000007 0000000000001400 00000044f0001606 000000001013b8c0
Jan 21 14:58:30 BLADE100   %l4-7: 0000000000000000 0000000000000200 0000000000000000 000002a100017890
Jan 21 14:58:30 BLADE100 genunix: [ID 723222 kern.notice] 000002a100017930 unix:memscrub_scan+148 (8000, 0, 10416b88, 10414538, 1000ba74, 0)
Jan 21 14:58:30 BLADE100 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000400000 0000000010042600 000003000028e000 0000000000002000
Jan 21 14:58:30 BLADE100   %l4-7: 0000030000277420 0000000000000000 0000030000e73f00 000003000015e000
Jan 21 14:58:31 BLADE100 genunix: [ID 723222 kern.notice] 000002a100017a60 unix:memscrubber+330 (0, 1040d944, 10416b8c, 159, 10416b80, 1040da48)
Jan 21 14:58:31 BLADE100 genunix: [ID 179002 kern.notice]   %l0-3: 0000000010416b80 000000001040d940 0000000010459e10 0000000000000000
Jan 21 14:58:31 BLADE100   %l4-7: 0000000000000000 0000000045b40c33 000000000000001a 0000000000000001
Jan 21 14:58:31 BLADE100 unix: [ID 100000 kern.notice]
Jan 21 14:58:31 BLADE100 genunix: [ID 672855 kern.notice] syncing file systems...
Jan 21 14:58:32 BLADE100 genunix: [ID 904073 kern.notice]  done
Jan 21 14:58:33 BLADE100 genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 107806720
Jan 21 15:00:16 BLADE100 genunix: [ID 409368 kern.notice] ^M100% done: 10657 pages dumped, compression ratio 4.07,
Jan 21 15:00:16 BLADE100 genunix: [ID 851671 kern.notice] dump succeeded
Jan 21 15:01:07 BLADE100 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_108528-09 64-bit
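
For anyone comparing several of these panics, the fields worth tracking are the errID, the AFAR (the physical address that faulted), the syndrome, and the memory module the kernel names. The following is only a rough Python sketch for pulling them out of the [AFT1] lines quoted above. The function name `parse_aft1` and the shape of the result are made up for illustration and are not part of any Solaris tool.

```python
import re

# A few [AFT1] lines copied from the messages file quoted in this post.
LOG = """\
Jan 21 14:58:27 BLADE100 SUNW,UltraSPARC-IIe: [ID 254898 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x0000278f.f2298fe3
Jan 21 14:58:27 BLADE100     AFSR 0x00000000.80300000<RIV,UE,CE> AFAR 0x00000000.00104f40
Jan 21 14:58:27 BLADE100     UDBH 0x031a<UE,CE> UDBH.ESYND 0x1a UDBL 0x0000 UDBL.ESYND 0x00
Jan 21 14:58:27 BLADE100     UDBH Syndrome 0x1a Memory Module DIMM0
"""


def parse_aft1(text):
    """Hypothetical helper: extract the fields of interest from [AFT1] lines."""
    patterns = {
        "err_id": r"errID (0x[0-9a-f]+\.[0-9a-f]+)",
        "afar": r"AFAR (0x[0-9a-f]+\.[0-9a-f]+)",
        "syndrome": r"Syndrome (0x[0-9a-f]+)",
        "module": r"Memory Module (\S+)",
    }
    found = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            found[key] = match.group(1)
    if "afar" in found:
        # The log prints a 64-bit value as two 32-bit halves joined by a dot.
        high, low = found["afar"][2:].split(".")
        found["afar_int"] = (int(high, 16) << 32) | int(low, 16)
    return found


report = parse_aft1(LOG)
print(report["module"], hex(report["afar_int"]), report["syndrome"])
```

Running the same helper over each panic in the messages file would show whether the reported module and address stay the same from one reboot to the next.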

#2 | Posted 2007-01-24 16:11
The POST output is as follows:

ok power-off
ok Speed Jumper is set to 0000.0000.0000.000a
Hardware Power ON

@(#)OBP 4.10.6 2003/06/06 12:30

Executing Power On SelfTest

@(#) Sun (Grover) POST 2.0.1 05:13 PM on 08/23/01

Processor Module Identification
        UltraSPARC-IIe+ (Phantom) Version 1.3
Init POST BSS
        Init System BSS
NVRAM Tests
        NVRAM Battery Detect Test
        NVRAM Scratch Addr Test
        NVRAM Scratch Data Test
DMMU TLB Tags
        DMMU TLB Tag Access Test
DMMU TLB RAM
        DMMU TLB RAM Access Test
Probe Ecache
        Probe Ecache
        Ecache Size = 0x00080000 bytes = 512 KBytes
Measure CPU Clock
        Initializing Southbridge
        Nominal CPU speed is 650 MHz
All CPU Basic Tests
        V9 Instruction Test
        CPU Tick and Tick Compare Reg Test
        CPU Soft Trap Test
        CPU Softint Reg and Int Test
All Basic MMU Tests
        DMMU Primary Context Reg Test
        DMMU Secondary Context Reg Test
        DMMU TSB Reg Test
        DMMU Tag Access Reg Test
        DMMU VA Watchpoint Reg Test
        DMMU PA Watchpoint Reg Test
        IMMU TSB Reg Test
        IMMU Tag Access Reg Test
All Basic Cache Tests
        Dcache RAM Test
        Dcache Tag Test
        Icache RAM Test
        Icache Tag Test
        Icache Next Test
        Icache Predecode Test
MCU Control & Status Regs Init
        Initializing Memory and MC registers
        DIMM 0: 512 MBytes = 0x20000000 bytes
        DIMM 1: 512 MBytes = 0x20000000 bytes
        DIMM 2: 0 MBytes = 0x00000000 bytes
        DIMM 3: 0 MBytes = 0x00000000 bytes
        Found 2 DIMMs in bank 0
        Bank 0: 1024 MBytes
        DIMM0 is a 32M x 8 device
        DIMM1 is a 32M x 8 device
        MC0 = 0x00000000.96a0cf06
        MC1 = 0x00000000.80008000
        MC2 = 0x00000000.c33300ee
        MC3 = 0x00000000.006008cf
        CPU MODULE upa_config is 0x0000003a.00000000
Ecache Tests
        Displacement Flush Ecache
        Ecache RAM Addr Test
        Ecache Tag Addr Test
        Ecache RAM Test
        Ecache Tag Test
Memory Init
        Malloc Post Memory
        Memory Addr Check w/o Ecache
        Load Post In Memory
        Run POST from MEM
        .........
        Map PROM/STACK/NVRAM in DMMU
        Update Master Stack/Frame Pointers
All FPU Basic Tests
        FPU Regs Test
        FPU Move Regs Test
        FPU State Reg Test
        FPU Functional Test
        FPU Trap Test
All Basic IOMMU Tests
        PIO Decoder and BCT Test
        PCI Byte Enable Test
        CPU's IOMMU Regs Test
        CPU's IOMMU RAM Addr Test
        CPU's IOMMU CAM Address Test
        IOMMU TLB Compare Test
        IOMMU TLB Flush Test
        PBMA PCI Config Space Regs Test
        PBMA Control/Status Reg Test
        PBMA Diag Reg Test
        CPU's IO Regs Test
All Advanced CPU Tests
        DMMU Hit/Miss Test
        IMMU Hit/Miss Test
        DMMU Little Endian Test
        IU ASI Access Test
        FPU ASI Access Test
        Ecache Thrash Test
All CPU Error Reporting Tests
        CPU Data Access Trap Test
        CPU Addr Align Trap Test
        DMMU Access Priv Page Test
        DMMU Write Protected Page Test
Audio Tests
        Map Audio Device PCI Config Registers Test
        Audio Device ID and Vendor ID (0x545110b9) Test
        Init Audio Device IO Registers Test
        Audio Device Memory Registers Test
Memory Tests
        Init Memory
                Info : 512MB at Dimm Slot 0
                Start Addr: 0x00000000.00800000  Size: 504 MBytes
Init with 0x00000000.00000000:
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
                Info : 512MB at Dimm Slot 1
                Start Addr: 0x00000000.40000000  Size: 512 MBytes
Init with 0x00000000.00000000:
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
        Memory Addr Check with Ecache Test
                Info : 512MB at Dimm Slot 0
                Start Addr: 0x00000000.00800000  Size: 504 MBytes
Write 0xffffffff.ffffffff: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Read: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Write 0xaaaaaaaa.aaaaaaaa: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Read: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Write 0x55555555.55555555: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Read: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Write 0x00000000.00000000: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Read: ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Status of this POST run:        PASS
diag-script=none
Time Stamp [hour:min:sec] 09:47:52  [month/date year] 01/24 2007






Power On Selftest Completed
    Status  = 0000.0000.0000.0000  ffff.ffff.f00b.63f0  0002.3333.0200.001b

Speed Jumper is set to 0000.0000.0000.000a
Software Power ON

#3 | Posted 2007-01-25 11:22

Yesterday I pulled the second DIMM and installed the latest system patches, but the machine still rebooted three times overnight.

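#4 | Posted 2007-01-25 11:38
Which city are you in? We have an Ultra 5 here with the same problem, and the Explorer output shows nothing useful.
[ID 254898 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x0000278f.f2298fe3
At first I assumed it was a memory problem, but the problem was still there after I replaced the memory.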

#5 | Posted 2007-01-25 11:44
Jan 21 14:58:27 BLADE100 SUNW,UltraSPARC-IIe: [ID 254898 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x0000278f.f2298fe3

This must be a CPU cache error message.
Please replace CPU0; it is not a memory problem.

#6 | Posted 2007-01-25 13:04
Try a set of memory modules with a different part number.

#7 | Posted 2007-01-25 21:35
Has anyone run into this before?

#8 | Posted 2007-01-25 21:54
Could this be the issue? I have already installed the latest system patches,
including 108528-29, and it still reboots.

http://www.chinaunix.net/jh/6/69831.html

#9 | Posted 2007-01-25 23:21
It does seem to be a hardware problem. To get an accurate reading of it, you should set up the system dump device so that the kernel core dump is captured and can be analyzed either by Sun or by yourself.

#10 | Posted 2007-01-26 10:56
Bump.