Chinaunix


Newbie asking for help! Can the errors in messages and the logs under /var really be ignored?

#1 | Posted 2006-09-02 21:23
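Solaris 9. The messages file under /var/adm/ keeps logging errors. I would like to ask the experts here whether these errors really are harmless, as the vendor claims. Something feels off to me: if everything were normal, why would errors be logged continuously? Please point out what might be causing this; I would be very grateful. The errors look roughly like this: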
Aug 31 05:36:09 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab060
Aug 31 05:36:09 mail     Fault_PC 0x1036144 Esynd 0x0068 Slot A: J8201
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 341960 kern.info] [AFT0] errID 0x001f5d1d.2bc28304 Corrected Memory Error on Slot A: J8201 is Intermittent
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 354393 kern.info] [AFT0] errID 0x001f5d1d.2bc28304 Data Bit 120 was in error and corrected
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 635056 kern.info] [AFT2] errID 0x001f5d1d.2bc28304 PA=0x000000a1.e36ab040
Aug 31 05:36:09 mail     E$tag 0x00000287.8d000024 E$state_1 Modified
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000003 0x00000000.00000000 ECC 0x1d0
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000000.014a6c00 0x00000300.00272288 ECC 0x09c
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
……
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 141273 kern.info] [AFT0] errID 0x001f9c18.278a1bc8 Corrected Memory Error on Slot A: J8201 is Sticky
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 819515 kern.info] [AFT0] errID 0x001f9c18.278a1bc8 Data Bit 120 was in error and corrected
Sep  1 00:50:12 mail unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot A: J8201
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 837520 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x001f9c18.29722b88
Sep  1 00:50:12 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36a84d0
Sep  1 00:50:12 mail     Fault_PC <unknown> Esynd 0x0068 Slot A: J8201
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 871253 kern.info] [AFT0] errID 0x001f9c18.29722b88 Corrected Memory Error on Slot A: J8201 is Persistent
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 534276 kern.info] [AFT0] errID 0x001f9c18.29722b88 Data Bit 120 was in error and corrected
Sep  1 00:50:12 mail unix: [ID 596940 kern.warning] WARNING: [AFT0] 2943 soft errors in less than 24:00 (hh:mm) detected from Memory Module Slot A: J8201
Sep  1 02:03:39 mail SUNW,UltraSPARC-III+: [ID 259181 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x001fa01a.13cecb14
Sep  1 02:03:39 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab060
Sep  1 02:03:39 mail     Fault_PC 0x1036144 Esynd 0x0068 Slot A: J8201
Sep  1 02:03:39 mail SUNW,UltraSPARC-III+: [ID 325890 kern.info] [AFT0] errID 0x001fa01a.13cecb14 Corrected Memory Error on Slot A: J8201 is Intermittent
……
Sep  2 21:11:21 mail SUNW,UltraSPARC-III+: [ID 472091 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x0000586a.45c0f244
Sep  2 21:11:21 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab540
Sep  2 21:11:21 mail     Fault_PC <unknown> Esynd 0x0068 Slot A: J8201
Sep  2 21:11:21 mail SUNW,UltraSPARC-III+: [ID 591546 kern.info] [AFT0] errID 0x0000586a.45c0f244 Corrected Memory Error on Slot A: J8201 is Sticky
Sep  2 21:11:21 mail SUNW,UltraSPARC-III+: [ID 959858 kern.info] [AFT0] errID 0x0000586a.45c0f244 Data Bit 120 was in error and corrected
Sep  2 21:11:21 mail unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot A: J8201
Sep  2 21:11:45 mail SUNW,UltraSPARC-III+: [ID 571128 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU2 at TL=0, errID 0x0000586f.dd287c88
Sep  2 21:11:45 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab540
Sep  2 21:11:45 mail     Fault_PC 0x11504fc Esynd 0x0068 Slot A: J8201
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000000.00000000 0x00000300.0567fd28 ECC 0x05c
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000300.0050ff28 ECC 0x028
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 422670 kern.info] [AFT2] D$Tag 0x0a1e36ab D$state Valid D$utag 0xa0 D$snp 0x0a1e36aa
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 582021 kern.info] [AFT2] PAtag 0x0a1.e36ab540 PAsnp 0x0a1.e36ab540 VAutag 0x283540
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 842398 kern.info] [AFT2] D$Data (0x00) 0x00000000.00000000 0x000f0f02.baddcafe
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 842398 kern.info] [AFT2] D$Data (0x10) 0x00000000.00000000 0x00000000.00000000
Sep  2 21:12:45 mail SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available

Can the errors in messages really, truly be treated like thin air, as if they did not exist?! I think the people at the company are wrong!

#2 | Posted 2006-09-02 21:24

Boo-hoo, SOS!

Bumping this first.

#3 | Posted 2006-09-02 21:28
The memory is faulty. Slot A: J8201

Better replace it as soon as possible!

#4 | Posted 2006-09-02 21:30
Slot A: J8201. This memory module has a problem; I recommend replacing it.

[ Last edited by ustcboy on 2006-9-2 21:32 ]

#5 | Posted 2006-09-02 21:51

Thank you!

I also think it looks like a memory problem, but the people at that company insist nothing is wrong.
Sep  2 21:11:21 mail SUNW,UltraSPARC-III+: [ID 472091 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x0000586a.45c0f244
This error is caused by a bad memory address, isn't it?

#6 | Posted 2006-09-02 22:34
They are really stringing you along. Put some pressure on them.

#7 | Posted 2006-09-02 22:43
What machine is it? If it is a Sun Fire series system, Sun document 816-5053-10 (Soft Memory Errors and Their Effect) describes this:

Handling of Correctable Errors
As discussed earlier, when a processor detects a CE as a result of a read to main
memory, it will correct the incoming data and continue its operation. The error will
be logged in the processor's asynchronous fault status register (AFSR) and the
faulting physical address will be logged in the asynchronous fault address register
(AFAR). The processor will then take a trap so that the error information can be
logged.
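
As an aside, the AFSR and AFAR values mentioned in this passage can be pulled out of a messages line mechanically. The following is only a minimal sketch: the function name `parse_afsr_afar` is hypothetical, and it assumes that each `0xHHHHHHHH.LLLLLLLL` value is a 64-bit register printed as two 32-bit halves, which is how the lines in this thread appear to be formatted.

```python
import re

# Matches e.g. "AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab060".
# Assumption: "hi.lo" is one 64-bit value written as two 32-bit hex halves.
AFSR_AFAR_RE = re.compile(
    r"AFSR 0x([0-9a-f]{8})<([^>]+)>\.([0-9a-f]{8}) "
    r"AFAR 0x([0-9a-f]{8})\.([0-9a-f]{8})"
)

def parse_afsr_afar(line):
    """Return the error flag, AFSR and AFAR from one log line, or None."""
    m = AFSR_AFAR_RE.search(line)
    if m is None:
        return None
    afsr_hi, flag, afsr_lo, afar_hi, afar_lo = m.groups()
    return {
        "flag": flag,  # text between < and >, e.g. "CE"
        "afsr": (int(afsr_hi, 16) << 32) | int(afsr_lo, 16),
        "afar": (int(afar_hi, 16) << 32) | int(afar_lo, 16),
    }

# Line taken from the original poster's log.
line = "Aug 31 05:36:09 mail     AFSR 0x00000002<CE>.00000068 AFAR 0x000000a1.e36ab060"
info = parse_afsr_afar(line)
print(info["flag"], hex(info["afar"]))
```

Collecting the AFAR values over time would show whether the corrected errors keep hitting the same physical addresses.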

Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 796192 kern.notice] NOTICE: [AFT0] Corrected system bus (CE) Event on CPU18 at TL=0, errID 0x0000c9b9.19d92690
Oct 25 09:06:25 wpc26 AFSR 0x00000002<CE>.00000097 AFAR 0x00000001.04bdf7d0
Oct 25 09:06:25 wpc26 Fault_PC 0x10024a74 Esynd 0x0097 /N0/SB5/P3/B0/D2 J16500
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 154767 kern.notice] [AFT0] errID 0x0000c9b9.19d92690 Corrected Memory Error on /N0/SB5/P3/B0/D2 J16500 is Persistent
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 682217 kern.notice] [AFT0] errID 0x0000c9b9.19d92690 Data Bit 3 was in error and corrected
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 422650 kern.info] [AFT2] errID 0x0000c9b9.19d92690 E$tag PA=0x00000000.00bdf7c0 does not match AFAR=0x00000001.04bdf7c0
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 904800 kern.info] [AFT2] errID 0x0000c9b9.19d92690 PA=0x00000000.00bdf7c0
Oct 25 09:06:25 wpc26 E$tag 0x00000000.01000001 E$state_7 Invalid
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x5a8d0016.00000a20 0x20202020.37333231 ECC 0x128
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x39062c00.5a8d0010 0x00000a20.20202020 ECC 0x03d
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x37333330.32062c00 0x5a8f000c.00000a20 ECC 0x1f6
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x20202020.37333330 0x34062c00.5a8f000d ECC 0x1fc
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 929717 kern.info] [AFT2] D$ data not available
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 335345 kern.info] [AFT2] I$ data not available

It is important to recognize that all of the above output is the result of one single CE
event. Each of the messages is tagged with an asynchronous fault tag (AFT) to
identify the data being logged. Continuation messages begin with four spaces. The
different AFT tag values are:
- AFT0 is used for correctable errors.
- AFT1 is used for uncorrectable errors as well as for errors that result in a panic.
- AFT2 and AFT3 are used for logging diagnostic data and other error related messaging.

Comparing the example above with the OP's messages, they appear to be the same type. In the OP's messages, note:
……
Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 341960 kern.info] [AFT0] errID 0x001f5d1d.2bc28304 Corrected Memory Error on Slot A: J8201 is Intermittent
……
Sep  1 00:50:12 mail unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot A: J8201
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 837520 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x001f9c18.29722b88
……
Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 871253 kern.info] [AFT0] errID 0x001f9c18.29722b88 Corrected Memory Error on Slot A: J8201 is Persistent
……
Sep  2 21:11:21 mail unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot A: J8201
……

Regarding the states above, the same document says:
The Solaris software error handling code provides a disposition code as a result of
the scrub operation. This disposition is one of Intermittent, Persistent, or Sticky. The definition of each of these codes is:
- Intermittent means the error was not detected on a reread of the affected
memory location.
- Persistent means the error was detected again on a reread of the affected
memory location but the scrub operation corrected it.
- Sticky means that the error still exists in memory even after the scrub
operation. These events should be investigated further to determine if some
hardware replacement is necessary since this is indicative of a hard failure.
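
The three disposition codes can be tallied per memory module to see whether any module has ever been reported as Sticky. This is a rough sketch under stated assumptions: the helper names `dispositions_by_module` and `needs_investigation` are made up for illustration, and each message is assumed to sit on a single line.

```python
import re

# Captures the module label and the disposition at the end of the message.
DISPOSITION_RE = re.compile(
    r"Corrected Memory Error on (.+) is (Intermittent|Persistent|Sticky)\s*$"
)

def dispositions_by_module(lines):
    """Map each memory module label to its dispositions, in log order."""
    result = {}
    for line in lines:
        m = DISPOSITION_RE.search(line)
        if m:
            result.setdefault(m.group(1), []).append(m.group(2))
    return result

def needs_investigation(dispositions):
    """Per the quoted definitions, Sticky is the code that calls for follow-up."""
    return "Sticky" in dispositions

# Lines taken from the original poster's log, one message per line.
sample = [
    "Aug 31 05:36:09 mail SUNW,UltraSPARC-III+: [ID 341960 kern.info] [AFT0] errID 0x001f5d1d.2bc28304 Corrected Memory Error on Slot A: J8201 is Intermittent",
    "Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 141273 kern.info] [AFT0] errID 0x001f9c18.278a1bc8 Corrected Memory Error on Slot A: J8201 is Sticky",
    "Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 871253 kern.info] [AFT0] errID 0x001f9c18.29722b88 Corrected Memory Error on Slot A: J8201 is Persistent",
]
report = dispositions_by_module(sample)
for module, codes in report.items():
    print(module, codes, "investigate" if needs_investigation(codes) else "ok")
```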


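In the OP's messages, the disposition reported for J8201 goes Intermittent, Sticky, Persistent, Sticky, ending on Sticky, and the message level also rises from info to WARNING. For the Sticky state, the document states clearly that it may be a sign of a hardware fault. The module should be diagnosed and, if a fault is confirmed, replaced promptly: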
Errors Categorized as Sticky
If, during the handling of a CE event, the Solaris software cannot correct the flipped
data bit in memory, it will label the event Sticky instead of Persistent. This
disposition is fundamentally different in that it is not consistent with naturally
occurring events. Rather, it is an indication of a hard fault. Any components that
have been diagnosed with a hard fault should be replaced as soon as possible.
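
The rise in message level from kern.info to kern.warning noted in this reply can also be checked mechanically by reading the `[ID n facility.level]` tag on each line. The snippet below is a small sketch, not a complete syslog parser: `severity_counts` is a hypothetical helper, and it assumes every message carries the bracketed ID tag seen in this thread.

```python
import re

# Matches the "[ID 752700 kern.warning]" style tag seen in the logs above.
MSGID_RE = re.compile(r"\[ID (\d+) (\w+)\.(\w+)\]")

def severity_counts(lines):
    """Count messages per syslog level (info, notice, warning, ...)."""
    counts = {}
    for line in lines:
        m = MSGID_RE.search(line)
        if m:
            level = m.group(3)
            counts[level] = counts.get(level, 0) + 1
    return counts

# Lines taken from the original poster's log, one message per line.
sample = [
    "Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 141273 kern.info] [AFT0] errID 0x001f9c18.278a1bc8 Corrected Memory Error on Slot A: J8201 is Sticky",
    "Sep  1 00:50:12 mail SUNW,UltraSPARC-III+: [ID 819515 kern.info] [AFT0] errID 0x001f9c18.278a1bc8 Data Bit 120 was in error and corrected",
    "Sep  1 00:50:12 mail unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module Slot A: J8201",
]
counts = severity_counts(sample)
print(counts)
```

Any nonzero count at the warning level for a memory module is a reasonable trigger for pressing the vendor to run diagnostics.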

#8 | Posted 2006-09-03 17:26

Thanks, everyone!

The machine is a Sun Fire V480. Thank you all very much!

#9 | Posted 2006-09-05 13:55
The memory module may just be slightly loose, but Unix machines generally do not tolerate errors like this. Replace it first and sort out the rest afterwards.

#10 | Posted 2006-09-05 16:39
That person is stringing you along.