Solaris fmadm error messages, please help take a look

#1 Posted 2011-07-04 20:32
SunOS 5.10
Sun Microsystems  sun4u Sun Fire E6900
fmadm keeps reporting errors, and they are CPU/memory errors. Please help analyze them, thanks.
machine # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Jun 30 09:10:55.2888 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-2K
Jul 04 13:37:53.4181 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-4M Repaired
Jul 04 13:42:19.4675 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-0W
Jul 04 13:52:19.7609 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-4M Repaired
machine # fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
Jun 30 09:10:55.2888 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-2K
  100%  defect.sunos.fmd.module

        Problem in: fmd:///module/cpumem-diagnosis
           Affects: fmd:///module/cpumem-diagnosis
               FRU: -
          Location: -

Jul 04 13:37:53.4181 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-4M Repaired
  100%  defect.sunos.fmd.module

        Problem in: fmd:///module/cpumem-diagnosis
           Affects: fmd:///module/cpumem-diagnosis
               FRU: -
          Location: -

Jul 04 13:42:19.4675 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-0W
  100%  defect.sunos.fmd.nosub

        Problem in: -
           Affects: -
               FRU: -
          Location: -

Jul 04 13:52:19.7609 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-4M Repaired
  100%  defect.sunos.fmd.nosub

        Problem in: -
           Affects: -
               FRU: -
          Location: -

machine # fmdump -e | head
TIME                 CLASS
Jul 04 16:30:07.0788 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.2688 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.3489 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.3588 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.2388 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.3888 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.2988 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.3788 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.5989 ereport.cpu.ultraSPARC-IVplus.ce
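
Since `head` only shows the first few lines, it can help to tally the error reports by class to see whether anything other than correctable errors (`.ce`) is being logged. The following is a minimal sketch, not part of the original post: it assumes the `fmdump -e` output has been saved as text, and `count_ereport_classes` is a hypothetical helper name.

```python
from collections import Counter

# Sample copied from the `fmdump -e | head` output above (first rows only).
SAMPLE = """\
TIME                 CLASS
Jul 04 16:30:07.0788 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.2688 ereport.cpu.ultraSPARC-IVplus.ce
Jul 04 16:30:07.3489 ereport.cpu.ultraSPARC-IVplus.ce
"""


def count_ereport_classes(text):
    """Count error reports per class in `fmdump -e` style text."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four fields: month, day, time, class.
        # The header row ("TIME CLASS") has only two and is skipped.
        if len(fields) == 4 and fields[3].startswith("ereport."):
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    for cls, n in count_ereport_classes(SAMPLE).most_common():
        print(n, cls)
```

On the real system the full `fmdump -e` output would be passed in instead of `SAMPLE`; a single dominant class suggests one recurring source rather than many unrelated faults.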

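#2 Posted 2011-07-04 20:33
The errlog under /var/fm/fmd keeps growing. I would like to disable fmd with svcadm, but I am worried that this could cause problems. Could someone experienced please take a look?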

#3 Posted 2011-07-04 22:15
Are your patches up to date?

You can also refer to this:
http://jacksonjstrong.blogspot.c ... -management_27.html

#4 Posted 2011-07-05 10:06
These are the last lines of the patchadd -p output:
Patch: 120288-03 Obsoletes: Requires: Incompatibles: Packages: SUNWgnome-terminal SUNWgnome-terminal-root
Patch: 120286-02 Obsoletes: Requires: 119368-04 Incompatibles: Packages: SUNWgnome-text-editor-root SUNWgnome-text-editor SUNWgnome-text-editor-devel
Patch: 128318-01 Obsoletes: 128253-01 Requires: Incompatibles: Packages: SUNWsshcu
Patch: 126630-01 Obsoletes: Requires: Incompatibles: Packages: SUNWtcsh

#5 Posted 2011-07-05 10:30
fmdump -u UUID

fmdump -v -u UUID

#6 Posted 2011-07-05 15:26
Can you post the contents of the messages log?

#7 Posted 2011-07-05 18:46
Last edited by ramonlc on 2011-07-05 18:47

fmdump does not show the details. Is that because I marked the faults as repaired?
root@MACHINE2 # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Jun 30 09:10:55.2888 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-2K
Jul 04 13:37:53.4181 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-4M Repaired
Jul 04 13:42:19.4675 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-0W
Jul 04 13:52:19.7609 4bfd1be3-4066-67de-a513-e36e5fa70848 FMD-8000-4M Repaired
Jul 05 16:55:27.5737 9d437cce-acaa-4b8e-a491-e61a635c6220 FMD-8000-2K
Jul 05 18:24:39.8892 1648611d-cb03-45c6-a445-91496fce6893 FMD-8000-0W
Jul 05 18:26:53.1216 1648611d-cb03-45c6-a445-91496fce6893 FMD-8000-4M Repaired
root@MACHINE2 # fmdump -u 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a
TIME                 UUID                                 SUNW-MSG-ID
Jun 30 09:10:55.2888 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-2K
Jul 04 13:37:53.4181 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a FMD-8000-4M Repaired
root@MACHINE2 # fmdump -uv 1648611d-cb03-45c6-a445-91496fce6893
fmdump: failed to open 1648611d-cb03-45c6-a445-91496fce6893: No such file or directory
root@MACHINE2 # fmdump -uv 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a
fmdump: failed to open 25c5d1b1-66c7-ecf4-af8a-9455ab674e5a: No such file or directory
root@MACHINE2 # fmdump -uv 4bfd1be3-4066-67de-a513-e36e5fa70848
fmdump: failed to open 4bfd1be3-4066-67de-a513-e36e5fa70848: No such file or directory
root@MACHINE2 # fmdump -uv 9d437cce-acaa-4b8e-a491-e61a635c6220
fmdump: failed to open 9d437cce-acaa-4b8e-a491-e61a635c6220: No such file or directory
root@MACHINE2 # fmdump -uv 1648611d-cb03-45c6-a445-91496fce6893
fmdump: failed to open 1648611d-cb03-45c6-a445-91496fce6893: No such file or directory
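
The "failed to open" errors above are probably not caused by the repair. They are consistent with how getopt-style option parsing works: `-u` takes an argument, so in `-uv` the letter `v` is consumed as the UUID and the real UUID is left over as an operand, which fmdump then appears to treat as a log file name. The fmadm message text itself uses the form `fmdump -v -u <EVENT_ID>`. The sketch below only illustrates that parsing behaviour with Python's `getopt`; the option string `"u:v"` is a simplified assumption, not fmdump's actual option list.

```python
import getopt

# Hypothetical, simplified option string: "-u" takes an argument, "-v" does not.
OPTSTRING = "u:v"


def parse(argv):
    """Split argv into (options, leftover operands) using getopt rules."""
    opts, operands = getopt.getopt(argv, OPTSTRING)
    return dict(opts), operands


uuid = "25c5d1b1-66c7-ecf4-af8a-9455ab674e5a"

if __name__ == "__main__":
    # "-uv UUID": "v" becomes the argument of -u; the UUID is a stray operand.
    print(parse(["-uv", uuid]))
    # "-v -u UUID": -v is a plain flag and the UUID is the argument of -u.
    print(parse(["-v", "-u", uuid]))
```

If this reading is right, rerunning the commands as `fmdump -v -u <UUID>` should show the verbose records.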

dmesg | tail -100
Jul  5 18:32:48 MACHINE2 last message repeated 1 time
Jul  5 18:32:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:32:51 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:33:16 MACHINE2 last message repeated 9 times
Jul  5 18:33:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:33:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:33:46 MACHINE2 last message repeated 10 times
Jul  5 18:33:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:33:49 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:34:18 MACHINE2 last message repeated 10 times
Jul  5 18:34:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:34:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:34:47 MACHINE2 last message repeated 9 times
Jul  5 18:34:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:34:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:35:16 MACHINE2 last message repeated 9 times
Jul  5 18:35:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:35:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:35:45 MACHINE2 last message repeated 9 times
Jul  5 18:35:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:35:48 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:36:15 MACHINE2 last message repeated 9 times
Jul  5 18:36:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:36:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:36:46 MACHINE2 last message repeated 9 times
Jul  5 18:36:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:36:49 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:37:17 MACHINE2 last message repeated 9 times
Jul  5 18:37:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:37:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:37:47 MACHINE2 last message repeated 9 times
Jul  5 18:37:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:37:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:38:18 MACHINE2 last message repeated 10 times
Jul  5 18:38:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:38:21 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:38:48 MACHINE2 last message repeated 10 times
Jul  5 18:38:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:38:51 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:39:17 MACHINE2 last message repeated 9 times
Jul  5 18:39:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:39:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:39:23 MACHINE2 last message repeated 1 time
Jul  5 18:39:25 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:39:46 MACHINE2 last message repeated 7 times
Jul  5 18:39:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:39:49 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:40:18 MACHINE2 last message repeated 10 times
Jul  5 18:40:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:40:21 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:40:46 MACHINE2 last message repeated 9 times
Jul  5 18:40:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:40:49 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:41:17 MACHINE2 last message repeated 10 times
Jul  5 18:41:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:41:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:41:48 MACHINE2 last message repeated 10 times
Jul  5 18:41:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:41:51 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:42:16 MACHINE2 last message repeated 9 times
Jul  5 18:42:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:42:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:42:47 MACHINE2 last message repeated 10 times
Jul  5 18:42:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:42:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:43:18 MACHINE2 last message repeated 10 times
Jul  5 18:43:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:43:21 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:43:47 MACHINE2 last message repeated 9 times
Jul  5 18:43:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:43:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:44:18 MACHINE2 last message repeated 10 times
Jul  5 18:44:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:44:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:44:47 MACHINE2 last message repeated 10 times
Jul  5 18:44:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:44:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:45:17 MACHINE2 last message repeated 10 times
Jul  5 18:45:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:45:20 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:45:47 MACHINE2 last message repeated 10 times
Jul  5 18:45:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:45:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:46:01 MACHINE2 last message repeated 4 times
Jul  5 18:46:04 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:46:18 MACHINE2 last message repeated 5 times
Jul  5 18:46:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:46:21 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:46:47 MACHINE2 last message repeated 9 times
Jul  5 18:46:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:46:50 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:47:16 MACHINE2 last message repeated 9 times
Jul  5 18:47:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:47:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:47:48 MACHINE2 last message repeated 10 times
Jul  5 18:47:48 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:47:52 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:48:18 MACHINE2 last message repeated 9 times
Jul  5 18:48:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:48:21 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
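
The log above is dominated by two sgsbbc notices plus "last message repeated" lines. To see how often each notice actually fires, the repeat counts can be folded back into the preceding message. This is a minimal sketch under the assumption that each "last message repeated N times" line refers to the message logged immediately before it; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Three lines copied from the dmesg output above.
SAMPLE = """\
Jul  5 18:33:18 MACHINE2 sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
Jul  5 18:33:19 MACHINE2 sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
Jul  5 18:33:46 MACHINE2 last message repeated 10 times
"""

MSG_RE = re.compile(r"\[ID (\d+) [\w.]+\] (.*)$")
REPEAT_RE = re.compile(r"last message repeated (\d+) times?$")


def summarize(text):
    """Count occurrences per (message ID, text), including repeat lines."""
    counts = Counter()
    last = None
    for line in text.splitlines():
        m = MSG_RE.search(line)
        if m:
            last = (m.group(1), m.group(2))
            counts[last] += 1
            continue
        r = REPEAT_RE.search(line)
        if r and last is not None:
            # Credit the repeats to the most recent real message.
            counts[last] += int(r.group(1))
    return counts


if __name__ == "__main__":
    for (msg_id, msg), n in summarize(SAMPLE).most_common():
        print(n, msg_id, msg)
```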

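#8 Posted 2011-07-05 20:38
Last edited by zyl555 on 2011-07-05 20:41

Run showerror on the SC and you will know.
There are two possible causes for this kind of problem: 1) a bug, or 2) a memory fault.
fmadm faulty should show that it is a memory error.

The second is more likely.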

#9 Posted 2011-07-06 09:34
I can see that it is reporting a memory error, but it says "Total system memory capacity will be reduced as pages are retired".

However, when I check with prtdiag and top, the amount of memory has not shrunk.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jun 29 23:34:26 78324ec7-03d3-ca0e-f46b-899d207e1e41  SUN4U-8000-2S  Major   

Fault class : fault.memory.dimm 95%
Affects     : mem:///unum=/N0/SB4/P1/B1/D3,J14601
                  degraded but still in service
FRU         : mem:///unum=/N0/SB4/P1/B1/D3,J14601 95%
                  faulty
Serial ID.  : 5017385A45375

Description : The number of errors associated with this memory module has
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/SUN4U-8000-2S for more information.

Response    : Pages of memory associated with this memory module are being
              removed from service as errors are reported.

Impact      : Total system memory capacity will be reduced as pages are
              retired.

Action      : Schedule a repair procedure to replace the affected memory
              module. Use fmdump -v -u <EVENT_ID> to identify the module.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jul 05 16:55:25 9d437cce-acaa-4b8e-a491-e61a635c6220  FMD-8000-2K    Minor   

Fault class : defect.sunos.fmd.module
Affects     : fmd:///module/cpumem-diagnosis
                  degraded but still in service

Description : A Solaris Fault Manager component has experienced an error that
              required the module to be disabled.  Refer to
              http://sun.com/msg/FMD-8000-2K for more information.

Response    : The module has been disabled.  Events destined for the module
              will be saved for manual diagnosis.

Impact      : Automated diagnosis and response for subsequent events associated
              with this module will not occur.

Action      : Use fmdump -v -u <EVENT-ID> to locate the module.  Use fmadm
              reset <module> to reset the module.
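
One possible reason the reduction is not visible: memory is retired one page at a time, so the amount taken out of service is usually tiny compared with total memory, and prtdiag likely reports the installed DIMMs rather than usable pages. A rough arithmetic sketch follows; the 8 KB base page size is an assumption for sun4u (the `pagesize` command reports the actual value), and the retired-page count and installed memory are purely hypothetical numbers, not taken from this system.

```python
PAGE_SIZE_BYTES = 8192    # assumed base page size; check with `pagesize`
RETIRED_PAGES = 500       # hypothetical number of retired pages
TOTAL_MEMORY_GIB = 64     # hypothetical installed memory


def retired_fraction(retired_pages, page_size, total_bytes):
    """Fraction of total memory taken out of service by page retirement."""
    return retired_pages * page_size / total_bytes


total_bytes = TOTAL_MEMORY_GIB * 1024**3
retired_bytes = RETIRED_PAGES * PAGE_SIZE_BYTES

if __name__ == "__main__":
    print(f"retired: {retired_bytes / 1024**2:.2f} MiB")
    share = retired_fraction(RETIRED_PAGES, PAGE_SIZE_BYTES, total_bytes)
    print(f"share of total: {share:.6%}")
```

With numbers of this order the loss is a few megabytes, far below what top's rounded totals would show. The fault record still recommends replacing the DIMM at /N0/SB4/P1/B1/D3.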

#10 Posted 2011-07-06 09:35
Reply to #8 zyl555

"Run showerror on the SC and you will know"?
How do I do that?