This morning a colleague named Golden went to our data centre in the U.S.
He installed 2 CPUs and 16 GB of memory in our DB server.
However, I found the server reporting the following errors:
Sep 6 19:56:59 Stlmnt02-SJC1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: SUN4U-8000-35, TYPE: Fault, VER: 1, SEVERITY: Minor
Sep 6 19:56:59 Stlmnt02-SJC1 EVENT-TIME: Thu Sep 6 19:56:59 PDT 2007
Sep 6 19:56:59 Stlmnt02-SJC1 PLATFORM: SUNW,Sun-Fire-V440, CSN: -, HOSTNAME: Stlmnt02-SJC1
Sep 6 19:56:59 Stlmnt02-SJC1 SOURCE: cpumem-diagnosis, REV: 1.5
Sep 6 19:56:59 Stlmnt02-SJC1 EVENT-ID: 54725d22-6819-ed95-b25c-db319d5d62f9
Sep 6 19:56:59 Stlmnt02-SJC1 DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to
http://sun.com/msg/SUN4U-8000-35
for more information.
Sep 6 19:56:59 Stlmnt02-SJC1 AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
Sep 6 19:56:59 Stlmnt02-SJC1 IMPACT: Total system memory capacity will be reduced as pages are retired.
Sep 6 19:56:59 Stlmnt02-SJC1 REC-ACTION: Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u to identify the module.
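The REC-ACTION line points at fmdump, which needs the event ID from the same message. As a rough sketch (the function name `parse_fmd_log` and the regular expressions are my own assumptions, not part of any Sun tool), the relevant fields can be pulled out of the syslog text like this:

```python
import re

# Two of the fmd syslog lines from the log above, copied verbatim.
LOG = """\
Sep 6 19:56:59 Stlmnt02-SJC1 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: SUN4U-8000-35, TYPE: Fault, VER: 1, SEVERITY: Minor
Sep 6 19:56:59 Stlmnt02-SJC1 EVENT-ID: 54725d22-6819-ed95-b25c-db319d5d62f9
"""

def parse_fmd_log(text):
    """Return msg_id, severity and event_id found in fmd syslog text.

    Hypothetical helper: the patterns are guesses based on the log above.
    """
    patterns = {
        "msg_id": r"SUNW-MSG-ID:\s*([A-Z0-9-]+)",
        "severity": r"SEVERITY:\s*(\w+)",
        "event_id": r"EVENT-ID:\s*([0-9a-f-]{36})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields

info = parse_fmd_log(LOG)
# Build the command suggested by REC-ACTION; it is only printed, not run.
command = "fmdump -v -u " + info["event_id"]
print(command)
```

The script only prints the command, so it can be tried on any machine; the fmdump call itself has to be run on the affected Solaris host.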
Below is what I learned from http://sun.com/msg/SUN4U-8000-35:
Memory module errors exceeded acceptable levels
Type
Fault
Severity
Critical
Description
The Solaris Fault Manager has determined that one or more uncorrectable (multibit) memory errors indicating a fault which requires repair action is present.
Automated Response
The system will attempt to remove the affected physical memory page from service after restart of one or more specific services, or the entire system.
Impact
This error will cause either a system panic or restart of one or more user processes, with resulting interruption in service.
After restart, the performance of the system may be minimally impacted as a result of removing the physical memory page from operation.
Suggested Action for System Administrator
Schedule a repair procedure to replace the affected memory DIMM module, whose identity can be determined using fmdump -v -u with the event ID.
For example:
EVENT-ID: d05a9f16-e969-4988-d340-dea1b54bd307
Details
The Message ID: SUN4U-8000-35 that the Solaris Fault Manager has received reports that one or more multibit, uncorrectable memory errors have been detected. Diagnosis has determined that a fault requiring repair action is present.
If there are no indications of a system board or platform problem (see the note below), schedule replacement of the affected DIMMs as identified in the fmdump(1M) output:
% fmdump -v -u d05a9f16-e969-4988-d340-dea1b54bd307
TIME UUID SUNW-MSG-ID
Feb 10 23:34:29.0307 d05a9f16-e969-4988-d340-dea1b54bd307 SUN4U-8000-35
95% fault.memory.bank
FRU: mem:///component=J0100,J0202,J0304,J0406
rsrc: mem:///component=J0100,J0202,J0304,J0406
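The FRU line is what tells you which DIMMs to pull. A minimal sketch of extracting the component labels from that output, assuming the layout matches the sample above (the helper name `fru_components` is hypothetical, and real fmdump output may differ in spacing):

```python
import re

# Sample fmdump -v output, copied from the example above.
FMDUMP_OUTPUT = """\
TIME UUID SUNW-MSG-ID
Feb 10 23:34:29.0307 d05a9f16-e969-4988-d340-dea1b54bd307 SUN4U-8000-35
95% fault.memory.bank
FRU: mem:///component=J0100,J0202,J0304,J0406
rsrc: mem:///component=J0100,J0202,J0304,J0406
"""

def fru_components(output):
    """Return the DIMM labels listed on 'FRU:' lines, without duplicates."""
    components = []
    for line in output.splitlines():
        match = re.match(r"\s*FRU:\s*mem:///component=(.+)$", line)
        if not match:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in components:
                components.append(name)
    return components

dimms = fru_components(FMDUMP_OUTPUT)
print(dimms)
```

The labels (J0100 and so on) are the names to look for when locating the physical DIMMs to replace.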
Once the DIMMs have been replaced, the command fmadm faulty should be run to see the status of the memory involved.
Example:
# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///component=J0100,J0202,J0304,J0406
d05a9f16-e969-4988-d340-dea1b54bd307
-------- ----------------------------------------------------------------------
Then fmadm repair should be run to remove the memory from the faulty list:
# fmadm repair mem:///component=J0100,J0202,J0304,J0406
fmadm: recorded repair to mem:///component=J0100,J0202,J0304,J0406
See the fmadm man page for specific syntax of the command.
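To avoid retyping the long resource string, the repair command can be derived from the fmadm faulty listing. This is only a sketch under the assumption that the listing looks like the example above; `repair_commands` is a hypothetical helper, and it prints the command rather than running it, since fmadm repair should only be issued after the DIMMs have actually been replaced:

```python
# Sample "fmadm faulty" output, copied from the example above.
FMADM_FAULTY = """\
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///component=J0100,J0202,J0304,J0406
d05a9f16-e969-4988-d340-dea1b54bd307
-------- ----------------------------------------------------------------------
"""

def repair_commands(faulty_output):
    """Build one 'fmadm repair' command per memory resource in the listing."""
    commands = []
    for line in faulty_output.splitlines():
        parts = line.split()
        # Resource lines have exactly two columns: STATE and a mem:/// resource.
        if len(parts) == 2 and parts[1].startswith("mem:///"):
            commands.append("fmadm repair " + parts[1])
    return commands

commands = repair_commands(FMADM_FAULTY)
for cmd in commands:
    print(cmd)
```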
Sun Fire Midrange and High End notes:
There may be additional fault information logged by System Controller Application (ScApp) on Sun Fire[TM] Midrange (3800, 48x0, 6800, v1280, E2900, E4900, E6900) and System Management Services (SMS) on High End (12K, 15K, E20K, E25K) Servers. Availability enhancements (called AVL) have been integrated into ScApp 5.19.0 and SMS 1.5 to allow PSH/AVL interaction to coexist.
This article comes from a ChinaUnix blog; to view the original, see: http://blog.chinaunix.net/u/29885/showart_376461.html