Sun System Handbook - CD 2.1.17 September 2008 Internal/Partner Edition
Asset ID: 1-61-211473-1
Update Date: Tue Aug 12 00:00:00 MDT 2008
Keywords:
Solution Type Technical Instruction
Solution 211473 : How to Verify whether a System Reboot is Caused by a Fatal Reset or a Red State Exception
Description
This document helps determine whether an unexpected or unexplained system reboot was caused by a Fatal Reset error or by a Red State Exception (RSE) condition.
Note that this document is intended to help you identify the root cause. If your system shows the symptoms described here, contact a qualified engineer at Sun Support for assistance, and reference this document ID number when you do.
Steps to Follow
Unexpected reboots are most often caused by hardware faults, which the system reports as either a Fatal Reset or a Red State Exception.
When such an error occurs, the OS is abruptly interrupted and cannot log error messages to /var/adm/messages or generate a core file. The system reboots, and the error messages and all related output appear only on the system console (that is, in the console logs). For further troubleshooting, it is therefore essential to capture the complete console log from the time of the error (reboot).
1. The system reboot could be due to a Fatal Reset error. Fatal errors are most often caused by hardware (a bad CPU, motherboard switch ASIC, I/O bridge, etc.) and result from an 'illegal' hardware state that the system detects. The Fatal Reset error and all related output are logged only to the system console (ttya or RSC). Below are examples of fatal errors caused by a CPU and by motherboard switch ASICs (the full Fatal Reset output is too long to include here):
ERROR: System Hardware FATAL RESET from CPU0
System State (CPU3 reporting)
ERROR: System "FATAL RESET" from DAR/DCS/CDX
System State (CPU2 reporting)
On systems using the ALOM serial console, the fatal error is reported as:
Fatal Error Reset
SC Alert: Host System has Reset
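If the console log has been captured to a text file, you can search it for the signatures shown above automatically. The following Python sketch is a minimal illustration. It assumes the console output is available as plain text. The function name scan_console_for_fatal_reset is hypothetical, and the patterns cover only the example messages quoted in this document, so other platforms or firmware releases may word their output differently.

```python
import re

# Signatures taken from the example console output in this document only.
# The optional quote handles the 'System "FATAL RESET" from ...' variant.
FATAL_PATTERNS = [
    re.compile(r'FATAL RESET"? from (?P<source>\S+)', re.IGNORECASE),
    re.compile(r'Fatal Error Reset', re.IGNORECASE),  # ALOM console wording
]
# Matches lines such as: System State (CPU3 reporting)
REPORTER = re.compile(r'System State \((?P<cpu>CPU\d+) reporting\)')


def scan_console_for_fatal_reset(console_text):
    """Return (matched_lines, sources, reporters) found in a console log.

    sources   - component named after 'from' (e.g. CPU0, DAR/DCS/CDX)
    reporters - CPUs named in 'System State (CPUn reporting)' lines
    """
    matched, sources, reporters = [], [], []
    for line in console_text.splitlines():
        for pattern in FATAL_PATTERNS:
            m = pattern.search(line)
            if m:
                matched.append(line.strip())
                if 'source' in m.groupdict():
                    sources.append(m.group('source'))
                break  # one signature per line is enough
        r = REPORTER.search(line)
        if r:
            reporters.append(r.group('cpu'))
    return matched, sources, reporters
```

A match only indicates that a Fatal Reset was logged. Interpreting the full output still requires the decoding described below.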
When your system reboots after a fatal error, you may also see only a notice like the following in the /var/adm/messages file:
[ID 796976 kern.notice] System booting after fatal error FATAL Sys Hardware
In addition, the output of prtconf -vp may show 'FATAL Sys Hardware' as the reset-reason:
# prtconf -vp
System Configuration: Sun Microsystems sun4u
Memory size: 8192 Megabytes
System Peripherals (PROM Nodes):
.....................
banner-name: 'Sun Fire 880'
watchdog-enable:
reset-reason: 'FATAL Sys Hardware' <<<<<<<
model: 'SUNW,501-6323'
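The reset-reason property can also be extracted from saved prtconf -vp output. The Python sketch below is illustrative only. The helper names are hypothetical, and the classification recognizes just the two reset-reason strings quoted in this document ('FATAL Sys Hardware' and 'RED CPU RED-State'). read_reset_reason assumes it is run on a Solaris host where prtconf is on the PATH.

```python
import re
import subprocess

# Matches a property line such as:  reset-reason: 'FATAL Sys Hardware'
RESET_REASON = re.compile(r"^\s*reset-reason:\s*'([^']*)'", re.MULTILINE)


def parse_reset_reason(prtconf_output):
    """Return the first reset-reason value in `prtconf -vp` output, or None."""
    m = RESET_REASON.search(prtconf_output)
    return m.group(1) if m else None


def classify_reset_reason(reason):
    """Map the two reset-reason strings quoted in this document to a category."""
    if reason is None:
        return 'unknown'
    upper = reason.upper()
    if upper.startswith('FATAL'):
        return 'fatal reset'
    if 'RED-STATE' in upper:
        return 'red state exception'
    return 'other'


def read_reset_reason():
    """Run `prtconf -vp` (Solaris only) and return the parsed reset-reason."""
    out = subprocess.run(['prtconf', '-vp'], capture_output=True,
                         text=True, check=True).stdout
    return parse_reset_reason(out)
```

Parsing and running the command are kept separate so the parser can be applied to output collected from another system.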
If the console logs show fatal errors, contact a qualified engineer at Sun Support for assistance.
1.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and the UltraSPARC IIIi platforms (V210/V240, V440), a trained Sun Support Engineer has access to additional information and an AFAR decoder tool, and will guide you through the steps to resolution.
Sun Support can also assist you if you are experiencing V480 Fatal Resets with specific network and I/O configurations.
2. The unexpected reboot could also be due to a Red State Exception (RSE). Check whether the console output contains any RSE errors. An RSE can be triggered by software or hardware, but it is most commonly caused by a hardware fault (a bad DIMM, or a bad CPU/L2 SRAM). The RSE error and all related output are logged only to the system console (ttya or RSC), and the error is usually reported by one of the CPUs:
ERROR: CPU3 RED State Exception
System State (CPU3 reporting)
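The same search works for RSE lines. The minimal Python sketch below assumes the console log is available as plain text and uses a hypothetical function name. The pattern mirrors only the example line quoted above. The CPU named in the message is where the exception was reported. It does not by itself identify the failed part, since an RSE is often caused by a bad DIMM.

```python
import re

# Pattern based on the example console line: ERROR: CPU3 RED State Exception
RSE_LINE = re.compile(r'\b(CPU\d+)\s+RED State Exception', re.IGNORECASE)


def cpus_reporting_rse(console_text):
    """Return the CPUs named in RED State Exception lines, in log order."""
    return [m.group(1) for m in RSE_LINE.finditer(console_text)]
```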
If your system reboots after an RSE, you may also see only a notice like the following in the /var/adm/messages file:
[ID 993603 kern.notice] System booting after RED CPU RED-State
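Because these post-reboot notices may be the only trace left in /var/adm/messages, it can help to scan the file for both of them at once. The Python sketch below is a minimal illustration. It assumes the message IDs 796976 and 993603 shown in this document, which may differ on other releases. The function names are hypothetical. The regex searches anywhere in a line, so the usual syslog timestamp and host prefix are ignored.

```python
import re

# Message IDs taken from the two notices shown in this document.
BOOT_NOTICES = {
    '796976': 'fatal reset',          # ... after fatal error FATAL Sys Hardware
    '993603': 'red state exception',  # ... after RED CPU RED-State
}
NOTICE = re.compile(r'\[ID (\d+) kern\.notice\] System booting after (.*)$')


def scan_messages(messages_text):
    """Return (reboot_type, notice_text) tuples for matching boot notices."""
    results = []
    for line in messages_text.splitlines():
        m = NOTICE.search(line)
        if m and m.group(1) in BOOT_NOTICES:
            results.append((BOOT_NOTICES[m.group(1)], m.group(2).strip()))
    return results


def scan_messages_file(path='/var/adm/messages'):
    """Read a messages file and scan it; undecodable bytes are replaced."""
    with open(path, errors='replace') as f:
        return scan_messages(f.read())
```

A notice found here tells you which condition occurred, but the console log is still needed for the details.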
The output of prtconf -vp may also show 'RED CPU RED-State' as the reset-reason:
# prtconf -vp
System Configuration: Sun Microsystems sun4u
Memory size: 32768 Megabytes
System Peripherals (PROM Nodes):
banner-name: 'Sun Fire 880'
watchdog-enable:
reset-reason: 'RED CPU RED-State' <--- reset-reason
If the console logs show RSE errors, this is again a critical issue that requires a qualified Sun Support Engineer, so contact Sun Support for assistance.
2.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and the UltraSPARC IIIi platforms (V210/V240, V440), contact Sun Support for assistance.
Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap in this document, and/or before updating this document, contact the domain engineers who manage it via the "Document Feedback" alias(es) listed below:
Normalization Lead: Jim Robbins Domain Engineer/Lead : Josh Freeman
VSP-SPARC-Normalization@sun.com
REFERENCES:
In case the console logs have fatal errors, reference the following docs:
1.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890), refer to: Troubleshooting < Solution: 209123 > : Sun Fire V880 FATAL Resets.
< Solution: 205066 > : V480 Fatal Resets with specific network and I/O configurations.
Note: The procedures in < Solution: 209123 > apply to all V4x0/V8x0 platforms, since they use the same CPU/memory board.
1.b) For the UltraSPARC IIIi platforms (V210/V240, V440), you may use the US3i AFAR decoder tool in conjunction with < Solution: 206870 > : Event Messages for UltraSPARC-III[R], UltraSPARC-III+[R], UltraSPARC-IIIi[R], UltraSPARC-IV[R] and UltraSPARC-IV+[R] CPU Modules.
In case the console logs have RSE errors, reference the following docs:
2.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890), refer to:
< Solution: 209130 > : Troubleshooting Sun Fire V880 RED STATE EXCEPTION.
< Solution: 216842 > : Troubleshooting Red State Exception Memory Errors.
2.b) For the UltraSPARC IIIi platforms (V210/V240, V440), you may use the US3i AFAR decoder tool in conjunction with < Solution: 206870 > : Event Messages for UltraSPARC-III[R], UltraSPARC-III+[R], UltraSPARC-IIIi[R], UltraSPARC-IV[R] and UltraSPARC-IV+[R] CPU Modules.
More Reference Material:
Internal Tool: Fatal Reset Decoder
Internal Tool: RED State Exception Decoder
Internal Tool: US3iAFAR Decoder
Sun Alert < Solution: 200502 > Sun Systems Equipped ASICs Version 2.3 or Higher May Experience Either Domain Stop (Dstop), Domain Pause or FATAL RESET Under Heavy I/O
FCO AO226-1 V480 Fatal Resets with specific network and I/O configurations
Sun Alert < Solution: 201170 > Sun Fire V440 and Netra 440 Systems Using a Specific Networking Configuration may Unexpectedly Reset
Troubleshooting < Solution: 216842 > Troubleshooting Red State Exception Memory Errors
Troubleshooting < Solution: 209123 > Sun Fire V880 FATAL Resets
Troubleshooting < Solution: 209130 > Troubleshooting Sun Fire V880 RED STATE EXCEPTION
< Solution: 206870 > : Event Messages for UltraSPARC-III[R], UltraSPARC-III+[R], UltraSPARC-IIIi[R], UltraSPARC-IV[R] and UltraSPARC-IV+[R] CPU Modules.
Product
Sun Fire V890 Server
Sun Fire V880z Visualization Server
Sun Fire V880 Server
Sun Fire V490 Server
Sun Fire V480 Server
Sun Fire V445 Server
Sun Fire V440 Server
Sun Fire V240 Server
Sun Fire V210 Server
Keywords
normalized, unexplained reboot, console logs, red state, fatal reset, Problem Solved = Identify Fatal Reset or Red State
Previously Published As
91380
Change History
emailed Author Dencho Kojucharov 8/11/08
I was working to publish this article but ran across 3 links in your Internal Comments Statement that were not found. They are listed below:
Sun Alert 101456
FCO AO226-1
Sun Alert 101548
I would greatly appreciate it if you could fix these links as soon as possible so that we can publish the article. If you have any questions, please let me know.
Thank you
Date: 2008-01-08
User Name: 7058
Action: Update Started
Comment: Updating doc per Jim Koontz and Dencho's approval to make it more suitable for customer viewing.
Version: 0
Attachments
This solution has no attachment
Copyright 1994-2008 Sun Microsystems, Inc. All rights reserved.