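HP UNIX: How can I tell if a panic was caused by hardware or software?
2006-6-8 14:14:51 equalnull Source: HP
How can I tell if a panic was caused by hardware or software?
Problem Description
When a panic forces a reboot, the cause may be hardware related or
software/OS related. If it is hardware related, identifying the cause as hardware (for example, an HPMC) and
engaging the appropriate hardware resources avoids unnecessary time spent analyzing a crash dump.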
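Configuration Info
Solution
To determine whether a panic was caused by hardware or software, first check the panic message in the shutdownlog or in the dump INDEX
file:
tail /etc/shutdownlog
or, from the dump core.X (10.X) or crash.X (11.X) directory:
more INDEX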
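If the shutdownlog entry corresponding to the panic is:
Reboot after panic: , isr.ior = X'X.Y'Y    (see NOTE 1 below)
or
Reboot after panic: trap type 1 (HPMC)
or if the panic line in the INDEX file is:
panic , isr.ior = X'X.Y'Y    (see NOTE 1 below)
or
panic trap type 1 (HPMC)
then an HPMC (High Priority Machine Check) has probably occurred, and a hardware service call should be opened.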
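NOTE 1: This message also appears if the system is running MC Serviceguard or if the reboot was caused by an operator-induced TOC (Transfer of Control). In those cases a crash dump analysis may still be required.
NOTE 2: The vast majority of HPMCs have hardware causes, but there are a few exceptions. Once the chassis codes have been analyzed, hardware support may request a crash dump analysis.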
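Certain S800 servers support online collection of HPMC PIM (Processor Internal Memory) information.
With diagnostics installed and a current version of a utility called pdcinfo, they
write hardware fault information to a file called /var/tombstones/ts99. This information is
analyzed by hardware service. It may be necessary to run online diagnostics, or to reboot and obtain
the hardware fault (chassis) codes from a boot or service menu. On V-class and N-class servers, HPMC
and hardware fault information is obtained with different utilities. Hardware service can assist with these operations.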
.........The original English text follows ....
UXDNKBRC00001764
How can I tell if a panic was caused by hardware or software?
Problem Description
When a reboot due to a panic occurs, the cause could be hardware related or
software/OS related. If it is hardware related, unnecessary time spent
analyzing a crash dump can be avoided by identifying the cause as hardware (e.g.,
HPMC) and getting the appropriate hardware resources involved.
Configuration Info
Solution
The first step in determining if a panic is caused by hardware vs. software is
to check the panic message in the shutdownlog or the dump INDEX file:
tail /etc/shutdownlog
or from the dump core.X (10.X) or crash.X (11.X) directory:
more INDEX
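The two checks above can be combined into one small helper. This is only a sketch, assuming a POSIX shell; `show_panic_lines` is a hypothetical name, not an HP-UX command, and the location of the dump directory is left to the caller because the text does not state it.

```shell
# show_panic_lines is a hypothetical helper name, not an HP-UX command.
# It prints every line mentioning "panic" from each file given as an
# argument and silently skips files that are missing or unreadable.
show_panic_lines() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            # grep exits non-zero when nothing matches; that is not an error here
            grep -i 'panic' "$f" || true
        fi
    done
    return 0
}

# Example: run from inside a core.X or crash.X directory so ./INDEX resolves.
show_panic_lines /etc/shutdownlog ./INDEX
```

Entries in the shutdownlog are appended over time, so the last panic line printed for that file is normally the one that matches the most recent reboot.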
If the entry in the shutdownlog corresponding to the panic is:
Reboot after panic: , isr.ior = X'X.Y'Y    (see NOTE 1 below)
or
Reboot after panic: trap type 1 (HPMC)
or if the panic line from the INDEX file is:
panic , isr.ior = X'X.Y'Y    (see NOTE 1 below)
or
panic trap type 1 (HPMC)
it is likely that an HPMC (High Priority Machine Check) has occurred and a
hardware service call should be opened.
NOTE 1: If the system is running MC Serviceguard or if the system
rebooted due to an operator-induced TOC (Transfer of Control),
this message will also appear, and a crash dump analysis may
still be required.
NOTE 2: The vast majority of HPMCs are related to hardware causes. There
are a few exceptions, and once the chassis codes are analyzed,
hardware support may request that a crash dump analysis be performed.
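The decision described above, including the caveat in NOTE 1, can be sketched as a small triage function. `classify_panic` is a hypothetical name, the patterns are only the message forms quoted in this note, and the output wording is illustrative rather than anything HP-UX prints.

```shell
# classify_panic is a hypothetical helper; the patterns below are taken from
# the message forms quoted in this note and are not an exhaustive list.
# $1: one panic line copied from /etc/shutdownlog or from the INDEX file.
classify_panic() {
    case "$1" in
        *"trap type 1 (HPMC)"*)
            echo "likely HPMC: open a hardware service call" ;;
        *"isr.ior"*)
            # Same message appears with MC Serviceguard or an operator TOC
            echo "possible HPMC, or Serviceguard/operator TOC: see NOTE 1" ;;
        *[Pp]anic*)
            echo "no HPMC signature: crash dump analysis is the likely next step" ;;
        *)
            echo "not a panic entry" ;;
    esac
}

# Example usage with one of the message forms from the text:
classify_panic 'Reboot after panic: trap type 1 (HPMC)'
```

The explicit HPMC pattern is tested first because it is the least ambiguous; the isr.ior form is deliberately reported as uncertain, since NOTE 1 says it has more than one possible cause.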
Certain S800 servers support online collection of HPMC PIM (Processor Internal
Memory) information. With diagnostics installed and a current
version of a utility called pdcinfo, they write hardware fault
information to a file called /var/tombstones/ts99. This information
can be analyzed by hardware service. It may be necessary to run online
diagnostics, or to reboot the system and obtain the hardware fault (chassis) codes
from a boot or service menu. On V-class and N-class servers, HPMC and hardware
fault information is obtained with different utilities. Hardware service can
assist with these operations.
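Before calling hardware service, it can help to confirm whether the tombstone file mentioned above was actually written. A minimal sketch: `check_tombstone` is a hypothetical name, and the only path it assumes is the one given in the text.

```shell
# check_tombstone is a hypothetical helper. The default path is the one named
# in this note; a different path can be passed as $1.
check_tombstone() {
    ts="${1:-/var/tombstones/ts99}"
    if [ -s "$ts" ]; then
        # File exists and is not empty
        echo "found $ts: pass it to hardware service"
        ls -l "$ts"
    else
        echo "no data in $ts: chassis codes may have to come from a boot or service menu"
    fi
}

check_tombstone
```

Comparing the file's modification time from `ls -l` with the time of the panic entry in the shutdownlog is a simple way to judge whether the contents belong to the same event.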
(the end)