Posted 2007-10-06 23:16

HP UNIX: How can I tell whether a panic was caused by hardware or software?
2006-6-8 14:14:51 equalnull Source: HP

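Problem Description

When a panic forces a reboot, the cause may be hardware related or it may be
software/OS related. If the cause can be identified as hardware (for example an
HPMC) and the appropriate hardware resources brought in, unnecessary time spent
analyzing a crash dump can be avoided.

Configuration Info

Solution

To determine whether a panic was caused by hardware or software, first check the
panic message in the shutdownlog or in the dump INDEX file: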

tail /etc/shutdownlog

or, from the dump core.X (10.X) or crash.X (11.X) directory:

more INDEX

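If the shutdownlog entry corresponding to the panic is:
Reboot after panic: , isr.ior = X’X.Y’Y (see Note 1 below)
or
Reboot after panic: trap type 1 (HPMC)

or if the panic line in the INDEX file is:
panic , isr.ior = X’X.Y’Y (see Note 1 below)
or
panic trap type 1 (HPMC)

then an HPMC (High Priority Machine Check) has probably occurred and a hardware
service call should be opened.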

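Note 1: This message also appears if the system is running MC Serviceguard or if the reboot was caused by an operator-induced TOC (Transfer of Control). In those cases a crash dump analysis may still be required.

Note 2: The vast majority of HPMCs have hardware causes, but there are a few exceptions. Once the chassis codes have been analyzed, hardware support may request a crash dump analysis.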

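Certain S800 servers support online collection of HPMC PIM (Processor Internal
Memory) information. Combined with the installed diagnostics and a current
version of a utility called pdcinfo, this writes hardware fault information to a
file called /var/tombstones/ts99, which hardware service can analyze. It may be
necessary to run online diagnostics, or to reboot and obtain the hardware fault
(chassis) codes from a boot or service menu. On V-class and N-class servers,
HPMC and hardware fault information is obtained with different utilities.
Hardware service can assist with these operations.
......... The original English text follows ....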

UXDNKBRC00001764
How can I tell if a panic was caused by hardware or software?
Problem Description

When a reboot due to a panic occurs, the cause could be hardware related or
software/OS related. If it is hardware related, unnecessary time spent
analyzing a crash dump can be avoided by identifying the cause as hardware (e.g.
HPMC) and getting the appropriate hardware resources involved.

Configuration Info

Solution

The first step in determining if a panic is caused by hardware vs. software is
to check the panic message in the shutdownlog or the dump INDEX file:

tail /etc/shutdownlog

or from the dump core.X (10.X) or crash.X (11.X) directory:

more INDEX

If the entry in the shutdownlog corresponding to the panic is:
Reboot after panic: , isr.ior = X’X.Y’Y (see NOTE 1 below)
or
Reboot after panic: trap type 1 (HPMC)

or if the panic line from the INDEX file is:
panic , isr.ior = X’X.Y’Y (see NOTE 1 below)
or
panic trap type 1 (HPMC)

it is likely that an HPMC (High Priority Machine Check) has occurred and a
hardware service call should be opened.

NOTE 1: If the system is running MC Serviceguard or if the system
rebooted due to an operator-induced TOC (Transfer of Control),
this message will also appear, and a crash dump analysis may
still be required.

NOTE 2: The vast majority of HPMCs are related to hardware causes. There
are a few exceptions, and once the chassis codes are analyzed,
hardware support may request that a crash dump analysis be performed.
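
The checks above can be sketched as a small POSIX shell function. This is only an illustration, not an HP-supplied tool: the name classify_panic and the verdict strings are made up for this example, and the patterns simply mirror the message formats quoted above. The "other panic" verdict is an assumption that any remaining panic is a candidate for crash dump analysis.

```shell
# classify_panic (hypothetical helper): read log lines on stdin and print
# a rough verdict for the last panic line seen.
classify_panic() {
    verdict="no panic line found"
    while IFS= read -r line; do
        case "$line" in
            *panic*"trap type 1 (HPMC)"*)
                verdict="likely HPMC: open a hardware service call"
                ;;
            *panic*"isr.ior"*)
                # NOTE 1: MC Serviceguard or an operator-induced TOC
                # produces this message too, so it is not conclusive.
                verdict="possible HPMC or TOC: see NOTE 1"
                ;;
            *panic*)
                # Assumption: anything else is treated as a software candidate.
                verdict="other panic: crash dump analysis may be needed"
                ;;
        esac
    done
    printf '%s\n' "$verdict"
}
```

It could be fed from either source, for example `tail /etc/shutdownlog | classify_panic` or, from the dump directory, `grep panic INDEX | classify_panic`. Because each matching line overwrites the previous verdict, the result describes the last panic line in the input.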

Certain S800 servers support online collection of HPMC PIM (Processor Internal
Memory) information. Combined with installed diagnostics and a current
version of a utility called pdcinfo, this writes hardware fault
information to a file called /var/tombstones/ts99. This information
can be analyzed by hardware service. It may be necessary to run online
diagnostics or reboot the system and obtain the hardware fault (chassis) codes
from a boot or service menu. On V-class and N-class servers, HPMC and hardware
fault information is obtained with different utilities. Hardware service can
assist with these operations.
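
Before calling hardware service it can help to know whether a tombstone file was actually written. The function below is a minimal sketch under that assumption: tombstone_status is a hypothetical name, it only tests for the file's presence and size, and it does not run pdcinfo or interpret the contents.

```shell
# tombstone_status (hypothetical helper): report whether a tombstone
# file exists and holds any data. The default path is the one named in
# the article; pass a different path as the first argument to override.
tombstone_status() {
    ts_file="${1:-/var/tombstones/ts99}"
    if [ -s "$ts_file" ]; then
        printf 'present: %s\n' "$ts_file"    # exists and is non-empty
    elif [ -e "$ts_file" ]; then
        printf 'empty: %s\n' "$ts_file"      # exists but has no data
    else
        printf 'missing: %s\n' "$ts_file"    # nothing was written
    fi
}
```

A "missing" or "empty" result does not rule out an HPMC; as noted above, the chassis codes may have to be obtained from a boot or service menu instead.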

(the end)
