
HP-UX 11.00 System Panic (White Paper) from HP

Posted 2004-03-30 08:52
HP-UX 11.0 System Panics White Paper
HP 9000 Series 700/800 Computers
September 1997, Third Edition


LEGAL NOTICES
The information in this document is subject to change without notice.

Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.

Warranty. A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office.

Restricted Rights Legend. Use, duplication, or disclosure by the U.S. Government Department is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR 52.227-19 for other agencies.

HEWLETT-PACKARD COMPANY
3000 Hanover Street
Palo Alto, California 94304 U.S.A.
Use of this manual and flexible disk(s) or tape cartridge(s) supplied for this pack is restricted to this product only. Additional copies of the programs may be made for security and back-up purposes only. Resale of the programs in their present form or with alterations, is expressly prohibited.

Copyright Notices. (C)copyright 1983-95 Hewlett-Packard Company, all rights reserved.

Reproduction, adaptation, or translation of this document without prior written permission is prohibited, except as allowed under the copyright laws.

(C)copyright 1979, 1980, 1983, 1985-93 Regents of the University of California

This software is based in part on the Fourth Berkeley Software Distribution under license from the Regents of the University of California.

(C)copyright 1980, 1984, 1986 Novell, Inc.

(C)copyright 1986-1992 Sun Microsystems, Inc.

(C)copyright 1985-86, 1988 Massachusetts Institute of Technology.

(C)copyright 1989-93 The Open Software Foundation, Inc.

(C)copyright 1986 Digital Equipment Corporation.

(C)copyright 1990 Motorola, Inc.

(C)copyright 1990, 1991, 1992 Cornell University

(C)copyright 1989-1991 The University of Maryland.

(C)copyright 1988 Carnegie Mellon University.

Trademark Notices. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Limited.

X Window System is a trademark of the Massachusetts Institute of Technology.

MS-DOS and Microsoft are U.S. registered trademarks of Microsoft Corporation.

OSF/Motif is a trademark of the Open Software Foundation, Inc. in the U.S. and other countries.

First Edition: April 1995 (HP-UX Release 10.0)
Second Edition: March 1997 (HP-UX Releases 10.10 - 10.30)
Third Edition: September 1997 (HP-UX Release 11.0)


--------------------------------------------------------------------------------


HP-UX 11.0 System Panics


--------------------------------------------------------------------------------


System Panics: What They Are And Why They Happen
The term "panic" is, by definition, frightening! To see a message displayed on your system console that HP-UX has panicked can be alarming. But it is not necessary to panic when your system does. In HP-UX terms, a panic simply means that HP-UX ran into a condition that it did not know how to respond to, so it halted your computer.

System panics are rare and are not always the result of a catastrophe. They sometimes occur at boot time if your system was not shut down properly beforehand. Sometimes they occur as the result of a hardware failure.

Recovering from a system panic can be as simple as rebooting your system; in fact, in many instances the system reboots automatically. If you have an up-to-date set of file system backup tapes, the worst-case scenario involves either reinstalling HP-UX and restoring any files that were lost or corrupted, or recovering the system from the System Recovery tape created with the make_recovery command. If the panic was caused by a hardware failure such as a disk head crash, you must have the hardware fixed before you can perform the reinstallation.

NOTE: It is important to maintain an up-to-date System Recovery tape or backup of the files on your system so that, in the event of a disk head crash or similar situation, you can recover your data. How frequently you update these backups depends on how much data you can afford to lose. For information on how to back up data, refer to Managing Systems and Workgroups, part #B2355-90157.

You may also want to consider purchasing Hewlett-Packard's High Availability products, such as MC/ServiceGuard, MC/LockManager, MirrorDisk/UX, etc. These layered software and hardware products can prevent system panics or down systems in many situations. Contact your Hewlett-Packard Sales Representative for more information.

What to Do When Your System Panics
When HP-UX panics, it will display a "panic message" on the system console. When this happens, take the following steps.

Step 1: Record the panic message displayed on the system console. If convenient, it's a good idea to record all of the messages displayed on the console when the system panics. "The" panic message, as referred to here, is the message on the line that starts with "panic:".
Step 2: Categorize the panic message. The panic message will tell you why HP-UX panicked. Sometimes panic messages refer to internal structures of HP-UX (or its file systems) and the cause might not be obvious. Generally, the problem is in one of the following areas, and the wording of the message should allow you to classify it into one of them:

Category                            Proceed to
----------------------------------  ----------
Hardware Failure                    Step 3a
File system Problem (corrupted?)    Step 3b
LAN communication Problem           Step 3c
LVM-related Problem                 Step 3d
None of the above                   Step 3e
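Step 1 asks you to record "the" panic message, that is, the line that starts with "panic:". If the console messages were captured to a text file, a minimal sketch like the following can pull that line out. The function name show_panic_lines and the file name in the example call are hypothetical; neither is part of HP-UX.

```shell
#!/bin/sh
# Hypothetical helper (not an HP-UX command): print the line(s) that begin
# with "panic:" from a text file holding saved console messages.
# $1 = path to the saved console text
show_panic_lines() {
    grep '^panic:' "$1"
}

# Example call (the file name is an assumption):
#   show_panic_lines /tmp/console.txt
```

The pattern is anchored at the start of the line, so other lines that merely mention the word "panic" are not reported.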



Step 3a: Hardware Failure Recovery
If the panic message indicated a hardware failure, the text or context of the message should indicate what piece of hardware failed.

Record any error messages associated with the failure. If the hardware failure appears to be associated with a peripheral, check to be sure that its cables are tightly connected to their proper locations and that the device is powered on and in an "online" status.

CAUTION: In the case of SCSI devices, you should not connect or disconnect cables, or power off or on devices, while the HP 9000 computer is powered on, since doing so could lead to corruption of disk data. Tightening a loose cable could have the same effect as powering on a peripheral. In this situation, first turn off the computer and then check the cables.

If there is an error indicated on the device's display:

1. Record the error message or display in your log book.
2. Turn the device off.
3. If the device is a disk drive, wait for it to stop spinning.
4. Turn the device back on.

If the problem reappears on the device or if the hardware failure appears to be associated with an interface card or an internal component of the System Processing Unit, it might be necessary to have the problem fixed by Hewlett-Packard or whoever performs your hardware maintenance.

Proceed to Step 4 (rebooting your system).

Step 3b: File System Problem Recovery
If the panic message indicates a problem with one of your file systems, you will need to run the file system checker fsck(1m) to check and correct the problem(s). This is normally done automatically at boot time so you should proceed to step 4 (rebooting your system). Follow all directions that fsck gives you. When your root file system (the one with the "/" directory) has problems, fsck will tell you to use the "-n" option to the reboot(1m) command, right after fsck completes; it is especially important to follow this instruction. (See Step 4.)

Step 3c: LAN communication problem
If the panic message indicates a problem with LAN communication, check all LAN cable connections to be sure of the following:

- All connectors are tightly fastened to the LAN cable and the medium attachment units (MAUs). If you are using "thick LAN", make sure all vampire taps are tightly connected to their respective cables and that AUI cables are connected securely to the LAN interface cards (LANICs) in your computer.
- Your LAN is properly terminated. Each END of the LAN cable MUST have a 50 ohm terminator on it. Do NOT connect a computer directly to the END of a LAN cable.

Proceed to Step 4 (rebooting your system).

Step 3d: LVM-related Problem
If you reduce the size of a logical volume that contains a file system such that the logical volume is smaller than the file system within it, you will corrupt the file system. This will often manifest itself by causing your system to panic when you attempt to access a part of the truncated file system that is beyond the new boundary of the logical volume.

The problem might not show up immediately. It surfaces once the truncated part of the file system is overwritten by something else (such as a new logical volume, or the extension of a logical volume in the same volume group as the truncated file system).

For more information on how to recover from this problem, refer to Managing Systems and Workgroups, part #B2355-90157.
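The rule behind this step is simple arithmetic: a logical volume must never end up smaller than the file system it contains. The following is a minimal sketch of that check, assuming you already know the file system size in kilobytes and the intended new logical volume size in megabytes. The function name lv_reduce_is_safe is hypothetical, and the sketch does not query LVM or the file system itself; you supply both numbers.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if a logical volume of
# new_lv_mb megabytes would still be at least as large as a file system of
# fs_kb kilobytes. Both values are supplied by the administrator.
# $1 = file system size in KB, $2 = intended new logical volume size in MB
lv_reduce_is_safe() {
    fs_kb=$1
    new_lv_mb=$2
    new_lv_kb=$((new_lv_mb * 1024))   # convert MB to KB before comparing
    [ "$new_lv_kb" -ge "$fs_kb" ]
}

# Example call (numbers are illustrative only):
#   lv_reduce_is_safe 512000 400 || echo "do not reduce: file system would be truncated"
```

With the illustrative numbers, 400 MB is 409600 KB, which is smaller than the 512000 KB file system, so the check fails and the warning is printed.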

Step 3e: Recovery from other situations
When you suspect the problem was something other than the above (or when you do not know where to classify it), proceed to step 4 (Rebooting your system). Many times, that's all that's required to recover from a system panic and it's certainly worth a try. In this case, it is especially important that you write down the exact text of the panic message, just in case you need it for future troubleshooting.

Step 4: Rebooting your system
Once you have checked for and corrected any problems from Step 3, you are ready to reboot your system. If your system has a "reset" switch or button, you can reboot your system using that. Otherwise, turn your computer off and then back on to initiate the boot up sequence.

You will probably notice a few differences in the boot up displays/activities as compared with your normal boot up sequence. Your computer might save a "crash dump" to disk. (See the following discussion on "If You Want to Save Crash Dumps After a System Panic".) This crash dump is a "snapshot" of the previously running kernel at the time that it panicked. If it becomes necessary, this crash dump can be analyzed using special tools to determine more about what caused the panic.

If your system panicked because of a corrupted file system, fsck will report the errors and any corrections it makes. If fsck terminates and requests to be run manually, refer to Managing Systems and Workgroups, part #B2355-90157 for further instructions. If the problems were associated with your root file system, fsck will ask you to reboot your system when it's finished.

When you do this, use the command:

    reboot -n

The -n option tells reboot not to sync the file system before rebooting. Since fsck has made all the corrections on disk, you do not want to undo the changes by writing over them with the still-corrupt buffers held in memory.

Step 5: Monitor the system closely for a while
If your system successfully boots, there is a good chance that you can resume normal operations. Many system panics are isolated events, unlikely to reoccur.

Check your applications to be sure that they are running properly and (for a day or so) monitor the system closely. For a short while, you might want to do backups more frequently until you are confident that the system is functioning properly.

If You Want to Save Crash Dumps After a System Panic
System crash dumps produced by system panics can be very large, but can be useful for debugging difficult problems. System crash dumps are saved by default after a system panic. If you want to disable them, or change how and where they are saved, you need to edit the startup configuration script /etc/rc.config.d/savecrash as follows:

- If you do not want system crash dumps to be saved, set the variable SAVECRASH to 0. The value 1 (the default) specifies that crash dumps are saved to disk after a system panic.
- Set the variable SAVECRASH_DIR to the directory where the crash dump will be saved. Because crash dumps can be very large, it is best to specify a directory in a file system with lots of available space. (The default value for SAVECRASH_DIR is /var/adm/crash.)
- Set the variable SAVE_PART to 1 if you have set up dedicated dump devices (physical devices reserved for dumps and not used for file systems, LVM physical disks or swapping) and wish to leave the crash dump on this device. This uses minimal file system space to write an INDEX file (for other commands to find the dump) and copy the kernel module files (usually /stand/vmunix).

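Put together, the settings described above might look like the following fragment of /etc/rc.config.d/savecrash. This is a sketch that uses only the variables and default values named in the list; the file installed on your system may contain other settings, and SAVE_PART=1 is appropriate only if you have set up dedicated dump devices.

```shell
# /etc/rc.config.d/savecrash (fragment, illustrative)
SAVECRASH=1                     # 1 = save crash dumps (default), 0 = do not save
SAVECRASH_DIR=/var/adm/crash    # directory for saved dumps (default shown)
SAVE_PART=1                     # leave the dump on the dedicated dump device
```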
Since system crash dumps can be very large, it is best to save them to tape using the tar command and remove them from your file system in order to free up space. If you know why your system panicked, you can delete the crash dumps; it is unnecessary to keep them. The crash dumps are used in rare circumstances to diagnose hard-to-find causes of system panics.
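As a rough illustration of saving a crash dump directory with tar before freeing the space, consider the sketch below. The function name archive_crash_dir is hypothetical, and the destination is whatever you pass in (for example, a tape device file; the exact name depends on your system). The sketch deliberately does not delete anything: it only writes the archive and reads it back, leaving removal of the original files to you.

```shell
#!/bin/sh
# Hypothetical helper: archive a crash dump directory with tar and confirm
# that the archive can be listed. Nothing is removed from the file system.
# $1 = crash dump directory
# $2 = tar destination, given as an absolute path (tape device file or file)
archive_crash_dir() {
    dump_dir=$1
    dest=$2
    if [ ! -d "$dump_dir" ]; then
        echo "no such directory: $dump_dir" >&2
        return 1
    fi
    # Archive relative to the dump directory so it can be restored anywhere.
    (cd "$dump_dir" && tar cf "$dest" .) || return 1
    # Read the archive back; a failure here means the copy is not usable.
    tar tf "$dest" > /dev/null
}

# Example call (both paths are assumptions):
#   archive_crash_dir /var/adm/crash /path/to/tape/device
```

Only after the function succeeds would you remove the dump files from the directory, and only if you are sure you have a usable copy.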