Help, the kernel crashed. After a reboot, another error of this kind showed up...
A kernel BUG brought the server down. The kernel BUG log follows.
Jul 19 09:59:14 host kernel: Assertion failure in journal_commit_transaction() at commit.c:535: "buffer_jdirty(bh)"
Jul 19 09:59:14 host kernel: ------------[ cut here ]------------
Jul 19 09:59:14 host kernel: kernel BUG at commit.c:535!
Jul 19 09:59:14 host kernel: invalid operand: 0000
Jul 19 09:59:14 host kernel: eepro100 usb-uhci usbcore ext3 jbd sym53c8xx sd_mod scsi_mod
Jul 19 09:59:14 host kernel: CPU: 3
Jul 19 09:59:14 host kernel: EIP: 0010:[<88480e4>] Not tainted
Jul 19 09:59:14 host kernel: EFLAGS: 00010286
Jul 19 09:59:14 host kernel:
Jul 19 09:59:14 host kernel: EIP is at journal_commit_transaction [jbd] 0xb04 (2.4.18-3smp)
Jul 19 09:59:14 host kernel: eax: 0000001c ebx: 0000000a ecx: c02eee60 edx: 00024b3b
Jul 19 09:59:14 host kernel: esi: f6b6ceb0 edi: f7844ba0 ebp: f782e000 esp: f782fe78
Jul 19 09:59:14 host kernel: ds: 0018 es: 0018 ss: 0018
Jul 19 09:59:14 host kernel: Process kjournald (pid: 18, stackpage=f782f000)
Jul 19 09:59:14 host kernel: Stack: f884eeee 00000217 f7341cc0 00000000 00000fdc c370a024 00000000 dda860c0
Jul 19 09:59:14 host kernel: efaff430 00000f6a cd787da0 00000001 00000028 f7a24164 c2e89de0 c2e896c0
Jul 19 09:59:14 host kernel: c4f7c980 da108120 d9505300 ed9d2920 c9afcb00 d37ad5a0 d37ada20 e367ec60
Jul 19 09:59:14 host kernel: Call Trace: [<884eeee>] .rodata.str1.1 [jbd] 0x26e
Jul 19 09:59:14 host kernel: [<c0124eb5>] update_process_times [kernel] 0x25
Jul 19 09:59:14 host kernel: [<c0116049>] smp_apic_timer_interrupt [kernel] 0xa9
Jul 19 09:59:14 host kernel: [<c010a77f>] do_IRQ [kernel] 0xdf
Jul 19 09:59:14 host kernel: [<c0119048>] schedule [kernel] 0x348
Jul 19 09:59:14 host kernel: [<884a7d6>] kjournald [jbd] 0x136
Jul 19 09:59:14 host kernel: [<884a680>] commit_timeout [jbd] 0x0
Jul 19 09:59:14 host kernel: [<c0107286>] kernel_thread [kernel] 0x26
Jul 19 09:59:14 host kernel: [<884a6a0>] kjournald [jbd] 0x0
Jul 19 09:59:14 host kernel:
Jul 19 09:59:14 host kernel:
Jul 19 09:59:15 host kernel: Code: 0f 0b 5a 59 6a 04 8b 44 24 18 50 56 e8 4b f1 ff ff 8d 47 48
The data center staff rebooted the server.
The following error appeared in the boot log:
Jul 19 20:24:38 host kernel: kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
Here is an analysis I found of the message
kernel: kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
:
Whenever the Linux kernel detects that an operation it attempts requires
a driver that is neither compiled in nor loaded yet, it calls out to a
userspace program to load the driver. The program used for this purpose is
configurable on the fly from userspace, and defaults to /sbin/modprobe.
The program currently in use can be read with:
$ cat /proc/sys/kernel/modprobe
(Note: all this requires a kernel configured to use modules, and the
kernel module loader, of course. All distributions known to me ship
kernels configured this way)
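
As a rough illustration of the lookup above, here is a minimal Python sketch that reads the same /proc entry. The function name current_module_loader is made up for this example. It assumes that a missing file means the kernel has no module loader support, or that the machine is not running Linux, and returns None in that case.

```python
from pathlib import Path


def current_module_loader(proc_path="/proc/sys/kernel/modprobe"):
    """Return the helper program the kernel runs to load modules.

    Returns None when the /proc entry cannot be read (assumption: no
    module loader support, or not a Linux system).
    """
    try:
        text = Path(proc_path).read_text().strip()
    except OSError:
        return None
    # An empty entry is treated the same as "not available" in this sketch.
    return text or None


print(current_module_loader())
```

On a typical system this prints the path of the configured loader, the same value the cat command shows.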
The reason for using a userspace program is that this program might have
much more information available about the module requested than the
kernel itself. modprobe, for example, uses a dependency database to obtain
information about requirements of the kernel module itself. For example,
if the kernel tries to load the usb driver (usb-uhci.o, or another),
modprobe knows that it will need usbcore.o as well, and loads both.
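
The dependency lookup can be pictured with a small Python sketch. This is not how modprobe is implemented. The DEPS table is a hand-written stand-in for the real dependency database, and load_order is a hypothetical helper that lists a module's dependencies before the module itself.

```python
# Illustrative dependency table (hand-written for this sketch, not read
# from the real dependency database that modprobe consults).
DEPS = {
    "usb-uhci": ["usbcore"],
    "usbcore": [],
    "ext3": ["jbd"],
    "jbd": [],
}


def load_order(module, deps, _seen=None):
    """Return modules in the order they would need to be loaded.

    Dependencies come first, the requested module comes last. The _seen
    set guards against loading a module twice or looping on a cycle.
    """
    seen = set() if _seen is None else _seen
    if module in seen:
        return []
    seen.add(module)
    order = []
    for dep in deps.get(module, []):
        order.extend(load_order(dep, deps, seen))
    order.append(module)
    return order


print(load_order("usb-uhci", DEPS))  # ['usbcore', 'usb-uhci']
```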
In addition, modprobe can honor a configuration file (/etc/modules.conf
by default), in which additional information, such as module parameters,
can be stored. When using multiple network cards inside one machine, this
can be used to map ethernet devices (eth0, eth1..) onto specific network
cards.
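
To make the alias idea concrete, here is a minimal Python sketch that resolves alias names from configuration text. The SAMPLE_CONF contents are hypothetical: the module names are borrowed from the loaded-module list in the log above, and nothing here is the poster's actual /etc/modules.conf. The sketch only understands lines of the form "alias NAME TARGET" and ignores everything else.

```python
# Hypothetical configuration text for illustration only.
SAMPLE_CONF = """\
# map devices and subsystem aliases to real modules
alias eth0 eepro100
alias scsi_hostadapter sym53c8xx
"""


def parse_aliases(text):
    """Collect 'alias NAME TARGET' lines into a dict, skipping the rest."""
    aliases = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


def resolve(name, aliases):
    """Follow aliases until a name with no alias entry is reached."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # stop if the aliases form a loop
        name = aliases[name]
    return name


ALIASES = parse_aliases(SAMPLE_CONF)
print(resolve("scsi_hostadapter", ALIASES))  # sym53c8xx
```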
When the kernel boots, it tries to get its hardware subsystems up and
running. One of these subsystems is the SCSI system. So the kernel tries
to call modprobe to load "scsi_hostadapter". This is not a real module
name, it's an alias, which has to be mapped by modprobe to a real kernel
module name. The problem is, at this early stage, the kernel is very much
on its own. There is no root file system, because the hard disks can not
be accessed yet (there is no SCSI driver), and the initial ramdisk (a tiny,
compressed filesystem containing some small programs and drivers to make
the kernel mount its real root filesystem) is not mounted yet, either.
So, there simply is no /sbin/modprobe for the kernel to execute, and the
attempt fails with error number 2, which means: ENOENT (No such file or
directory).
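
The errno value can be checked from userspace. The Python sketch below looks up error number 2 and then reproduces the same kind of failure by trying to execute a program at a path that does not exist. The helper name try_exec and the /nonexistent path are made up for this example. It assumes a Linux (or other POSIX) system where ENOENT is 2.

```python
import errno
import os
import subprocess

# Symbolic name and message for error number 2.
print(errno.errorcode[2])
print(os.strerror(2))


def try_exec(path, *args):
    """Run a program and return the errno if it cannot be executed.

    Returns 0 when the exec itself succeeds, whatever the program's
    own exit status turns out to be.
    """
    try:
        subprocess.run([path, *args], check=False)
    except OSError as exc:
        return exc.errno
    return 0


# Executing a missing file fails the same way the kernel's attempt did.
print(try_exec("/nonexistent/sbin/modprobe", "-s", "-k", "scsi_hostadapter"))
```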
You may ask: if this call fails, then how does the kernel detect its
SCSI disks anyway (it has to, since it mounts its real root file system
off them a little while later)? The initial ramdisk contains this driver,
and explicitly loads it using /sbin/insmod (which is contained in the
ramdisk as well).
There is not much magic in the Linux boot process, really, but it is still
fascinating to see how it all comes together in the end.
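1. Which settings is this kernel BUG mainly related to? What might the cause be, and what impact could it have?
2. Will "kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2" have any further effect on the server system and the kernel? (See the English reference material above.) On the surface the server is currently running normally, and the SCSI setup looks fine too. I don't know whether this failure could have fatal consequences for the server later on.
I'm watching this thread. Please take the time to type out a few extra words, guys. Thanks a lot...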