Inspur 300N server crashes frequently
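Our Inspur AS300N server keeps crashing, and the system logs don't show what is causing it. tsar shows that both mem and load were high around the crash, with almost all of the memory used up. Any ideas on how to track this down would be much appreciated. Excerpt from /var/log/messages: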
May 22 15:45:07 snmpd: Connection from UDP: :12314
May 22 15:45:16 snmpd: Connection from UDP: :53202
May 22 15:45:16 snmpd: Received SNMP packet(s) from UDP: :53202
May 22 15:45:53 snmpd: Connection from UDP: :35468
May 22 15:45:53 snmpd: Received SNMP packet(s) from UDP: :35468
May 22 16:24:50 syslogd 1.4.1: restart.
May 22 16:24:50 kernel: klogd 1.4.1, log source = /proc/kmsg started.
May 22 16:24:50 kernel: Linux version 2.6.18-348.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)) #1 SMP Tue Jan 8 17:53:53 EST 2013
May 22 16:24:50 kernel: Command line: ro root=LABEL=/
May 22 16:24:50 kernel: BIOS-provided physical RAM map:
May 22 16:24:50 kernel: BIOS-e820: 0000000000010000 - 000000000009d000 (usable)
May 22 16:24:50 kernel: BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 0000000000100000 - 00000000bf790000 (usable)
May 22 16:24:50 kernel: BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
May 22 16:24:50 kernel: BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
May 22 16:24:50 kernel: BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
May 22 16:24:50 kernel: BIOS-e820: 0000000100000000 - 0000000240000000 (usable)
May 22 16:24:50 kernel: DMI present.
May 22 16:24:50 kernel: SRAT: PXM 0 -> APIC 0 -> Node 0
May 22 16:24:50 kernel: SRAT: PXM 0 -> APIC 2 -> Node 0
May 22 16:24:50 kernel: SRAT: PXM 0 -> APIC 4 -> Node 0
May 22 16:24:50 kernel: SRAT: PXM 0 -> APIC 16 -> Node 0
May 22 16:24:50 kernel: SRAT: PXM 0 -> APIC 18 -> Node 0
Time            -----------------------mem----------------------
Time            free     used     buff    cach   total   util
22/05/14-15:15  985.7M   814.4M   17.2M   6.0G   7.8G    10.21
22/05/14-15:20  1.2G     667.0M   18.2M   5.9G   7.8G    8.37
22/05/14-15:25  1.0G     503.7M   18.9M   6.3G   7.8G    6.32
22/05/14-15:30  1.1G     643.0M   19.5M   6.1G   7.8G    8.06
22/05/14-15:35  1.0G     555.2M   20.8M   6.2G   7.8G    6.96
22/05/14-15:40  518.7M   658.8M   21.2M   6.6G   7.8G    8.26
22/05/14-15:45  382.9M   746.2M   20.5M   6.7G   7.8G    9.36
22/05/14-16:30  5.9G     239.2M   8.8M    1.6G   7.8G    3.00
22/05/14-16:35  5.9G     236.7M   11.1M   1.6G   7.8G    2.97
22/05/14-16:40  5.9G     245.2M   11.7M   1.6G   7.8G    3.08
22/05/14-16:45  5.9G     239.1M   13.4M   1.6G   7.8G    3.00
22/05/14-16:50  5.9G     239.0M   14.0M   1.6G   7.8G    3.00
22/05/14-16:55  5.9G     238.1M   14.7M   1.6G   7.8G    2.99
22/05/14-17:00  5.9G     240.5M   15.3M   1.6G   7.8G    3.02
Time            -------------------load-----------------
Time            load1   load5   load15   runq    plit
22/05/14-15:10  20.99   10.84   8.40     12.00   535.00
22/05/14-15:15  3.41    10.70   9.57     6.00    474.00
22/05/14-15:20  7.81    8.49    8.76     3.00    485.00
22/05/14-15:25  0.83    3.91    6.73     1.00    460.00
22/05/14-15:30  4.14    4.69    6.25     4.00    476.00
22/05/14-15:35  2.20    3.70    5.47     3.00    491.00
22/05/14-15:40  2.50    2.97    4.69     5.00    482.00
22/05/14-15:45  4.33    4.27    4.78     6.00    538.00
22/05/14-16:30  0.04    0.05    0.01     1.00    461.00
22/05/14-16:35  0.00    0.01    0.00     0.00    456.00
22/05/14-16:40  0.00    0.00    0.00     2.00    469.00
22/05/14-16:45  0.06    0.05    0.01     0.00    455.00
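These logs are barely better than nothing.
If it really happens that often, watch the processes yourself in real time.
You could start by checking for a bug such as a memory leak.
Have a scheduled script write its output to a log server.
Reply to 2# q1208c: Exactly, and syslogd probably isn't working either during the time the machine is down. Watch the processes in real time? There is one main process that uses a lot of memory, but messages contains no out-of-memory errors either.
Reply to 3# lbseraph: Do you mean the code? The main program is written in C, and that shouldn't be it. If the program were at fault, it wouldn't be only this one machine that keeps crashing; the many other machines running the same service should have the problem too. There are no out-of-memory entries in messages either.
Reply to 4# tangye: What do you mean by having a scheduled script write to a log server? Could you go into more detail? Thanks a lot!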
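Reply to 7# hepeace: Run a script on a schedule, for example one that checks memory and processes, and have it write to an NFS volume so you can see whether memory is being exhausted. If the hardware is at fault, you will have to look at the logs in the server's monitoring module.
Reply to 8# tangye: OK, I'll give it a try. Thanks a lot!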
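For anyone finding this thread later, the scheduled-script idea above could look roughly like the sketch below. It is only an illustration, not something posted by the participants: it assumes Python 3 is available on the box, the function names are made up for the example, and the log path (meant to sit on an NFS mount) is passed as a command-line argument. Each run appends one snapshot of /proc/meminfo plus the processes with the largest resident memory, so the last record written before a crash survives on the remote volume.

```python
#!/usr/bin/env python3
"""Append one memory/process snapshot to a log file (run it from cron).

Assumptions, not from the thread: Python 3 is installed, and the log path
given on the command line lives on an NFS volume.
"""
import os
import sys
import time

TOP_N = 5  # hypothetical: how many of the largest processes to record


def parse_kb_fields(text, wanted):
    """Return {field: kB} for the wanted 'Name:   123 kB' lines in text."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            parts = rest.split()
            if parts and parts[0].isdigit():
                result[name] = int(parts[0])
    return result


def read_file(path):
    with open(path) as handle:
        return handle.read()


def top_processes(limit):
    """Return [(rss_kb, pid, name)] for the largest resident processes."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            status = read_file("/proc/%s/status" % entry)
        except OSError:
            continue  # the process exited while we were scanning
        rss = parse_kb_fields(status, {"VmRSS"}).get("VmRSS")
        if rss is None:
            continue  # kernel threads have no VmRSS line
        name = "?"
        for line in status.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        rows.append((rss, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


def format_snapshot(timestamp, mem, procs):
    """Render one snapshot as a header line plus one line per process."""
    fields = " ".join("%s=%dkB" % (key, mem[key]) for key in sorted(mem))
    lines = ["%s %s" % (timestamp, fields)]
    for rss, pid, name in procs:
        lines.append("    pid=%d rss=%dkB name=%s" % (pid, rss, name))
    return "\n".join(lines) + "\n"


def write_snapshot(log_path):
    """Collect the current state and append it to log_path."""
    wanted = {"MemTotal", "MemFree", "Buffers", "Cached", "SwapFree"}
    mem = parse_kb_fields(read_file("/proc/meminfo"), wanted)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(format_snapshot(stamp, mem, top_processes(TOP_N)))
        log.flush()
        os.fsync(log.fileno())  # ask the OS to push the record out now


if __name__ == "__main__" and len(sys.argv) == 2:
    write_snapshot(sys.argv[1])
```

Scheduled once a minute from cron with the NFS-backed log path as its only argument, this would leave a trail showing whether free memory really drops to nothing before the machine goes down, and which process is holding it. The one-minute interval is just a suggestion; tsar's five-minute samples above leave a large gap around the crash.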