Chinaunix

Title: HELP! Lots of stack traces in the log

Author: njzh24 Time: 2013-07-19 08:51
The log contains a lot of stack traces. Could someone help me figure out what is going on? Here it is:
04:00000:00585:2013/07/19 02:36:57.22 kernel  Current process (0x58cd0083) infected with signal 11 (SIGSEGV)
04:00000:00585:2013/07/19 02:36:57.22 kernel  Address 0x00000001000105bc (mem_pageallocate+0x7c), siginfo (code, address) = (50, 0x00cd593800cd5939)
04:00000:00585:2013/07/19 02:36:57.22 kernel  ************************************
04:00000:00585:2013/07/19 02:36:57.22 kernel  curdb = 0 tempdb = 0 pstat = 0x10000
04:00000:00585:2013/07/19 02:36:57.22 kernel  lasterror = 0 preverror = 0 transtate = 1
04:00000:00585:2013/07/19 02:36:57.22 kernel  curcmd = 0 program =
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100281668 pcstkwalk+0x84()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100281ee4 ucstkgentrace+0x238()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100280680 ucbacktrace+0xe4()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x00000001003bd4c4 terminate_process__fdpr_3+0x938()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x00000001000105bc mem_pageallocate+0x7c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100020270 memcreate+0x6c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x0000000100270450 hdl_backout installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x000000010015ea90 conn_hdlr__fdpr_2+0x21c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  end of stack trace, spid 585, kpid 1489830019, suid 0
04:00000:00585:2013/07/19 02:36:57.22 server  The SQL Server is terminating this process.
04:00000:00585:2013/07/19 02:36:57.22 kernel  Current process (0x58cd0083) infected with signal 4 (SIGILL)
04:00000:00585:2013/07/19 02:36:57.22 kernel  Address 0x0000000000000000 (), siginfo (code, address) = (30, 0x0000000000000000)
04:00000:00585:2013/07/19 02:36:57.22 kernel  ************************************
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100281668 pcstkwalk+0x84()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100281ee4 ucstkgentrace+0x238()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100280680 ucbacktrace+0xe4()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x00000001003bcca4 terminate_process__fdpr_3+0x118()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000000000000 ()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x00000001003bd54c terminate_process__fdpr_3+0x9c0()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x00000001000105bc mem_pageallocate+0x7c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x0000000100020270 memcreate+0x6c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x0000000100270450 hdl_backout installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel  pc: 0x000000010015ea90 conn_hdlr__fdpr_2+0x21c()
04:00000:00585:2013/07/19 02:36:57.22 kernel  end of stack trace, spid 585, kpid 1489830019, suid 0
04:00000:01153:2013/07/19 02:36:59.99 kernel  Current process (0x58ce009a) infected with signal 11 (SIGSEGV)
04:00000:01153:2013/07/19 02:36:59.99 kernel  Address 0x00000001000105bc (mem_pageallocate+0x7c), siginfo (code, address) = (50, 0x00cd593800cd5939)
04:00000:01153:2013/07/19 02:36:59.99 kernel  ************************************
04:00000:01153:2013/07/19 02:36:59.99 kernel  curdb = 0 tempdb = 0 pstat = 0x10000
04:00000:01153:2013/07/19 02:36:59.99 kernel  lasterror = 0 preverror = 0 transtate = 1
04:00000:01153:2013/07/19 02:36:59.99 kernel  curcmd = 0 program =
04:00000:01153:2013/07/19 02:36:59.99 kernel  pc: 0x0000000100281668 pcstkwalk+0x84()
04:00000:01153:2013/07/19 02:36:59.99 kernel  pc: 0x0000000100281ee4 ucstkgentrace+0x238()
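A quick way to see how often this happens, and where, is to pull the "Current process ... infected" events out of the errorlog. The sketch below assumes the line prefix is engine:family:spid:date time (so 00585 is the spid, matching "spid 585" at the end of the trace) and that each event is followed by an "Address ... (function+offset)" line, as in the excerpt above. parse_signal_events and summarize are hypothetical helpers, not Sybase tools.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed prefix layout of each errorlog line: engine:family:spid:date time
PREFIX = r"^(?P<engine>\d+):(?P<family>\d+):(?P<spid>\d+):(?P<ts>\S+ \S+)\s+kernel\s+"
INFECTED = re.compile(
    PREFIX + r"Current process \((?P<kpid>0x[0-9a-fA-F]+)\) "
    r"infected with signal (?P<signo>\d+) \((?P<signame>\w+)\)"
)
# "Address 0x... (function+offset), siginfo ..." follows each "infected" line.
ADDRESS = re.compile(PREFIX + r"Address 0x[0-9a-fA-F]+ \((?P<func>[^)]*)\)")


@dataclass
class SignalEvent:
    spid: int
    timestamp: str
    kpid: str
    signo: int
    signame: str
    fault_at: str = ""  # e.g. "mem_pageallocate+0x7c"; empty when the log shows "()"


def parse_signal_events(lines):
    """Collect one SignalEvent per 'Current process ... infected' line."""
    events = []
    for line in lines:
        m = INFECTED.match(line)
        if m:
            events.append(SignalEvent(int(m["spid"]), m["ts"], m["kpid"],
                                      int(m["signo"]), m["signame"]))
            continue
        a = ADDRESS.match(line)
        # Attach the faulting function to the most recent event of the same spid.
        if a and events and events[-1].spid == int(a["spid"]):
            events[-1].fault_at = a["func"]
    return events


def summarize(events):
    """Count events by (signal name, faulting function)."""
    return Counter((e.signame, e.fault_at) for e in events)
```

Run over the full errorlog, summarize shows whether every SIGSEGV lands in the same function (here mem_pageallocate+0x7c), which is more useful to report than individual traces.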

Author: andkylee Time: 2013-07-19 12:39
What platform and version?
How much memory is configured?
If you bought vendor support, have their engineers solve it.

Author: njzh24 Time: 2013-07-22 09:13
AIX 5.3, ASE 12.5, max memory 28G, data cache 15G, procedure cache 3.5G, stack size 251904 bytes.
Is this a memory configuration problem?
Our support contract has expired...

Author: andkylee Time: 2013-07-22 12:51
ASE 12.5 is no longer supported by the vendor anyway.
Post the output of ipcs -a so we can take a look.

Author: njzh24 Time: 2013-07-23 09:12
$ ipcs -a
IPC status from /dev/mem as of Tue Jul 23 09:11:44 BEIST 2013
T       ID    KEY       MODE    OWNER GROUP  CREATOR CGROUP CBYTES  QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
T       ID    KEY       MODE    OWNER GROUP  CREATOR CGROUP NATTCH    SEGSZ  CPID  LPID ATIME DTIME CTIME
Shared Memory:
m  10485760 0x690022b9 --rw------- sybase sybase sybase sybase    8 35701915648 213982 237802  1:42:53  1:42:53  1:42:19
m 1048577 0x78000011 --rw-rw-rw-    root system    root system    1 268435456 230050 295072  9:57:15  9:01:59  9:57:15
m 1048578 0x78000010 --rw-rw-rw-    root system    root system    1  16777216 230050 295072  9:57:15  9:01:59  9:57:15
m 5242885 0x690022a1 --rw------- sybase sybase sybase sybase    6 20387991552 258366 217548  3:42:15  3:42:15  3:41:38
T       ID    KEY       MODE    OWNER GROUP  CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s       1 0x62031656 --ra-r--r--    root system    root system    1  9:56:36  9:56:36
s       6 0x0103149c --ra-------    root system    root system    1  9:57:32  9:57:32
s  20971531 0x0101c6e8 --ra-ra-ra-    root system    root system    1 15:55:37 15:55:26
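To compare the segments above with the posted configuration (max memory 28G), it helps to convert SEGSZ from bytes to GiB. A minimal sketch, assuming the AIX column order printed in this output; shm_segments and report are hypothetical helpers. It only reads sizes and makes no judgement about which server owns which segment, which the thread does not say.

```python
GIB = 1024 ** 3


def shm_segments(ipcs_text, owner=None):
    """Parse the 'Shared Memory:' rows of AIX `ipcs -a` output.

    Column positions follow the header printed in this thread:
    T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ ...
    Returns (id, key, owner, nattch, segsz_bytes) tuples.
    """
    segs = []
    for line in ipcs_text.splitlines():
        f = line.split()
        # Shared memory rows start with "m"; headers and semaphores are skipped.
        if len(f) >= 10 and f[0] == "m":
            if owner is None or f[4] == owner:
                segs.append((f[1], f[2], f[4], int(f[8]), int(f[9])))
    return segs


def report(segs):
    for seg_id, key, own, nattch, size in segs:
        print(f"id={seg_id} key={key} owner={own} nattch={nattch} "
              f"size={size / GIB:.2f} GiB")
```

For the two sybase-owned rows this gives 33.25 GiB and about 18.99 GiB.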

Author: njzh24 Time: 2013-07-23 09:13
Here is the explanation I found on the Sybase website:

Current process infected
Message text
current process (0x%x) infected with %d
This error may be caused by a hardware problem.

Explanation
Adaptive Server reports this error when it detects a UNIX signal that specifies an error. The values (“%d”) that display in this error message vary by platform and Adaptive Server Enterprise versions; the most common values are 10 and 11.

Current process infected with 10
A value of 10 [SIGBUS] means that the operating system detected an address alignment error or a miscellaneous hardware error (for example, bus timeout).

A timeout can occur when the CPU issues a request across the bus for the contents of a memory location, and that request is not answered within that CPU’s timeout period (usually a few nanoseconds).

Current process infected with 11
A value of 11 [SIGSEGV] means that the operating system detected a segment violation error.

Sometimes this error occurs in conjunction with a stack overflow or data corruption. For more information on stack overflow, refer to the error write-up “Stack guardword corrupted”.
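Older messages such as "infected with 11" print only the number, so the name has to be looked up. A small sketch of that lookup, assuming the AIX numbering seen in this thread (4, 10, 11); signal numbers are platform-specific (Linux, for example, uses 7 for SIGBUS), so the table is an assumption, and classify is a hypothetical helper.

```python
import re

# Signal numbers as printed in this thread on AIX. Numbering is platform-
# specific (for example, Linux uses 7 for SIGBUS), so this table is an assumption.
AIX_SIGNALS = {4: "SIGILL", 10: "SIGBUS", 11: "SIGSEGV"}

MEANING = {
    # Paraphrased from the Sybase text above.
    "SIGBUS": "address alignment error or other hardware error (e.g. bus timeout)",
    "SIGSEGV": "segment violation; sometimes with stack overflow or data corruption",
    # General POSIX meaning, not from the Sybase text.
    "SIGILL": "illegal instruction",
}

# Matches both "infected with 11" and "infected with signal 11 (SIGSEGV)".
INFECTED_WITH = re.compile(r"infected with (?:signal )?(\d+)")


def classify(line):
    """Return (number, name, meaning) for an 'infected with' line, else None."""
    m = INFECTED_WITH.search(line)
    if m is None:
        return None
    signo = int(m.group(1))
    name = AIX_SIGNALS.get(signo, f"signal {signo}")
    return signo, name, MEANING.get(name, "unknown; check the platform's signal.h")
```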

The error message appears in the Adaptive Server error log followed by a stack trace. The “SQL causing error” or the <lasterror> that displays in the Adaptive Server error log may be the underlying cause for this error. But the message can also be just the last data Adaptive Server had in its cache space.

To identify the <lasterror> (except in the cases where the <lasterror> is 0), take the number that Adaptive Server displays in the <lasterror> field of the Adaptive Server error log and consult this manual for more information on that error number.

In the following example, the value for <lasterror> is 614.

00: 94/02/14 11:32:26.02 kernel: current process (0x1fb001d) infected with 11
00: 94/02/14 11:32:26.07 kernel: Address 0x808a6ef (closetable+0x2f7), siginfo (code, address) = (2, 0x30)
00: 94/02/14 11:32:26.07 kernel: ************************************
00: 94/02/14 11:32:26.07 kernel: "SQL causing error" : CREATE TRIGGER
00: 94/02/14 11:32:26.07 kernel: curdb = 22 pstat = 0x10018 lasterror = 614

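The lookup step described above (take the <lasterror> value, skip it when it is 0) is easy to automate over a whole errorlog. A sketch; lasterrors_to_check is a hypothetical name, and the regex assumes the "lasterror = N" spelling used both in the Sybase example and in the log posted in this thread. Note that the traces posted here show lasterror = 0, so by the Sybase text's own rule there is nothing to look up in them.

```python
import re

LASTERROR = re.compile(r"lasterror\s*=\s*(\d+)")


def lasterrors_to_check(lines):
    """Return the distinct non-zero <lasterror> values, in first-seen order.

    A value of 0 is skipped because the Sybase text excludes that case.
    """
    found = []
    for line in lines:
        for m in LASTERROR.finditer(line):
            n = int(m.group(1))
            if n != 0 and n not in found:
                found.append(n)
    return found
```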
Action
1. Try to eliminate the <lasterror>, which may be one of the causes of this error (except when <lasterror> is 0).

2. Rerun the command referenced by the SQL causing error to see whether the problem recurs.

If the process is infected with 11 and you can reproduce the problem, correct it as follows:

• If the SQL causing error is a compiled object such as a stored procedure, trigger, or view, drop and re-create the object.

• If the SQL causing error is ad hoc rather than a compiled object, moving the data may fix the problem. Use one of these options:

◦ Select the table data into a new table, drop the old table, and rename the new table to the old table name;

or

◦ Bulk copy the affected table out, drop and re-create the table, and bulk copy the data back in. This is the most efficient solution for a large table.

If moving the data corrects the problem, the data may have been corrupt. Be aware that moving corrupted data can lead to data loss.
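The "select into a new table, drop, rename" option can be written out as an isql script. A minimal sketch, assuming the database has the select into/bulkcopy/pllsort option enabled (select into needs it) and that indexes, triggers, constraints and permissions are re-created afterwards, because select into does not copy them. rebuild_table_script and the _rebuild suffix are hypothetical; given the data-loss warning above, review the generated script before running it.

```python
import re

# ASE 12.5 identifiers are limited to 30 characters (ASE 15.0 raised the limit).
MAX_IDENT = 30
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rebuild_table_script(table, suffix="_rebuild"):
    """Return an isql script for the 'select into, drop, rename' option.

    Only plain (unqualified) table names are accepted. Indexes, triggers,
    constraints and permissions are NOT carried over by select into and
    must be re-created separately.
    """
    new = table + suffix
    for name in (table, new):
        if not IDENT.fullmatch(name) or len(name) > MAX_IDENT:
            raise ValueError(f"unsupported identifier: {name!r}")
    statements = [
        f"select * into {new} from {table}",
        f"drop table {table}",
        f"exec sp_rename '{new}', '{table}'",
    ]
    # Each statement becomes its own batch, terminated by isql's "go".
    return "".join(f"{s}\ngo\n" for s in statements)
```

Keeping one statement per batch means a failed select into stops before the drop if the script is run step by step.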

Check your hardware error log as this error can be caused by hardware failure as well.

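Author: njzh24 Time: 2013-07-26 10:04
Reply to #2 andkylee
Over the past few days the stack trace has been reported only twice. I ran the failing procedure myself and nothing went wrong.
During that period only statistics updates were run (a scheduled job), and checktable also showed nothing abnormal...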

Author: zhaopingzi Time: 2013-07-26 10:41
Try increasing stack size.

Author: njzh24 Time: 2013-07-27 17:50
Reply to #8 zhaopingzi

Configuration option is not unique.

Parameter Name                 Default     Memory Used Config Value Run Value   Unit                 Type
------------------------------ ----------- ----------- ------------ ----------- -------------------- ----------
esp execution stacksize              65536           0        65536       65536 bytes                static
stack guard size                      4096      #20352        16384       16384 bytes                static
stack size                           88472     #312912       251904      251904 bytes                static
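It's already quite large.
Also, could someone explain how stack size relates to stack guard size? I couldn't follow the English explanation on the official website...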
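The Memory Used column makes the relationship concrete. Assuming Memory Used is reported in KB, both rows divide evenly by their per-stack size and give the same count, 1272, which is consistent with ASE allocating one stack plus one guard area per task (an inference from the numbers; the thread does not give the task count). The helper names below are hypothetical. The second function estimates what "try increasing stack size" would cost out of the 28G max memory.

```python
# Figures copied from the sp_configure output in this thread.
STACK_SIZE = 251904         # bytes, "stack size" run value
GUARD_SIZE = 16384          # bytes, "stack guard size" run value
STACK_MEM_USED_KB = 312912  # "Memory Used" column (shown as #312912)
GUARD_MEM_USED_KB = 20352   # "Memory Used" column (shown as #20352)


def implied_stacks(mem_used_kb, per_stack_bytes):
    """Number of stacks implied if Memory Used (KB) = per-stack size x count."""
    return mem_used_kb * 1024 / per_stack_bytes


def extra_kb_for_new_stack_size(new_stack_bytes, stacks, old_stack_bytes=STACK_SIZE):
    """Additional memory (KB) needed if every stack grows to new_stack_bytes."""
    return (new_stack_bytes - old_stack_bytes) * stacks / 1024


# Both rows imply the same number of stacks: 1272.0 and 1272.0
print(implied_stacks(STACK_MEM_USED_KB, STACK_SIZE),
      implied_stacks(GUARD_MEM_USED_KB, GUARD_SIZE))
# Per task: 246 KB of stack plus a 16 KB guard area
print((STACK_SIZE + GUARD_SIZE) / 1024, "KB per task")
```

For example, raising stack size by a hypothetical 16384 bytes would cost about 20352 KB in total, since every one of the 1272 stacks grows.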

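Author: mission_g Time: 2013-12-12 14:44
The relationship between stack size and stack guard size
-----
You've seen the pressure gauges in movies, right? Put simply, stack guard size is the red section on the dial. Get it?

As for your problem: I'd suggest increasing procedure cache size.
Also, 12.5 has already reached end of life, so consider upgrading to a newer version.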
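The pressure-gauge analogy can be turned into a toy model. It assumes, without confirmation from the thread, that the guard area is an extra region directly after the usable stack (the separate Memory Used figures above fit that), and that running past it is presumably the situation the "Stack guardword corrupted" write-up mentioned earlier refers to. stack_zone and StackZone are hypothetical names; this only illustrates the analogy and does not reproduce how ASE actually checks stacks.

```python
from enum import Enum


class StackZone(Enum):
    NORMAL = "within stack size"
    GUARD = "inside the guard area (the red zone)"
    BEYOND = "past the guard area"


def stack_zone(used_bytes, stack_size=251904, guard_size=16384):
    """Toy gauge: where does a task's stack usage fall?

    Assumes the guard area sits directly after the usable stack; the
    defaults are the run values posted in this thread.
    """
    if used_bytes <= stack_size:
        return StackZone.NORMAL
    if used_bytes <= stack_size + guard_size:
        return StackZone.GUARD
    return StackZone.BEYOND
```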

Welcome to Chinaunix (http://bbs.chinaunix.net/)