Title: HELP! Lots of stack traces in the log
Author: njzh24 Time: 2013-07-19 08:51
The log contains a lot of stack traces. Could someone help me figure out what is going on? They are shown below.
04:00000:00585:2013/07/19 02:36:57.22 kernel Current process (0x58cd0083) infected with signal 11 (SIGSEGV)
04:00000:00585:2013/07/19 02:36:57.22 kernel Address 0x00000001000105bc (mem_pageallocate+0x7c), siginfo (code, address) = (50, 0x00cd593800cd5939)
04:00000:00585:2013/07/19 02:36:57.22 kernel ************************************
04:00000:00585:2013/07/19 02:36:57.22 kernel curdb = 0 tempdb = 0 pstat = 0x10000
04:00000:00585:2013/07/19 02:36:57.22 kernel lasterror = 0 preverror = 0 transtate = 1
04:00000:00585:2013/07/19 02:36:57.22 kernel curcmd = 0 program =
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100281668 pcstkwalk+0x84()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100281ee4 ucstkgentrace+0x238()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100280680 ucbacktrace+0xe4()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x00000001003bd4c4 terminate_process__fdpr_3+0x938()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x00000001000105bc mem_pageallocate+0x7c()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100020270 memcreate+0x6c()
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x0000000100270450 hdl_backout installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x000000010015ea90 conn_hdlr__fdpr_2+0x21c()
04:00000:00585:2013/07/19 02:36:57.22 kernel end of stack trace, spid 585, kpid 1489830019, suid 0
04:00000:00585:2013/07/19 02:36:57.22 server The SQL Server is terminating this process.
04:00000:00585:2013/07/19 02:36:57.22 kernel Current process (0x58cd0083) infected with signal 4 (SIGILL)
04:00000:00585:2013/07/19 02:36:57.22 kernel Address 0x0000000000000000 (), siginfo (code, address) = (30, 0x0000000000000000)
04:00000:00585:2013/07/19 02:36:57.22 kernel ************************************
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100281668 pcstkwalk+0x84()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100281ee4 ucstkgentrace+0x238()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100280680 ucbacktrace+0xe4()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x00000001003bcca4 terminate_process__fdpr_3+0x118()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000000000000 ()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x00000001003bd54c terminate_process__fdpr_3+0x9c0()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100b67bd4 kisignal+0x1bc()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x00000001000105bc mem_pageallocate+0x7c()
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x0000000100020270 memcreate+0x6c()
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x0000000100270450 hdl_backout installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel [Handler pc: 0x00000001004b9150 ut_handle installed by the following function:-]
04:00000:00585:2013/07/19 02:36:57.22 kernel pc: 0x000000010015ea90 conn_hdlr__fdpr_2+0x21c()
04:00000:00585:2013/07/19 02:36:57.22 kernel end of stack trace, spid 585, kpid 1489830019, suid 0
04:00000:01153:2013/07/19 02:36:59.99 kernel Current process (0x58ce009a) infected with signal 11 (SIGSEGV)
04:00000:01153:2013/07/19 02:36:59.99 kernel Address 0x00000001000105bc (mem_pageallocate+0x7c), siginfo (code, address) = (50, 0x00cd593800cd5939)
04:00000:01153:2013/07/19 02:36:59.99 kernel ************************************
04:00000:01153:2013/07/19 02:36:59.99 kernel curdb = 0 tempdb = 0 pstat = 0x10000
04:00000:01153:2013/07/19 02:36:59.99 kernel lasterror = 0 preverror = 0 transtate = 1
04:00000:01153:2013/07/19 02:36:59.99 kernel curcmd = 0 program =
04:00000:01153:2013/07/19 02:36:59.99 kernel pc: 0x0000000100281668 pcstkwalk+0x84()
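04:00000:01153:2013/07/19 02:36:59.99 kernel pc: 0x0000000100281ee4 ucstkgentrace+0x238()
Author: andkylee Time: 2013-07-19 12:39
What platform and version is this?
How much memory is configured?
If you have a vendor support contract, ask their engineers to resolve it.
Author: njzh24 Time: 2013-07-22 09:13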
AIX 5.3, ASE 12.5, max memory 28G, data cache 15G, procedure cache 3.5G, stack size 251904 bytes.
Is this a memory configuration problem?
Our support contract has expired...
Author: andkylee Time: 2013-07-22 12:51
ASE 12.5 is no longer supported by the vendor either.
Please post the output of ipcs -a so we can take a look.
Author: njzh24 Time: 2013-07-23 09:12
$ ipcs -a
IPC status from /dev/mem as of Tue Jul 23 09:11:44 BEIST 2013
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 10485760 0x690022b9 --rw------- sybase sybase sybase sybase 8 35701915648 213982 237802 1:42:53 1:42:53 1:42:19
m 1048577 0x78000011 --rw-rw-rw- root system root system 1 268435456 230050 295072 9:57:15 9:01:59 9:57:15
m 1048578 0x78000010 --rw-rw-rw- root system root system 1 16777216 230050 295072 9:57:15 9:01:59 9:57:15
m 5242885 0x690022a1 --rw------- sybase sybase sybase sybase 6 20387991552 258366 217548 3:42:15 3:42:15 3:41:38
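The SEGSZ column is in bytes, which is hard to compare by eye with the "max memory 28G" figure given earlier in the thread. The following is a minimal sketch that converts the shared memory rows above to GiB. The function name `shm_sizes` and the constant `IPCS_ROWS` are made up for illustration, and the column positions are assumed from the header line shown in this output.

```python
GIB = 1024 ** 3  # bytes per GiB

# Shared Memory rows copied from the ipcs -a output above.
IPCS_ROWS = [
    "m 10485760 0x690022b9 --rw------- sybase sybase sybase sybase 8 35701915648 213982 237802 1:42:53 1:42:53 1:42:19",
    "m 1048577 0x78000011 --rw-rw-rw- root system root system 1 268435456 230050 295072 9:57:15 9:01:59 9:57:15",
    "m 1048578 0x78000010 --rw-rw-rw- root system root system 1 16777216 230050 295072 9:57:15 9:01:59 9:57:15",
    "m 5242885 0x690022a1 --rw------- sybase sybase sybase sybase 6 20387991552 258366 217548 3:42:15 3:42:15 3:41:38",
]


def shm_sizes(lines, owner="sybase"):
    """Return (segment id, size in bytes) for shared memory rows of one owner.

    Assumed column order, taken from the header in the output above:
    T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ ...
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "m" and fields[4] == owner:
            rows.append((int(fields[1]), int(fields[9])))
    return rows


for seg_id, size in shm_sizes(IPCS_ROWS):
    print(f"segment {seg_id}: {size} bytes = {size / GIB:.2f} GiB")
```

For the two sybase-owned segments this works out to about 33.25 GiB and 18.99 GiB. Whether they belong to one server or to two separate ASE instances cannot be told from this output alone.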
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s 1 0x62031656 --ra-r--r-- root system root system 1 9:56:36 9:56:36
s 6 0x0103149c --ra------- root system root system 1 9:57:32 9:57:32
s 20971531 0x0101c6e8 --ra-ra-ra- root system root system 1 15:55:37 15:55:26
Author: njzh24 Time: 2013-07-23 09:13
Here is the explanation I found on the official Sybase website:
Current process infected
Message text
current process (0x%x) infected with %d
This error may be caused by a hardware problem.
Explanation
Adaptive Server reports this error when it detects a UNIX signal that specifies an error. The values (“%d”) that display in this error message vary by platform and Adaptive Server Enterprise versions; the most common values are 10 and 11.
Current process infected with 10
A value of 10 [SIGBUS] means that the operating system detected an address alignment error or a miscellaneous hardware error (for example, bus timeout).
A timeout can occur when the CPU issues a request across the bus for the contents of a memory location, and that request is not answered within that CPU’s timeout period (usually a few nanoseconds).
Current process infected with 11
A value of 11 [SIGSEGV] means that the operating system detected a segment violation error.
Sometimes this error occurs in conjunction with a stack overflow or data corruption. For more information on stack overflow, refer to the error write-up “Stack guardword corrupted”.
The error message appears in the Adaptive Server error log followed by a stack trace. The “SQL causing error” or the <lasterror> that displays in the Adaptive Server error log may be the underlying cause for this error. But the message can also be just the last data Adaptive Server had in its cache space.
To identify the <lasterror> (except in the cases where the <lasterror> is 0), get the number that Adaptive Server displays in the <lasterror> field of the Adaptive Server error log and consult this manual for more information on the error number.
In the following example, the value for <lasterror> is 614.
00: 94/02/14 11:32:26.02 kernel: current process (0x1fb001d)
infected with 11
00: 94/02/14 11:32:26.07 kernel: Address 0x808a6ef
(closetable+0x2f7), siginfo (code, address) = (2, 0x30)
00: 94/02/14 11:32:26.07 kernel: ************************************
00: 94/02/14 11:32:26.07 kernel: “SQL causing error” : CREATE TRIGGER
00: 94/02/14 11:32:26.07 kernel: curdb = 22 pstat = 0x10018
“lasterror = 614”
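The fields the write-up points to (signal number, faulting address and symbol, <lasterror>) can be pulled out of an error log mechanically. Below is a rough sketch, not a Sybase tool: the function name `summarize` and the regular expressions are assumptions based only on the log lines quoted in this thread, and it assumes the lines belonging to one event directly follow its "infected with" line without lines from other spids mixed in.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumption:
# other ASE versions or platforms may format these lines differently).
INFECTED = re.compile(r"infected with (?:signal )?(\d+)(?: \((\w+)\))?")
ADDRESS = re.compile(r"Address (0x[0-9a-fA-F]+) \(([^)]*)\)")
LASTERROR = re.compile(r"lasterror = (\d+)")


def summarize(lines):
    """Return one dict per 'infected with' event, in log order."""
    events = []
    for line in lines:
        m = INFECTED.search(line)
        if m:
            events.append({"signal": int(m.group(1)), "name": m.group(2),
                           "address": None, "symbol": None, "lasterror": None})
            continue
        if not events:
            continue  # nothing to attach this line to yet
        current = events[-1]
        m = ADDRESS.search(line)
        if m and current["address"] is None:
            current["address"] = m.group(1)
            current["symbol"] = m.group(2) or None  # "()" means no symbol
        m = LASTERROR.search(line)
        if m and current["lasterror"] is None:
            current["lasterror"] = int(m.group(1))
    return events


# A few lines taken from the log posted at the top of this thread.
sample = [
    "04:00000:00585:2013/07/19 02:36:57.22 kernel Current process (0x58cd0083) infected with signal 11 (SIGSEGV)",
    "04:00000:00585:2013/07/19 02:36:57.22 kernel Address 0x00000001000105bc (mem_pageallocate+0x7c), siginfo (code, address) = (50, 0x00cd593800cd5939)",
    "04:00000:00585:2013/07/19 02:36:57.22 kernel lasterror = 0 preverror = 0 transtate = 1",
    "04:00000:00585:2013/07/19 02:36:57.22 kernel Current process (0x58cd0083) infected with signal 4 (SIGILL)",
    "04:00000:00585:2013/07/19 02:36:57.22 kernel Address 0x0000000000000000 (), siginfo (code, address) = (30, 0x0000000000000000)",
]

for event in summarize(sample):
    print(event)
```

On the lines from this thread the first event has lasterror 0, so the manual lookup by error number described above does not apply to it.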
Action
1. Try to eliminate the <lasterror>, which may be one of the causes for this error (except when <lasterror> is 0).
2. Rerun the command referenced by the SQL causing error to see if the problem reoccurs.
If the process is infected with 11 and you can reproduce the problem, correct it as follows:
• If the SQL causing error is a compiled object such as a stored procedure, trigger, or view, drop and recreate the object.
• If the SQL causing error is ad hoc rather than a compiled object, moving the data may fix the problem. Use one of these options:
  ◦ Select the table data into a new table, drop the old table, and rename the new table to the old table name;
  or
  ◦ Bulk copy the affected table out, drop and re-create the table, and bulk copy back in. This is the most efficient solution for a large table.
If moving the data corrects the problem, the data may have been corrupt. Be aware that moving corrupted data can lead to a data loss.
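The first option (select into a new table, drop, rename) can be written out as three statements. The sketch below only builds the statement text so it can be reviewed before anything is run. The helper name `move_data_statements`, the `_new` suffix and the table name `orders` are hypothetical. Note that `select into` does not carry over indexes, triggers or permissions, so those would have to be re-created separately, and the copied row count should be checked before the drop.

```python
import re

# Accept only plain identifiers so the generated text cannot contain
# anything other than a table name.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def move_data_statements(table, scratch=None):
    """Build the select-into / drop / rename sequence for one table.

    Returns the statements as strings; nothing is executed here.
    """
    scratch = scratch or table + "_new"
    for name in (table, scratch):
        if not _IDENT.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    return [
        f"select * into {scratch} from {table}",  # copy the rows
        f"drop table {table}",                    # remove the old table
        f"exec sp_rename '{scratch}', '{table}'", # give the copy the old name
    ]


for statement in move_data_statements("orders"):
    print(statement)
```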
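Check your hardware error log as this error can be caused by hardware failure as well.
Author: njzh24 Time: 2013-07-26 10:04
Reply to 2# andkylee
Over the past few days the stack trace has only been reported twice. I ran the procedure that raised the error myself and saw nothing abnormal.
During that period the only thing that ran was a statistics update (a scheduled job), and checktable also found nothing abnormal...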