Chinaunix

[Hardware and Drivers] Soft lockup problem

#1, posted 2012-05-03 09:25
Last edited by linuxfellow on 2012-05-03 09:49

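I just received three reset reports at once: two soft lockups and one OOM, all related to serial-port communication.
I suspect the process load is unbalanced. Could the experts please take a look: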

Reset#1.

BUG: soft lockup - CPU#0 stuck for 61s! [task1:1506]
Modules linked in: comm_net comm_net_driver comm_device fpga_cs

Pid: 1506, comm:         task1
CPU: 0    Not tainted  (2.6.31-207-g7286c01-ga82952b-dirty #12)
PC is at sub_preempt_count+0x10/0xc4
LR is at mix_pool_bytes_extract+0x150/0x180
pc : [<c0044acc>]    lr : [<c01c3b44>]    psr: 60000113
sp : c101dde8  ip : b99cf7e3  fp : c101ddec
r10: c101de38  r9 : 00000067  r8 : 80000113
r7 : 00000000  r6 : 0000007f  r5 : 00000000  r4 : c04430f8
r3 : 00000000  r2 : 00000007  r1 : 17f58c40  r0 : 00000001
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 00c5387d  Table: 817bc008  DAC: 00000015

Reset#2:
CPU Usage:
Mem: 121724K used, 3616K free, 0K shrd, 36K buff, 99564K cached
CPU:   6% usr   2% sys   0% nic  90% idle   0% io   0% irq   0% sirq
Load average: 0.00 0.00 0.00 3/65 3604
  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
2766  1705 root     S     7676   6%   6% ./V_TASK
1143     1 root     R     9052   7%   2% /usr/local/bin/tcpbridge -C -u -i eth0

2768  2767 root     S     7676   6%   1% ./V_TASK
2883  2767 root     S     7676   6%   1% ./V_TASK
2788  2767 root     S     7676   6%   0% ./V_TASK
3602  1705 root     R     1040   1%   0% top
2771  2767 root     S N   7676   6%   0% ./V_TASK
2772  2767 root     S     7676   6%   0% ./V_TASK
2770  2767 root     S N   7676   6%   0% ./V_TASK
2803  2767 root     S     7676   6%   0% ./V_TASK
2769  2767 root     S N   7676   6%   0% ./V_TASK
3431  2767 root     S     7676   6%   0% ./V_TASK
2778  2767 root     S     7676   6%   0% ./V_TASK
2775  2767 root     S     7676   6%   0% ./V_TASK
2896  2767 root     S     7676   6%   0% ./V_TASK
2792  2767 root     S     7676   6%   0% ./V_TASK
2789  2767 root     S     7676   6%   0% ./V_TASK
2892  2767 root     S     7676   6%   0% ./V_TASK
2828  2767 root     S     7676   6%   0% ./V_TASK


BUG: soft lockup - CPU#0 stuck for 61s! [V_TASK:2766]
Modules linked in: comm_net comm_net_driver comm_device fpga_cs

Pid: 2766, comm:            V_TASK
CPU: 0    Not tainted  (2.6.31-207-g7286c01-g38e3ffd-dirty #
PC is at handle_IRQ_event+0x1c/0x1e4
LR is at handle_edge_irq+0x130/0x198
pc : [<c0079990>]    lr : [<c007b960>]    psr: 60000113
sp : c4d3ff30  ip : 00000001  fp : c043832c
r10: 00000086  r9 : 00000050  r8 : c7f00400
r7 : 00000086  r6 : c4d3e000  r5 : 00000086  r4 : c7f00400
r3 : 00000000  r2 : 00010002  r1 : c7f00400  r0 : 00000086
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 00c5387d  Table: 87640008  DAC: 00000015


Reset#3:

INFO: task V_TASK:1166 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
V_TASK     D c031c2ac     0  1166      1 0x00000000
[<c031c2ac>] (schedule+0x37c/0x40c) from [<c031c360>] (io_schedule+0x24/0x40)
[<c031c360>] (io_schedule+0x24/0x40) from [<c00907ac>] (sync_page+0x4c/0x5
[<c00907ac>] (sync_page+0x4c/0x5 from [<c031cb0c>] (__wait_on_bit_lock+0x54/0x9c)
[<c031cb0c>] (__wait_on_bit_lock+0x54/0x9c) from [<c0090740>] (__lock_page+0x64/0x74)
[<c0090740>] (__lock_page+0x64/0x74) from [<c00908d8>] (find_lock_page+0x44/0x6c)
[<c00908d8>] (find_lock_page+0x44/0x6c) from [<c0090b90>] (grab_cache_page_write_begin+0x28/0x90)
[<c0090b90>] (grab_cache_page_write_begin+0x28/0x90) from [<c014fb74>] (ubifs_write_begin+0xe0/0x544)
[<c014fb74>] (ubifs_write_begin+0xe0/0x544) from [<c00917d0>] (generic_file_buffered_write+0xf8/0x2d
[<c00917d0>] (generic_file_buffered_write+0xf8/0x2d from [<c0092060>] (__generic_file_aio_write_nolock+0x468/0x4b4)
[<c0092060>] (__generic_file_aio_write_nolock+0x468/0x4b4) from [<c0092a8c>] (generic_file_aio_write+0x6c/0xe4)
[<c0092a8c>] (generic_file_aio_write+0x6c/0xe4) from [<c014f200>] (ubifs_aio_write+0x160/0x1c0)
[<c014f200>] (ubifs_aio_write+0x160/0x1c0) from [<c00bc94c>] (do_sync_write+0xb8/0xf
[<c00bc94c>] (do_sync_write+0xb8/0xf from [<c00bd030>] (vfs_write+0xb0/0x15
[<c00bd030>] (vfs_write+0xb0/0x15 from [<c00bd2d0>] (sys_write+0x3c/0x6
[<c00bd2d0>] (sys_write+0x3c/0x68) from [<c0029d60>] (ret_fast_syscall+0x0/0x2c)

boa invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002e270>] (unwind_backtrace+0x0/0xd0) from [<c0093638>] (oom_kill_process.clone.10+0x78/0x200)
[<c0093638>] (oom_kill_process.clone.10+0x78/0x200) from [<c0093bb4>] (__out_of_memory+0x174/0x198)
[<c0093bb4>] (__out_of_memory+0x174/0x198) from [<c0093e68>] (out_of_memory+0x68/0xc0)
[<c0093e68>] (out_of_memory+0x68/0xc0) from [<c0096738>] (__alloc_pages_nodemask+0x438/0x538)
[<c0096738>] (__alloc_pages_nodemask+0x438/0x538) from [<c009688c>] (__get_free_pages+0x10/0x3c)
[<c009688c>] (__get_free_pages+0x10/0x3c) from [<c0030650>] (get_pgd_slow+0x14/0xd8)
[<c0030650>] (get_pgd_slow+0x14/0xd8) from [<c004b974>] (mm_init.clone.41+0xa8/0xe8)
[<c004b974>] (mm_init.clone.41+0xa8/0xe8) from [<c004bbe8>] (dup_mm+0x64/0x3e4)
[<c004bbe8>] (dup_mm+0x64/0x3e4) from [<c004c740>] (copy_process+0x780/0xe54)
[<c004c740>] (copy_process+0x780/0xe54) from [<c004cf94>] (do_fork+0x15c/0x348)
[<c004cf94>] (do_fork+0x15c/0x348) from [<c002cc88>] (sys_fork+0x20/0x24)
[<c002cc88>] (sys_fork+0x20/0x24) from [<c0029d60>] (ret_fast_syscall+0x0/0x2c)
Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:   42, btch:   7 usd:   0
Active_anon:587 active_file:289 inactive_anon:642
inactive_file:325 unevictable:23480 dirty:376 writeback:1 unstable:0
free:1825 slab:1312 mapped:2390 pagetables:74 bounce:0
DMA free:680kB min:268kB low:332kB high:400kB active_anon:72kB inactive_anon:108kB active_file:0kB inactive_file:0kB unevictable:17892kB present:24384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 103 103
Normal free:6620kB min:1168kB low:1460kB high:1752kB active_anon:2276kB inactive_anon:2460kB active_file:1156kB inactive_file:1300kB unevictable:76028kB present:105664kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 2*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 680kB
Normal: 1301*4kB 177*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 6620kB
24198 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
32768 pages of RAM
1938 free pages
1452 reserved pages
1120 slab pages
64618 pages shared
0 pages swap cached
Out of memory: kill process 1177 (task1) score 240 or a child
Killed process 1177 (task1)
boa invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002e270>] (unwind_backtrace+0x0/0xd0) from [<c0093638>] (oom_kill_process.clone.10+0x78/0x200)
[<c0093638>] (oom_kill_process.clone.10+0x78/0x200) from [<c0093bb4>] (__out_of_memory+0x174/0x198)
[<c0093bb4>] (__out_of_memory+0x174/0x198) from [<c0093e68>] (out_of_memory+0x68/0xc0)
[<c0093e68>] (out_of_memory+0x68/0xc0) from [<c0096738>] (__alloc_pages_nodemask+0x438/0x538)
[<c0096738>] (__alloc_pages_nodemask+0x438/0x538) from [<c009688c>] (__get_free_pages+0x10/0x3c)
[<c009688c>] (__get_free_pages+0x10/0x3c) from [<c0030650>] (get_pgd_slow+0x14/0xd8)
[<c0030650>] (get_pgd_slow+0x14/0xd8) from [<c004b974>] (mm_init.clone.41+0xa8/0xe8)
[<c004b974>] (mm_init.clone.41+0xa8/0xe8) from [<c00c1f48>] (bprm_mm_init+0xc/0x144)
[<c00c1f48>] (bprm_mm_init+0xc/0x144) from [<c00c23ec>] (do_execve+0x13c/0x308)
[<c00c23ec>] (do_execve+0x13c/0x308) from [<c002cd08>] (sys_execve+0x34/0x54)
[<c002cd08>] (sys_execve+0x34/0x54) from [<c0029d60>] (ret_fast_syscall+0x0/0x2c

#2, posted 2012-05-03 10:58
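Single-core or multi-core?

For the first problem, it looks like the lockup was detected at the very instant spin_unlock_irqrestore re-enabled interrupts inside mix_pool_bytes_extract. There is indeed a loop in that function, but I can't see how it could stay stuck for 60 seconds...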
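
As a cross-check on the interrupt state at the moment of detection, the psr value in the Reset#1 dump (60000113) can be decoded by hand. The following is only a sketch that assumes the standard ARM CPSR bit layout (N/Z/C/V in bits 31 to 28, I in bit 7, F in bit 6, T in bit 5, mode in bits 4 to 0); decode_psr is a hypothetical helper written for illustration, not something from the kernel.

```python
# Sketch: decode an ARM program status register value from an oops dump.
# Assumes the standard CPSR layout; decode_psr is a made-up helper name.

MODES = {0x10: "USER", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
         0x17: "ABT", 0x1B: "UND", 0x1F: "SYS"}

def decode_psr(psr):
    # Upper-case letter means the condition flag is set, lower-case means clear.
    flags = "".join(
        name.upper() if psr & (1 << bit) else name.lower()
        for name, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    return {
        "flags": flags,
        "irqs_on": not psr & (1 << 7),   # I bit set would mean IRQs are masked
        "fiqs_on": not psr & (1 << 6),   # F bit set would mean FIQs are masked
        "thumb": bool(psr & (1 << 5)),   # T bit clear means ARM instruction set
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }

info = decode_psr(0x60000113)
print(info)
```

For 0x60000113 this gives flags nZCv, IRQs and FIQs enabled, ARM state and SVC mode, which agrees with the "Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM" line in the dump. So interrupts were already enabled at the sampled PC.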

#3, posted 2012-05-03 11:38
Last edited by linuxfellow on 2012-05-03 12:06

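Reply to #2 tempname2

Single-core, imx35.
It looks like the serial driver is too busy and keeps holding the CPU for more than a minute, so other processes never get a chance to run? The first two resets both happened in interrupt handling code, so most likely the interrupts are firing too often or the interrupt handlers are doing too much work.
The third problem looks like running out of memory while loading and executing an .elf.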
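
To see what the failing allocation in Reset#3 was actually asking for, the "gfp_mask=0xd0, order=2" fields of the oom-killer line can be decoded. This is a rough sketch: the four flag values are the ones I believe include/linux/gfp.h uses in 2.6.3x kernels, the 4 kB page size is an assumption for this ARM board, and decode_gfp / order_to_bytes are hypothetical helper names.

```python
# Sketch: decode gfp_mask and order from an "invoked oom-killer" line.
# Flag values assumed from include/linux/gfp.h of 2.6.3x kernels.
GFP_FLAGS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
PAGE_SIZE = 4096  # assumed page size in bytes

def decode_gfp(mask):
    """Return (known flag names, leftover bits this table does not cover)."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    known_bits = 0
    for bit in GFP_FLAGS:
        known_bits |= bit
    return names, mask & ~known_bits

def order_to_bytes(order):
    """Size of one physically contiguous allocation of 2**order pages."""
    return PAGE_SIZE << order

names, leftover = decode_gfp(0xD0)
print(names, hex(leftover), order_to_bytes(2))
```

0xd0 decodes to __GFP_WAIT | __GFP_IO | __GFP_FS, i.e. an ordinary GFP_KERNEL request, and order=2 means four contiguous pages (16 kB). Both traces reach it through get_pgd_slow, so the fork/exec path needs one contiguous 16 kB block for the new page directory, not just 16 kB of free memory in total.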
   

#4, posted 2012-05-03 19:32
Last edited by linuxfellow on 2012-05-03 19:35

How should the mem stats be read? At Reset#3 (Normal zone):
free memory: 6M
active anonymous: 2.3M, inactive anonymous: 2.5M, active file-backed: 1.2M, inactive file-backed: 1.3M
memory that can't be reclaimed (unevictable): 76M
memory present: 106M
Could it be that 6M is simply not enough?
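
One way to approach that question is to look at how the free memory is split into blocks, using the per-zone "N*SkB" lines in the Mem-info dump. The sketch below parses those two lines (copied from the Reset#3 log with the wrapped lines rejoined) and counts the free blocks that are big enough for the order=2 request. The helper names are made up for illustration, and it does not model zone watermarks or lowmem_reserve, which also decide whether a block may be handed out.

```python
import re

PAGE_KB = 4  # page size implied by the smallest "N*4kB" bucket in the log

def parse_free_lists(line):
    """Map block size in kB to free block count for one zone line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def total_free_kb(free_lists):
    return sum(size * count for size, count in free_lists.items())

def blocks_for_order(free_lists, order):
    """Count free blocks large enough for a 2**order page allocation."""
    need_kb = PAGE_KB << order
    return sum(count for size, count in free_lists.items() if size >= need_kb)

# Zone lines copied from the Reset#3 log.
DMA = ("DMA: 2*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB "
       "0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 680kB")
NORMAL = ("Normal: 1301*4kB 177*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 6620kB")

for name, line in (("DMA", DMA), ("Normal", NORMAL)):
    lists = parse_free_lists(line)
    print(name, total_free_kb(lists), "kB free,",
          blocks_for_order(lists, 2), "blocks usable for order 2")
```

The Normal zone has 6620 kB free but only in 4 kB and 8 kB pieces, so it has no block that can satisfy a 16 kB contiguous request. The DMA zone still has three larger blocks, but it is small and close to its own limits. So the 6M figure by itself says little; the free memory is badly fragmented.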

#5, posted 2012-05-04 07:49
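I have a lead now. All of these are actually caused by running out of memory. The allocation and freeing of the serial communication buffers was not handled properly, so there is a memory leak. As the system has less and less memory, processes run slower and slower, and eventually the soft lockups occur. Once the memory leak is fixed, all of these reset problems should be resolved.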
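
A simple way to confirm a leak like this before a reset happens is to log the "Mem:" header line of busybox top at intervals and look at the trend. The sketch below is only an illustration: parse_top_mem and leak_rate_kb_per_min are hypothetical helpers, and the sample series is invented for the example, not measured on this board. Only the single "Mem:" line comes from the Reset#2 output above.

```python
import re

def parse_top_mem(line):
    """Extract (used_kB, free_kB) from a busybox top 'Mem:' header line."""
    m = re.search(r"(\d+)K used, (\d+)K free", line)
    if m is None:
        raise ValueError("not a busybox top Mem: line")
    return int(m.group(1)), int(m.group(2))

def leak_rate_kb_per_min(samples):
    """Least-squares slope of free memory (kB) against time (minutes)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Line taken from the Reset#2 top output.
used_kb, free_kb = parse_top_mem(
    "Mem: 121724K used, 3616K free, 0K shrd, 36K buff, 99564K cached")

# Made-up samples (minutes since boot, free kB), NOT data from this thread.
samples = [(0, 40000), (10, 34000), (20, 28000), (30, 22000)]
rate = leak_rate_kb_per_min(samples)  # negative slope: free memory shrinking
print(used_kb, free_kb, rate)
```

A steadily negative slope while the workload is constant is a hint, not proof, since the page cache also consumes free memory. Watching the per-process figures alongside it would narrow down which task is leaking.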