A kernel error in a network application

Posted 2007-05-01 15:52
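I'm running a 2.4.27 kernel on ARM with three network devices: a standard eth0, a ppp0 implemented over an emulated serial port, and an eth1 emulated over a synchronous serial port. I modified the kernel so that eth0 forwards to eth1 (because of some special application requirements I did not use routing), while ppp0 forwards network-management data through routing. The device runs and the basic functions work, but when ppp0 and eth1 are both enabled, two errors occur frequently.
The first is: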
[root @ AT91RM9200DK]# KERNEL: assertion (atomic_read(&skb->users) == 0) failed at dev.c(1457)
Warning: kfree_sk<1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = c0840000
[00000004] *pgd=21202001, *pmd = 21202001, *pte = 00000000, *ppte = 00000000
Internal error: Oops: 807
CPU: 0
pc : [<c00c4f9c>]    lr : [<c00c1008>]    Not tainted
sp : c0871d74  ip : c0871d94  fp : c0871d90
r10: c016ce30  r9 : fefd0000  r8 : c0871e4c
r7 : 000005fe  r6 : 00000020  r5 : c016d048  r4 : 000308c3
r3 : 00000000  r2 : 00000000  r1 : 20000013  r0 : c16e93a0
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  Segment user
Control: C000317F  Table: 20840000  DAC: 00000015
Process pppd (pid: 663, stack limit = 0xc0870374)
Stack: (0xc0871d74 to 0xc0872000)
1d60:                                              000308c3 c16ea360 00000004
1d80: 000005fc c0871dcc c0871d94 c00c1008 c00c4f04 00000008 c2809048 c280a85c
1da0: c2809000 000308c3 c016ce30 fefd0000 c16ea360 c0871e4c ffffffff 0003b50c
1dc0: c0871dec c0871dd0 c00c13c4 c00c0f20 c01cb540 00000000 0000000e c0871e4c
1de0: c0871e0c c0871df0 c0018e50 c00c1398 c014d978 c01cb540 0000000e c014d5c0
1e00: c0871e30 c0871e10 c0018f40 c0018e10 c01554a0 c0871e4c 0000000e c00ba898
1e20: 40000013 c0871e48 c0871e34 c00190a8 c0018e8c c0871e80 fefff000 c0871ea4
1e40: c0871e4c c00181c0 c0019084 c016c6f0 c01529f0 fefff200 40001818 00000001
1e60: 00001594 000015d6 00000045 60000093 0003cc01 0003b50c c0871ea4 00000011
1e80: c0871e94 00000042 c00ba898 40000013 ffffffff c0149478 c0871ec0 c0871ea8
1ea0: c002319c c00ba870 000015d6 000015d6 c01432dc c0871ed4 c0871ec4 c0023218
1ec0: c0023150 000015d6 c0871ef0 c0871ed8 c0023314 c00231c4 ffffea2a 60000013
1ee0: 60000013 c0871f08 c0871ef4 c0023608 c0023238 c0151019 c01432e0 c0871f28
1f00: c0871f0c c0023524 c00235c8 00000000 c16e93a0 c01421c0 00000000 c0871f50
1f20: c0871f3c c00c52bc c00233e8 c0136d08 c00c9520 c01554a0 c016d048 00000000
1f40: c16e93a0 c0871f70 c0871f54 c00c9520 c00c52a4 00000001 c01420a8 fffffffd
1f60: c01554a0 c0871f94 c0871f74 c0025d78 c00c94c0 c01554a0 c0871fb0 00000018
1f80: c0003177 c0871fec c0871fac c0871f98 c00190d8 c0025d0c c0142d60 fefff000
1fa0: 00000000 c0871fb0 c0018348 c0019084 0004ad28 00000001 0004ad38 00000000
1fc0: 000463a0 0004ad28 0000d330 00000003 00000000 0003cc01 0003b50c bffffbdc
1fe0: 000042c8 bffffbbc 400cbe70 0000bce8 20000010 ffffffff 00000000 00000000
Backtrace:
Function entered at [<c00c4ef4>] from [<c00c1008>]
r7 = 000005FC  r6 = 00000004  r5 = C16EA360  r4 = 000308C3
Function entered at [<c00c0f10>] from [<c00c13c4>]
Function entered at [<c00c1388>] from [<c0018e50>]
r7 = C0871E4C  r6 = 0000000E  r5 = 00000000  r4 = C01CB540
Function entered at [<c0018e00>] from [<c0018f40>]
r7 = C014D5C0  r6 = 0000000E  r5 = C01CB540  r4 = C014D978
Function entered at [<c0018e7c>] from [<c00190a8>]
r8 = 40000013  r7 = C00BA898  r6 = 0000000E  r5 = C0871E4C
r4 = C01554A0
Function entered at [<c0019074>] from [<c00181c0>]
r5 = FEFFF000  r4 = C0871E80
Function entered at [<c00ba860>] from [<c002319c>]
r4 = C0149478
Function entered at [<c0023140>] from [<c0023218>]
r6 = C01432DC  r5 = 000015D6  r4 = 000015D6
Function entered at [<c00231b4>] from [<c0023314>]
r4 = 000015D6
Function entered at [<c0023228>] from [<c0023608>]
r6 = 60000013  r5 = 60000013  r4 = FFFFEA2A
Function entered at [<c00235b8>] from [<c0023524>]
r5 = C01432E0  r4 = C0151019
Function entered at [<c00233d4>] from [<c00c52bc>]
r3 = C016D048  r2 = C01554A0  r1 = C00C9520  r0 = C0136D08
r7 = 00000000  r6 = C01421C0  r5 = C16E93A0  r4 = 00000000
Function entered at [<c00c5294>] from [<c00c9520>]
r5 = C16E93A0  r4 = 00000000
Function entered at [<c00c94b0>] from [<c0025d78>]
r7 = C01554A0  r6 = FFFFFFFD  r5 = C01420A8  r4 = 00000001
Function entered at [<c0025cfc>] from [<c00190d8>]
r8 = C0871FEC  r7 = C0003177  r6 = 00000018  r5 = C0871FB0
r4 = C01554A0
Function entered at [<c0019074>] from [<c0018348>]
r5 = FEFFF000  r4 = C0142D60
Code: e5902000 e2433001 e5853008 e3a03000 (e5825004)
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
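The backtrace in this first oops can be cross-checked against the raw stack words. Below is a minimal sketch, assuming the usual ARM APCS frame layout in which fp points at the saved pc, the return address sits at fp-4 and the caller's fp at fp-12. The 16-byte gap between the saved pc and the function entry is inferred from this dump (saved c00c4f04 against the reported entry c00c4ef4), not taken from any kernel source. The names dump, read_word and walk_frames are made up for illustration, and only rows 1d80 to 1de0 of the dump are copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of the first word in dump[] (row "1d80:" of the first oops). */
#define DUMP_BASE 0xc0871d80u

/* Rows 1d80..1de0 of the stack dump, copied verbatim. */
static const uint32_t dump[] = {
    0x000005fc, 0xc0871dcc, 0xc0871d94, 0xc00c1008,
    0xc00c4f04, 0x00000008, 0xc2809048, 0xc280a85c,
    0xc2809000, 0x000308c3, 0xc016ce30, 0xfefd0000,
    0xc16ea360, 0xc0871e4c, 0xffffffff, 0x0003b50c,
    0xc0871dec, 0xc0871dd0, 0xc00c13c4, 0xc00c0f20,
    0xc01cb540, 0x00000000, 0x0000000e, 0xc0871e4c,
    0xc0871e0c, 0xc0871df0, 0xc0018e50, 0xc00c1398,
    0xc014d978, 0xc01cb540, 0x0000000e, 0xc014d5c0,
};
#define DUMP_WORDS (sizeof(dump) / sizeof(dump[0]))

struct frame {
    uint32_t entry; /* estimated function entry address */
    uint32_t ret;   /* return address into the caller   */
};

/* Read one word from the copied dump; returns -1 if addr is not covered. */
static int read_word(uint32_t addr, uint32_t *out)
{
    uint32_t idx;

    if (addr < DUMP_BASE || (addr & 3u) != 0)
        return -1;
    idx = (addr - DUMP_BASE) / 4u;
    if (idx >= DUMP_WORDS)
        return -1;
    *out = dump[idx];
    return 0;
}

/* Follow the fp chain until it leaves the copied rows or max is reached. */
static size_t walk_frames(uint32_t fp, struct frame *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (n < max) {
        uint32_t saved_pc, saved_lr, saved_fp;

        if (read_word(fp, &saved_pc) ||
            read_word(fp - 4u, &saved_lr) ||
            read_word(fp - 12u, &saved_fp))
            break;
        out[n].entry = saved_pc - 16u; /* assumption: see the note above */
        out[n].ret = saved_lr;
        n++;
        fp = saved_fp;
    }
    return n;
}
```

Starting from fp = c0871d90, the walk yields (c00c4ef4, c00c1008), (c00c0f10, c00c13c4) and (c00c1388, c0018e50), which matches the first three "Function entered at" lines. It stops there only because the copied rows end at 1dff.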


The other is:
[root @ AT91RM9200DK]# <4>Warning: kfree_skb passed an skb still on a list (from c00c9594).
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000, *pmd = 00000000
Internal error: Oops: 807
CPU: 0
pc : [<c00c531c>]    lr : [<c002366c>]    Not tainted
sp : c0141ef0  ip : c0141ea8  fp : c0141f04
r10: 2001461c  r9 : ffffffff  r8 : 60000093
r7 : 00000000  r6 : c01421c0  r5 : c16e9080  r4 : c16e9080
r3 : 00000000  r2 : 00000001  r1 : 00000001  r0 : 00000045
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
Control: C000317F  Table: 21554000  DAC: 0000001D
Process swapper (pid: 0, stack limit = 0xc0140374)
Stack: (0xc0141ef0 to 0xc0142000)
1ee0:                                     c16e9080 00000000 c0141f24 c0141f08
1f00: c00c9594 c00c5300 00000001 c01420a8 fffffffd c01554a0 c0141f48 c0141f28
1f20: c0025d78 c00c951c c01554a0 c0141f64 00000018 c00196b8 60000013 c0141f60
1f40: c0141f4c c00190d8 c0025d0c c0141f98 fefff000 c0141fb8 c0141f64 c00181c0
1f60: c0019084 00000000 00000032 00000000 60000013 c001966c c0140000 c014de80
1f80: c014de74 c0142d84 41129200 2001461c c0141fb8 c0141fbc c0141fac c00196ac
1fa0: c00196b8 60000013 ffffffff c0141fd0 c0141fbc c0019718 c001967c c0156ce0
1fc0: c016bffc c0141fe0 c0141fd4 c0016030 c00196d0 c0141ffc c0141fe4 c00086fc
1fe0: c0016010 c014e290 c016f8a8 c016f8a8 00000000 c0142000 c0008080 c00085b8
Backtrace:
Function entered at [<c00c52f0>] from [<c00c9594>]    -----c00c52f0 <__kfree_skb>:
r5 = 00000000  r4 = C16E9080
Function entered at [<c00c950c>] from [<c0025d78>]    -----c00c950c <net_tx_action>: (c00c9594)
r7 = C01554A0  r6 = FFFFFFFD  r5 = C01420A8  r4 = 00000001
Function entered at [<c0025cfc>] from [<c00190d8>]    -----c0025cfc <do_softirq>:
r8 = 60000013  r7 = C00196B8  r6 = 00000018  r5 = C0141F64
r4 = C01554A0
Function entered at [<c0019074>] from [<c00181c0>]    -----c0019074 <asm_do_IRQ>:  (c00190d
r5 = FEFFF000  r4 = C0141F98                          -----c0018c5c <get_irq_list>:  (c00181c0)
Function entered at [<c001966c>] from [<c0019718>]    -----c001966c <default_idle>:
Function entered at [<c00196c0>] from [<c0016030>]    -----c00196c0 <cpu_idle>: (c001971
r5 = C016BFFC  r4 = C0156CE0
Function entered at [<c0016000>] from [<c00086fc>]    -----c0016000 <rest_init>:
Function entered at [<c00085a8>] from [<c0008080>]    -----c00085a8 <start_kernel>: (c00086fc:  ea00000f  b  c0008740 <start_kernel+0x198>
Code: e59f00b8 e51b1004 ebfd782e e3a03000 (e5833000)    -----c0008080 <__mmap_switched>:
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
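In an ARM oops the word in parentheses on the Code: line is the instruction at pc, so it can be decoded to see which register held the bad pointer. The following is a minimal sketch that handles only the immediate-offset, pre-indexed form of the single data transfer (LDR/STR) encoding, which is enough for the two faulting words in this post; the names ldst, decode_ldst_imm and effective_address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ldst {
    int is_load;    /* 1 = LDR, 0 = STR                  */
    int is_byte;    /* 1 = byte access, 0 = word access  */
    unsigned rn;    /* base register number              */
    unsigned rd;    /* source/destination register       */
    int32_t offset; /* signed immediate offset           */
};

/*
 * Decode an ARM single data transfer with an immediate offset and
 * pre-indexing (P = 1). Returns 0 on success and -1 for any other
 * instruction form, which this sketch does not handle.
 */
static int decode_ldst_imm(uint32_t insn, struct ldst *out)
{
    uint32_t imm = insn & 0xfffu;

    assert(out != NULL);
    if (((insn >> 26) & 3u) != 1u) /* bits 27:26 must be 01    */
        return -1;
    if ((insn >> 25) & 1u)         /* I = 1: register offset   */
        return -1;
    if (!((insn >> 24) & 1u))      /* P = 0: post-indexed      */
        return -1;

    out->is_load = (int)((insn >> 20) & 1u);
    out->is_byte = (int)((insn >> 22) & 1u);
    out->rn = (insn >> 16) & 0xfu;
    out->rd = (insn >> 12) & 0xfu;
    /* U bit (23) selects add or subtract. */
    out->offset = ((insn >> 23) & 1u) ? (int32_t)imm : -(int32_t)imm;
    return 0;
}

/* Address accessed, given r0..r15 as printed in the oops (rn != pc assumed). */
static uint32_t effective_address(const struct ldst *d, const uint32_t regs[16])
{
    assert(d != NULL && regs != NULL);
    return regs[d->rn] + (uint32_t)d->offset;
}
```

With the register values printed above, e5833000 decodes to a word store of r3 through [r3, #0], and r3 = 00000000 gives the reported fault address 00000000. In the first oops, e5825004 decodes to a store of r5 through [r2, #4], and r2 = 00000000 gives the reported address 00000004. Both faults are therefore writes through a NULL pointer held in a register.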


I reviewed the code and also checked the error output with objdump, and found that both errors are related to this code in /net/core/dev.c:
static void net_tx_action(struct softirq_action *h)
{
        int cpu = smp_processor_id();

        if (softnet_data[cpu].completion_queue) {
                struct sk_buff *clist;

                local_irq_disable();
                clist = softnet_data[cpu].completion_queue;
                softnet_data[cpu].completion_queue = NULL;
                local_irq_enable();

                while (clist != NULL) {
                        struct sk_buff *skb = clist;
                        clist = clist->next;

                        BUG_TRAP(atomic_read(&skb->users) == 0);
                        if (atomic_read(&skb->users) != 0)      //2007-04-23
                        {
                                print_skb_content(skb);
                        }
                        else
                                __kfree_skb(skb);
                }
        }

        if (softnet_data[cpu].output_queue) {
                struct net_device *head;

                local_irq_disable();
                head = softnet_data[cpu].output_queue;
                softnet_data[cpu].output_queue = NULL;
                local_irq_enable();

                while (head != NULL) {
                        struct net_device *dev = head;
                        head = head->next_sched;

                        smp_mb__before_clear_bit();
                        clear_bit(__LINK_STATE_SCHED, &dev->state);

                        if (spin_trylock(&dev->queue_lock)) {
                                qdisc_run(dev);
                                spin_unlock(&dev->queue_lock);
                        } else {
                                netif_schedule(dev);
                        }
                }
        }
}
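The objdump cross-check mentioned before the listing amounts to finding, for each backtrace address, the last symbol that starts at or below it. Here is a minimal sketch of that lookup, using only the symbol addresses that appear in the annotations of the second backtrace. The table is therefore partial, so a hit is only trustworthy when no unlisted symbol lies in between; the names sym, symtab and lookup are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t addr;    /* start address of the symbol */
    const char *name;
};

/* Sorted by address; entries taken from the annotated backtrace only. */
static const struct sym symtab[] = {
    { 0xc0008080, "__mmap_switched" },
    { 0xc00085a8, "start_kernel"    },
    { 0xc0016000, "rest_init"       },
    { 0xc0018c5c, "get_irq_list"    },
    { 0xc0019074, "asm_do_IRQ"      },
    { 0xc001966c, "default_idle"    },
    { 0xc00196c0, "cpu_idle"        },
    { 0xc0025cfc, "do_softirq"      },
    { 0xc00c52f0, "__kfree_skb"     },
    { 0xc00c950c, "net_tx_action"   },
};
#define NSYMS (sizeof(symtab) / sizeof(symtab[0]))

/*
 * Binary search for the last symbol whose start is <= addr.
 * Stores addr - start in *offset; returns NULL if addr is below the table.
 */
static const struct sym *lookup(uint32_t addr, uint32_t *offset)
{
    size_t lo = 0, hi = NSYMS;

    assert(offset != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (symtab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = addr - symtab[lo - 1].addr;
    return &symtab[lo - 1];
}
```

For example, the return address c00c9594 resolves to net_tx_action+0x88, and the faulting pc of the second oops, c00c531c, resolves to __kfree_skb+0x2c.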

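In every case, one of two things happens: either BUG_TRAP(atomic_read(&skb->users) == 0) finds that the users count is not 0 (when I printed it, it was actually -2), or __kfree_skb(skb) finds that the skb's list pointer is not NULL.
Typically the error appears suddenly after several GB of data have been transferred, and then the system hangs completely. If I turn ppp0 off, or run only ppp0, things are fairly stable: it stayed up for several days without any problem.

I have been investigating this for more than a week and still have no leads. Could the experts here help me analyze it?
Thanks, and happy holidays!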
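On the users value of -2 reported above: one sequence that would produce exactly that reading is the same skb being released more than once from interrupt context. The sketch below is a user-space model, not kernel code. It assumes that the 2.4 release helper decrements skb->users and pushes the skb onto the per-CPU completion queue only when the count reaches zero. The names model_skb, model_free_irq and model_drain are hypothetical, and nothing here identifies which path in the modified forwarding code, if any, actually does this.

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down stand-in for struct sk_buff: only the two fields involved. */
struct model_skb {
    int users;               /* reference count                      */
    struct model_skb *next;  /* link used by the completion queue    */
};

/* Stand-in for softnet_data[cpu].completion_queue. */
static struct model_skb *completion_queue;

/*
 * Assumed behaviour of the IRQ-context release: drop one reference and
 * queue the skb for net_tx_action only when the count hits zero.
 */
static void model_free_irq(struct model_skb *skb)
{
    assert(skb != NULL);
    if (--skb->users == 0) {
        skb->next = completion_queue;
        completion_queue = skb;
    }
}

/*
 * Drain the queue the way the loop in net_tx_action does. Returns how
 * many queued skbs had users != 0 and reports the first such value.
 */
static int model_drain(int *first_bad_users)
{
    struct model_skb *clist = completion_queue;
    int bad = 0;

    completion_queue = NULL;
    while (clist != NULL) {
        struct model_skb *skb = clist;

        clist = clist->next;
        if (skb->users != 0) {
            if (bad == 0 && first_bad_users != NULL)
                *first_bad_users = skb->users;
            bad++;
        }
    }
    return bad;
}
```

In this model, an skb with users == 1 that is released three times before the queue is drained is queued once (1 to 0) and then decremented twice more (to -1, then -2), so the drain loop sees users == -2 on a queued skb. If such a stray release happened, the same skb could also be linked on another list at that moment, which would be consistent with the "still on a list" warning, but the post does not contain enough information to confirm either point.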