免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
最近访问板块 发新帖
查看: 1361 | 回复: 1
打印 上一主题 下一主题

[网络子系统] 求破:dnsmasq 引起的DEADLOCK ??? [复制链接]

论坛徽章:
0
跳转到指定楼层
1 [收藏(0)] [报告]
发表于 2015-07-24 09:36 |只看该作者 |倒序浏览
本帖最后由 yoojoin 于 2015-07-24 09:37 编辑

系统openwrt 12.09 adjust, x86, 拨号上网,容易出下面日志,但还能正常上网,dhcp没有出现过。

[62668.659168] ======================================================
[62668.666461] [ INFO: possible circular locking dependency detected ]
[62668.673871] 3.3.8 #40 Tainted: G           O
[62668.679263] -------------------------------------------------------
[62668.686640] dnsmasq/2160 is trying to acquire lock:
[62668.692656]  (&(&sch->busylock)->rlock){+.-...}, at: [<812dfccc>] dev_queue_xmit+0x2bc/0x6d0
[62668.702334]
[62668.702334] but task is already holding lock:
[62668.710399]  (&(&pch->downl)->rlock){+.-...}, at: [<f8c1a05d>] ppp_unregister_channel+0x121d/0x1e10 [ppp_generic]
[62668.721966]
[62668.721966] which lock already depends on the new lock.
[62668.721967]
[62668.733587]
[62668.733588] the existing dependency chain (in reverse order) is:
[62668.743358]
[62668.743358] -> #3 (&(&pch->downl)->rlock){+.-...}:
[62668.751962]        [<8105d1cc>] lock_acquire+0x6c/0x90
[62668.758238]        [<813612d8>] _raw_spin_lock_bh+0x38/0x50
[62668.764966]        [<f8c1a05d>] ppp_unregister_channel+0x121d/0x1e10 [ppp_generic]
[62668.773700]        [<f8c1a995>] ppp_unregister_channel+0x1b55/0x1e10 [ppp_generic]
[62668.782480]        [<f8c1ad15>] ppp_output_wakeup+0xc5/0xe80 [ppp_generic]
[62668.790498]        [<810a5ab4>] vfs_write+0x94/0x130
[62668.796561]        [<810a5d28>] sys_write+0x38/0x70
[62668.802609]        [<81361931>] syscall_call+0x7/0xb
[62668.808685]
[62668.808685] -> #2 (&(&ppp->wlock)->rlock){+.-...}:
[62668.817270]        [<8105d1cc>] lock_acquire+0x6c/0x90
[62668.823523]        [<813612d8>] _raw_spin_lock_bh+0x38/0x50
[62668.830187]        [<f8c1a549>] ppp_unregister_channel+0x1709/0x1e10 [ppp_generic]
[62668.838876]        [<f8c1ab85>] ppp_unregister_channel+0x1d45/0x1e10 [ppp_generic]
[62668.847534]        [<812df91b>] dev_hard_start_xmit+0x43b/0x530
[62668.854542]        [<812f1be6>] sch_direct_xmit+0x56/0x200
[62668.861054]        [<812dfdea>] dev_queue_xmit+0x3da/0x6d0
[62668.867597]        [<812e5fca>] neigh_direct_output+0xa/0x10
[62668.874310]        [<81306a1a>] ip_finish_output+0x30a/0x3c0
[62668.881001]        [<81307e8f>] ip_output+0xdf/0x110
[62668.886991]        [<81304443>] ip_forward_finish+0x93/0xd0
[62668.893524]        [<813046f7>] ip_forward+0x277/0x350
[62668.899629]        [<81302d13>] ip_rcv_finish+0x3f3/0x490
[62668.905982]        [<8130333f>] ip_rcv+0x29f/0x2f0
[62668.911687]        [<812ddf56>] __netif_receive_skb+0x3a6/0x410
[62668.918530]        [<812de1cf>] netif_receive_skb+0xbf/0xd0
[62668.925009]        [<81348ab3>] br_handle_frame_finish+0x173/0x1a0
[62668.932026]        [<81348c87>] br_handle_frame+0x1a7/0x1d0
[62668.938432]        [<812dde8e>] __netif_receive_skb+0x2de/0x410
[62668.945123]        [<812de1cf>] netif_receive_skb+0xbf/0xd0
[62668.951501]        [<812debd5>] napi_skb_finish+0x25/0x50
[62668.957667]        [<812decf3>] napi_gro_receive+0xf3/0x100
[62668.964024]        [<f9241fba>] e1000e_set_ethtool_ops+0xd0a/0x18b0 [e1000e]
[62668.971821]        [<f9243556>] e1000e_get_stats64+0x9f6/0x2d40 [e1000e]
[62668.979269]        [<f9244945>] e1000e_get_stats64+0x1de5/0x2d40 [e1000e]
[62668.986857]        [<812de3c5>] net_rx_action+0x65/0x140
[62668.992904]        [<8102c451>] __do_softirq+0x81/0x130
[62668.998876]
[62668.998877] -> #1 (_xmit_PPP#2){+.-...}:
[62669.005779]        [<8105d1cc>] lock_acquire+0x6c/0x90
[62669.011625]        [<81361023>] _raw_spin_lock+0x33/0x40
[62669.017642]        [<812f1bc6>] sch_direct_xmit+0x36/0x200
[62669.023832]        [<812dfdea>] dev_queue_xmit+0x3da/0x6d0
[62669.030049]        [<812e5fca>] neigh_direct_output+0xa/0x10
[62669.036426]        [<81306a1a>] ip_finish_output+0x30a/0x3c0
[62669.042794]        [<81307e8f>] ip_output+0xdf/0x110
[62669.048471]        [<813075a3>] ip_local_out+0x43/0x60
[62669.054295]        [<813086e9>] ip_send_skb+0x9/0x60
[62669.059964]        [<8132466d>] udp_send_skb+0x25d/0x2d0
[62669.065963]        [<81326126>] udp_sendmsg+0x576/0x840
[62669.071848]        [<8132d0f0>] inet_sendmsg+0xc0/0xd0
[62669.077669]        [<812cca74>] sock_sendmsg+0xa4/0xc0
[62669.083457]        [<812ced62>] sys_sendto+0xb2/0xe0
[62669.089087]        [<812cf5e0>] sys_socketcall+0x160/0x2b0
[62669.095239]        [<81361931>] syscall_call+0x7/0xb
[62669.100864]
[62669.100864] -> #0 (&(&sch->busylock)->rlock){+.-...}:
[62669.108772]        [<8105c795>] __lock_acquire+0xf55/0x1570
[62669.115007]        [<8105d1cc>] lock_acquire+0x6c/0x90
[62669.120829]        [<81361023>] _raw_spin_lock+0x33/0x40
[62669.126797]        [<812dfccc>] dev_queue_xmit+0x2bc/0x6d0
[62669.132951]        [<f9043493>] 0xf9043493
[62669.137680]        [<f90434ca>] 0xf90434ca
[62669.142408]        [<f8c1a06c>] ppp_unregister_channel+0x122c/0x1e10 [ppp_generic]
[62669.144361]        [<f8c1a995>] ppp_unregister_channel+0x1b55/0x1e10 [ppp_generic]
[62669.144364]        [<f8c1ab85>] ppp_unregister_channel+0x1d45/0x1e10 [ppp_generic]
[62669.144366]        [<812df91b>] dev_hard_start_xmit+0x43b/0x530
[62669.144367]        [<812f1be6>] sch_direct_xmit+0x56/0x200
[62669.144369]        [<812dfdea>] dev_queue_xmit+0x3da/0x6d0
[62669.144370]        [<812e5fca>] neigh_direct_output+0xa/0x10
[62669.144372]        [<81306a1a>] ip_finish_output+0x30a/0x3c0
[62669.144374]        [<81307e8f>] ip_output+0xdf/0x110
[62669.144376]        [<813075a3>] ip_local_out+0x43/0x60
[62669.144377]        [<813086e9>] ip_send_skb+0x9/0x60
[62669.144379]        [<8132466d>] udp_send_skb+0x25d/0x2d0
[62669.144380]        [<81326126>] udp_sendmsg+0x576/0x840
[62669.144382]        [<8132d0f0>] inet_sendmsg+0xc0/0xd0
[62669.144383]        [<812cca74>] sock_sendmsg+0xa4/0xc0
[62669.144385]        [<812ced62>] sys_sendto+0xb2/0xe0
[62669.144387]        [<812cf5e0>] sys_socketcall+0x160/0x2b0
[62669.144388]        [<81361931>] syscall_call+0x7/0xb
[62669.144390]
[62669.144391] other info that might help us debug this:
[62669.144391]
[62669.144392] Chain exists of:
[62669.144392]   &(&sch->busylock)->rlock --> &(&ppp->wlock)->rlock --> &(&pch->downl)->rlock
[62669.144394]
[62669.144394]  Possible unsafe locking scenario:
[62669.144395]
[62669.144395]        CPU0                    CPU1
[62669.144396]        ----                    ----
[62669.144396]   lock(&(&pch->downl)->rlock);
[62669.144397]                                lock(&(&ppp->wlock)->rlock);
[62669.144398]                                lock(&(&pch->downl)->rlock);
[62669.144399]   lock(&(&sch->busylock)->rlock);
[62669.144400]
[62669.144401]  *** DEADLOCK ***
[62669.144401]
[62669.144402] 6 locks held by dnsmasq/2160:
[62669.144403]  #0:  (rcu_read_lock){.+.+..}, at: [<81306960>] ip_finish_output+0x250/0x3c0
[62669.144405]  #1:  (rcu_read_lock_bh){.+....}, at: [<812dfa10>] dev_queue_xmit+0x0/0x6d0
[62669.144408]  #2:  (_xmit_PPP#2){+.-...}, at: [<812f1bc6>] sch_direct_xmit+0x36/0x200
[62669.144410]  #3:  (&(&ppp->wlock)->rlock){+.-...}, at: [<f8c1a549>] ppp_unregister_channel+0x1709/0x1e10 [ppp_generic]
[62669.144413]  #4:  (&(&pch->downl)->rlock){+.-...}, at: [<f8c1a05d>] ppp_unregister_channel+0x121d/0x1e10 [ppp_generic]
[62669.144416]  #5:  (rcu_read_lock_bh){.+....}, at: [<812dfa10>] dev_queue_xmit+0x0/0x6d0
[62669.144418]
[62669.144419] stack backtrace:
[62669.144420] Pid: 2160, comm: dnsmasq Tainted: G           O 3.3.8 #40
[62669.144421] Call Trace:
[62669.144424]  [<8135b0a8>] ? printk+0x18/0x1a
[62669.144426]  [<8135b8f4>] print_circular_bug+0x234/0x23e
[62669.144428]  [<8105c795>] __lock_acquire+0xf55/0x1570
[62669.144430]  [<8105d1cc>] lock_acquire+0x6c/0x90
[62669.144431]  [<812dfccc>] ? dev_queue_xmit+0x2bc/0x6d0
[62669.144432]  [<81361023>] _raw_spin_lock+0x33/0x40
[62669.144434]  [<812dfccc>] ? dev_queue_xmit+0x2bc/0x6d0
[62669.144435]  [<812dfccc>] dev_queue_xmit+0x2bc/0x6d0
[62669.144436]  [<812dfa10>] ? dev_hard_start_xmit+0x530/0x530
[62669.144438]  [<f9043493>] 0xf9043492
[62669.144440]  [<812f0ea0>] ? eth_rebuild_header+0x70/0x70
[62669.144442]  [<f90434ca>] 0xf90434c9
[62669.144445]  [<f8c1a06c>] ppp_unregister_channel+0x122c/0x1e10 [ppp_generic]
[62669.144446]  [<8105d818>] ? mark_held_locks+0xb8/0xf0
[62669.144448]  [<813616b5>] ? _raw_spin_unlock_irqrestore+0x45/0x60
[62669.144450]  [<8105d96c>] ? trace_hardirqs_on_caller+0x11c/0x1a0
[62669.144451]  [<8105d9fb>] ? trace_hardirqs_on+0xb/0x10
[62669.144453]  [<812d4669>] ? skb_dequeue+0x59/0x70
[62669.144456]  [<f8c1a995>] ppp_unregister_channel+0x1b55/0x1e10 [ppp_generic]
[62669.144457]  [<812d4417>] ? skb_queue_tail+0x37/0x50
[62669.144460]  [<f8c1ab85>] ppp_unregister_channel+0x1d45/0x1e10 [ppp_generic]
[62669.144461]  [<812df91b>] dev_hard_start_xmit+0x43b/0x530
[62669.144463]  [<812f1bc6>] ? sch_direct_xmit+0x36/0x200
[62669.144465]  [<812f1be6>] sch_direct_xmit+0x56/0x200
[62669.144467]  [<8136102a>] ? _raw_spin_lock+0x3a/0x40
[62669.144468]  [<812dfdea>] dev_queue_xmit+0x3da/0x6d0
[62669.144470]  [<812dfa10>] ? dev_hard_start_xmit+0x530/0x530
[62669.144471]  [<81306960>] ? ip_finish_output+0x250/0x3c0
[62669.144473]  [<812e5fca>] neigh_direct_output+0xa/0x10
[62669.144475]  [<81306a1a>] ip_finish_output+0x30a/0x3c0
[62669.144477]  [<81306960>] ? ip_finish_output+0x250/0x3c0
[62669.144479]  [<81307e8f>] ip_output+0xdf/0x110
[62669.144480]  [<81306710>] ? ip_fragment+0x850/0x850
[62669.144482]  [<813075a3>] ip_local_out+0x43/0x60
[62669.144484]  [<813086e9>] ip_send_skb+0x9/0x60
[62669.144485]  [<8132466d>] udp_send_skb+0x25d/0x2d0
[62669.144487]  [<81326126>] udp_sendmsg+0x576/0x840
[62669.144489]  [<81305d40>] ? ip_setup_cork+0xf0/0xf0
[62669.144491]  [<8132d0f0>] inet_sendmsg+0xc0/0xd0
[62669.144492]  [<8132d030>] ? inet_autobind+0x60/0x60
[62669.144494]  [<812cca74>] sock_sendmsg+0xa4/0xc0
[62669.144497]  [<8108e4ae>] ? might_fault+0x2e/0x80
[62669.144498]  [<8108e4ae>] ? might_fault+0x2e/0x80
[62669.144500]  [<8108e4f4>] ? might_fault+0x74/0x80
[62669.144503]  [<8116022d>] ? _copy_from_user+0x3d/0x60
[62669.144504]  [<812ce6ce>] ? move_addr_to_kernel+0x2e/0x50
[62669.144506]  [<812ced62>] sys_sendto+0xb2/0xe0
[62669.144508]  [<8105ce3d>] ? lock_release_non_nested+0x8d/0x250
[62669.144510]  [<8108e4ae>] ? might_fault+0x2e/0x80
[62669.144511]  [<8108e4ae>] ? might_fault+0x2e/0x80
[62669.144513]  [<812cf5e0>] sys_socketcall+0x160/0x2b0
[62669.144515]  [<81361931>] syscall_call+0x7/0xb
[62669.144516]  [<81360000>] ? wait_for_common+0x30/0x110

论坛徽章:
20
程序设计版块每日发帖之星
日期:2015-08-17 06:20:00程序设计版块每日发帖之星
日期:2016-07-16 06:20:00程序设计版块每日发帖之星
日期:2016-07-18 06:20:00每日论坛发贴之星
日期:2016-07-18 06:20:00黑曼巴
日期:2016-12-26 16:00:3215-16赛季CBA联赛之江苏
日期:2017-06-26 11:05:5615-16赛季CBA联赛之上海
日期:2017-07-21 18:12:5015-16赛季CBA联赛之青岛
日期:2017-09-04 17:32:0515-16赛季CBA联赛之吉林
日期:2018-03-26 10:02:16程序设计版块每日发帖之星
日期:2016-07-15 06:20:0015-16赛季CBA联赛之江苏
日期:2016-07-07 18:37:512015亚冠之萨济拖拉机
日期:2015-08-17 12:21:08
2 [报告]
发表于 2015-07-24 11:23 |只看该作者
这只是lockdep检测出lock的依赖关系上出现了循环,并不一定真正的出现了deadlock。(即false positive)

假设ppp0绑定eth0
dev_queue_xmit(ip报文,ppp0) -> ppp的qdisc_lock -> ppp的内部lock -> pppoe的逻辑 -> dev_queue_xmit(pppoe报,eth0) -> eth0的qdisc_lock

这里产生了dev_queue_xmit的嵌套调用,这个场景在虚拟网卡的实现里很普遍(例如bond、bridge、GRE等)。

lockdep是按照lock的class来检测lock dependency loop的,而不是按照lock instance来检查。
比如说,假设每个inode有个inode_lock,每个dentry有个dentry_lock,这是两种lock(两个class),
系统中有N多inode和dentry,也就有N多inode_lock/dentry_lock的实例。
lockdep关心的是inode-lock和dentry-lock这两个class之间的先后顺序,它不关心(也无法关心)instance之间的依赖(种类是有限的,实例就太多了)。

由于lockdep是面向lock class的,lock inode#1 + lock inode#2这样的操作都会被认为是dead lock。
————仔细想想也是有道理的,谁知道其它地方有没有lock inode#2 + lock inode#1的操作呢。

但是在ppp dev_queue_xmit的场景里,ppp0->eth0的顺序是合理的,不会出现eth0->ppp0的可能性。
因此lock ppp.qdisc_lock -> lock eth.qdisc_lock是不会产生死锁的。
不幸的是,lockdep没有这些先验知识,它观察到两个qdisck_lock class,于是产生了报警。

PS,ppp->dev_hard_start_xmit调用了ppp_unregister_channel? 这不太可能,应该是地址到函数的映射有问题?
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP