
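[Kernel modules] Ran into a deadlock. Could the experts check whether my analysis is on the right track? The more I analyze it, the more confused I get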

1
Posted 2013-10-31 19:00
Last edited by FlankerSky on 2013-10-31 19:00

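Newbie here. I have never analyzed a deadlock in depth. I have run into nested-lock style deadlocks before and could spot those, but this one I simply cannot figure out. My analysis is below; any pointers would be appreciated:

1. Basic description:
The part in red is a kernel thread (kks_tx). Its job is to process packets coming down from the upper layer (I add those packets to a queue at the netfilter local_out hook). For business reasons, a packet sent by an upper-layer application may be answered directly by my kernel module, so the thread calls netif_receive_skb to deliver a packet up to the application. There is a spinlock at the TCP entry point, tcp_v4_rcv. kks_tx acquires that spinlock and keeps running, but I cannot follow what happens next (the part above the red section): was it interrupted by the e1000e interrupt? And does execution then continue down to tcp_v4_rcv again, which causes the deadlock?
2. But my code runs inside VMware, and this happens whether I configure VMware with a single core or with multiple cores. According to LDD3, don't spinlocks do nothing at all on non-SMP systems?
LDD3 also says that local CPU interrupts are disabled while a spinlock is held, so this does not add up.

The more I analyze, the more confused I get. Could you check whether my reasoning is right, or whether I am heading in the wrong direction? Thanks!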




[ 3182.546342] BUG: soft lockup - CPU#0 stuck for 22s! [KksTx:2742]
[ 3182.547942] Modules linked in: kksfilter(O) e1000e(O) vmhgfs(O) vsock(O) acpiphp vmwgfx ttm drm snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm vmw_balloon snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse snd_seq serio_raw snd_timer snd_seq_device snd joydev soundcore snd_page_alloc bnep rfcomm bluetooth parport_pc ppdev vmci(O) shpchp mac_hid i2c_piix4 lp parport usbhid hid floppy mptspi mptscsih mptbase vmxnet(O) vmw_pvscsi vmxnet3 [last unloaded: e1000e]
[ 3182.560267] irq event stamp: 0
[ 3182.561060] hardirqs last  enabled at (0): [<  (null)>]   (null)
[ 3182.562649] hardirqs last disabled at (0): [<c1058564>] copy_process+0x484/0x1170
[ 3182.564637] softirqs last  enabled at (0): [<c1058564>] copy_process+0x484/0x1170
[ 3182.566563] softirqs last disabled at (0): [<  (null)>]   (null)
[ 3182.568115] Modules linked in: kksfilter(O) e1000e(O) vmhgfs(O) vsock(O) acpiphp vmwgfx ttm drm snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm vmw_balloon snd_seq_midi snd_rawmidi snd_seq_midi_event psmouse snd_seq serio_raw snd_timer snd_seq_device snd joydev soundcore snd_page_alloc bnep rfcomm bluetooth parport_pc ppdev vmci(O) shpchp mac_hid i2c_piix4 lp parport usbhid hid floppy mptspi mptscsih mptbase vmxnet(O) vmw_pvscsi vmxnet3 [last unloaded: e1000e]
[ 3182.580028]
[ 3182.580431] Pid: 2742, comm: KksTx Tainted: G           O 3.2.6 #18 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 3182.583631] EIP: 0060:[<c12ba6df>] EFLAGS: 00000282 CPU: 0
[ 3182.585043] EIP is at delay_tsc+0x1f/0x70
[ 3182.586068] EAX: 7170b643 EBX: f0a98d34 ECX: f0a98d34 EDX: 000007db
[ 3182.587670] ESI: 00000000 EDI: 00000001 EBP: f680dbb4 ESP: f680dba4
[ 3182.589252]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 3182.590632] Process KksTx (pid: 2742, ti=f680c000 task=e604ea80 task.ti=f4de6000)
[ 3182.592516] Stack:
[ 3182.593046]  7170b5d7 f0a98d34 280c08d0 00000000 f680dbbc c12ba62e f680dbdc c12c150b
[ 3182.595284]  00000001 a087b7e8 00000001 f0a98d34 c1e896c0 f0a98d00 f680dbf8 c159eefd
[ 3182.597516]  00000000 00000002 00000000 c14f6f2a f4c91600 f680dc4c c14f6f2a 00000050
[ 3182.599751] Call Trace:
[ 3182.600390]  [<c12ba62e>] __delay+0xe/0x10
[ 3182.601454]  [<c12c150b>] do_raw_spin_lock+0xab/0xf0
[ 3182.602706]  [<c159eefd>] _raw_spin_lock_nested+0x3d/0x50
[ 3182.604087]  [<c14f6f2a>] ? tcp_v4_rcv+0x76a/0xaa0
[ 3182.605285]  [<c14f6f2a>] tcp_v4_rcv+0x76a/0xaa0
[ 3182.606451]  [<c14d4b8f>] ip_local_deliver_finish+0xdf/0x380
[ 3182.607866]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.609318]  [<c14d4fbf>] ip_local_deliver+0x7f/0x90
[ 3182.610578]  [<c14d46be>] ip_rcv_finish+0x16e/0x560
[ 3182.611804]  [<c14d522a>] ip_rcv+0x25a/0x320
[ 3182.612880]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.614193]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.615547]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.616918]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.618187]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.619517]  [<f848e33e>] SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.621019]  [<f848f7ac>] KksNatHandler+0x612/0x86d [kksfilter]
[ 3182.622516]  [<c12ba291>] ? sscanf+0x11/0x14
[ 3182.623592]  [<f84903a4>] ? kks_inet_addr+0x37/0x4f [kksfilter]
[ 3182.625069]  [<f848fad6>] kks_rx+0x4e/0xb9 [kksfilter]
[ 3182.626362]  [<f848fc52>] hook_local_in+0x111/0x1d7 [kksfilter]
[ 3182.627838]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.629145]  [<c14cd663>] nf_iterate+0x63/0x90
[ 3182.630274]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.631582]  [<c14cd722>] nf_hook_slow+0x92/0x150
[ 3182.632761]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.634069]  [<c14d521a>] ip_rcv+0x24a/0x320
[ 3182.635155]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.636465]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.638624]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.639996]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.641250]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.642564]  [<c14a71a7>] napi_skb_finish+0x37/0x50
[ 3182.643781]  [<c14a7701>] napi_gro_receive+0xa1/0xb0
[ 3182.645060]  [<f9d41726>] e1000_receive_skb+0xc6/0x170 [e1000e]
[ 3182.646577]  [<c149a19d>] ? __kfree_skb+0x3d/0x90
[ 3182.647779]  [<f9d43166>] e1000_clean_rx_irq+0x206/0x340 [e1000e]
[ 3182.649419]  [<f9d49b74>] e1000e_poll+0x64/0x2d0 [e1000e]
[ 3182.650768]  [<c14a790d>] net_rx_action+0x12d/0x240
[ 3182.651988]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.653245]  [<c1060ec9>] __do_softirq+0x99/0x1d0
[ 3182.654429]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.655688]  <IRQ>
[ 3182.656264]  [<c106123e>] ? irq_exit+0x7e/0xa0
[ 3182.657406]  [<c15a6f2b>] ? do_IRQ+0x4b/0xc0
[ 3182.658496]  [<c15a6d75>] ? common_interrupt+0x35/0x3c
[ 3182.659796]  [<c14f00d8>] ? tcp_connect+0x318/0x490
[ 3182.661011]  [<c14e6841>] ? tcp_validate_incoming+0x71/0x340
[ 3182.662421]  [<c14f8010>] ? tcp_check_req+0x260/0x4b0
[ 3182.663676]  [<c14ecaf7>] ? tcp_rcv_state_process+0x47/0xb80
[ 3182.665078]  [<c14f8093>] ? tcp_check_req+0x2e3/0x4b0
[ 3182.666338]  [<c14f7ced>] ? tcp_child_process+0x8d/0x150
[ 3182.667660]  [<c14f5d8a>] ? tcp_v4_do_rcv+0x2aa/0x3c0
[ 3182.668922]  [<c12c149b>] ? do_raw_spin_lock+0x3b/0xf0
[ 3182.670202]  [<c159eefd>] ? _raw_spin_lock_nested+0x3d/0x50
[ 3182.671603]  [<c14f6f4a>] ? tcp_v4_rcv+0x78a/0xaa0
[ 3182.672800]  [<c14d4b8f>] ? ip_local_deliver_finish+0xdf/0x380
[ 3182.674263]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.675713]  [<c14d4fbf>] ? ip_local_deliver+0x7f/0x90
[ 3182.676991]  [<c14d46be>] ? ip_rcv_finish+0x16e/0x560
[ 3182.678259]  [<c14d522a>] ? ip_rcv+0x25a/0x320
[ 3182.679371]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.680670]  [<c14a63d2>] ? __netif_receive_skb+0x4c2/0x570
[ 3182.682057]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.683436]  [<c14a707b>] ? netif_receive_skb+0xcb/0xe0
[ 3182.684730]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.686027]  [<f848e33e>] ? SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.687568]  [<f848f867>] ? KksNatHandler+0x6cd/0x86d [kksfilter]
[ 3182.689094]  [<c1095000>] ? futex_wait_requeue_pi+0x1c0/0x380
[ 3182.690526]  [<f848fa67>] ? kks_tx+0x60/0x81 [kksfilter]
[ 3182.691859]  [<f848fa07>] ? KksNatHandler+0x86d/0x86d [kksfilter]
[ 3182.693386]  [<c1079df8>] ? kthread+0x78/0x80
[ 3182.694488]  [<c1079d80>] ? __init_kthread_worker+0x60/0x60
[ 3182.695876]  [<c15a6d82>] ? kernel_thread_helper+0x6/0x10

[ 3182.697220] Code: c3 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 04 3e 8d 74 26 00 64 8b 35 ec 5e 91 c1 89 c7 8d 76 00 0f ae e8 0f 31 <8d> 74 26 00 89 45 f0 eb 0d f3 90 64 8b 1d ec 5e 91 c1 39 de 75
[ 3182.704103] Call Trace:
[ 3182.704733]  [<c12ba62e>] __delay+0xe/0x10
[ 3182.705761]  [<c12c150b>] do_raw_spin_lock+0xab/0xf0
[ 3182.707016]  [<c159eefd>] _raw_spin_lock_nested+0x3d/0x50
[ 3182.708359]  [<c14f6f2a>] ? tcp_v4_rcv+0x76a/0xaa0
[ 3182.709555]  [<c14f6f2a>] tcp_v4_rcv+0x76a/0xaa0
[ 3182.710715]  [<c14d4b8f>] ip_local_deliver_finish+0xdf/0x380
[ 3182.712119]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.713565]  [<c14d4fbf>] ip_local_deliver+0x7f/0x90
[ 3182.714807]  [<c14d46be>] ip_rcv_finish+0x16e/0x560
[ 3182.716023]  [<c14d522a>] ip_rcv+0x25a/0x320
[ 3182.717090]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.718410]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.719756]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.721120]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.722391]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.723693]  [<f848e33e>] SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.725185]  [<f848f7ac>] KksNatHandler+0x612/0x86d [kksfilter]
[ 3182.726665]  [<c12ba291>] ? sscanf+0x11/0x14
[ 3182.727788]  [<f84903a4>] ? kks_inet_addr+0x37/0x4f [kksfilter]
[ 3182.729257]  [<f848fad6>] kks_rx+0x4e/0xb9 [kksfilter]
[ 3182.730574]  [<f848fc52>] hook_local_in+0x111/0x1d7 [kksfilter]
[ 3182.732044]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.733344]  [<c14cd663>] nf_iterate+0x63/0x90
[ 3182.734457]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.735758]  [<c14cd722>] nf_hook_slow+0x92/0x150
[ 3182.736979]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.738293]  [<c14d521a>] ip_rcv+0x24a/0x320
[ 3182.739365]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.740660]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.741997]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.743364]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.744622]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.745921]  [<c14a71a7>] napi_skb_finish+0x37/0x50
[ 3182.747146]  [<c14a7701>] napi_gro_receive+0xa1/0xb0
[ 3182.748385]  [<f9d41726>] e1000_receive_skb+0xc6/0x170 [e1000e]
[ 3182.749856]  [<c149a19d>] ? __kfree_skb+0x3d/0x90
[ 3182.751043]  [<f9d43166>] e1000_clean_rx_irq+0x206/0x340 [e1000e]
[ 3182.752559]  [<f9d49b74>] e1000e_poll+0x64/0x2d0 [e1000e]
[ 3182.753901]  [<c14a790d>] net_rx_action+0x12d/0x240
[ 3182.755126]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.756384]  [<c1060ec9>] __do_softirq+0x99/0x1d0
[ 3182.757555]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.758823]  <IRQ>  [<c106123e>] ? irq_exit+0x7e/0xa0
[ 3182.760124]  [<c15a6f2b>] ? do_IRQ+0x4b/0xc0
[ 3182.761192]  [<c15a6d75>] ? common_interrupt+0x35/0x3c
[ 3182.762478]  [<c14f00d8>] ? tcp_connect+0x318/0x490
[ 3182.763694]  [<c14e6841>] ? tcp_validate_incoming+0x71/0x340
[ 3182.765101]  [<c14f8010>] ? tcp_check_req+0x260/0x4b0
[ 3182.766370]  [<c14ecaf7>] ? tcp_rcv_state_process+0x47/0xb80
[ 3182.767775]  [<c14f8093>] ? tcp_check_req+0x2e3/0x4b0
[ 3182.769032]  [<c14f7ced>] ? tcp_child_process+0x8d/0x150
[ 3182.770366]  [<c14f5d8a>] ? tcp_v4_do_rcv+0x2aa/0x3c0
[ 3182.771626]  [<c12c149b>] ? do_raw_spin_lock+0x3b/0xf0
[ 3182.772905]  [<c159eefd>] ? _raw_spin_lock_nested+0x3d/0x50
[ 3182.774297]  [<c14f6f4a>] ? tcp_v4_rcv+0x78a/0xaa0
[ 3182.775494]  [<c14d4b8f>] ? ip_local_deliver_finish+0xdf/0x380
[ 3182.776945]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.778409]  [<c14d4fbf>] ? ip_local_deliver+0x7f/0x90
[ 3182.779693]  [<c14d46be>] ? ip_rcv_finish+0x16e/0x560
[ 3182.780952]  [<c14d522a>] ? ip_rcv+0x25a/0x320
[ 3182.782064]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.783379]  [<c14a63d2>] ? __netif_receive_skb+0x4c2/0x570
[ 3182.784762]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.786125]  [<c14a707b>] ? netif_receive_skb+0xcb/0xe0
[ 3182.787536]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.788836]  [<f848e33e>] ? SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.790382]  [<f848f867>] ? KksNatHandler+0x6cd/0x86d [kksfilter]
[ 3182.791890]  [<c1095000>] ? futex_wait_requeue_pi+0x1c0/0x380
[ 3182.793313]  [<f848fa67>] ? kks_tx+0x60/0x81 [kksfilter]
[ 3182.794645]  [<f848fa07>] ? KksNatHandler+0x86d/0x86d [kksfilter]
[ 3182.796155]  [<c1079df8>] ? kthread+0x78/0x80
[ 3182.797244]  [<c1079d80>] ? __init_kthread_worker+0x60/0x60
[ 3182.798634]  [<c15a6d82>] ? kernel_thread_helper+0x6/0x10
[ 3182.799988] Kernel panic - not syncing: softlockup: hung tasks
[ 3182.801440] Pid: 2742, comm: KksTx Tainted: G           O 3.2.6 #18
[ 3182.803002] Call Trace:
[ 3182.803629]  [<c1594f9f>] ? printk+0x1d/0x1f
[ 3182.804696]  [<c1594e75>] panic+0x5c/0x169
[ 3182.805746]  [<c10bd4d1>] watchdog_timer_fn+0x151/0x160
[ 3182.807060]  [<c107e28d>] __run_hrtimer+0x6d/0x1b0
[ 3182.808255]  [<c10bd380>] ? __touch_watchdog+0x20/0x20
[ 3182.809533]  [<c107ecd5>] hrtimer_interrupt+0xe5/0x260
[ 3182.810826]  [<c107ed51>] ? hrtimer_interrupt+0x161/0x260
[ 3182.812169]  [<c15a6ff4>] smp_apic_timer_interrupt+0x54/0x88
[ 3182.813575]  [<c12bb8d8>] ? trace_hardirqs_off_thunk+0xc/0x14
[ 3182.815016]  [<c159fe02>] apic_timer_interrupt+0x36/0x3c
[ 3182.816336]  [<c12ba6df>] ? delay_tsc+0x1f/0x70
[ 3182.817465]  [<c12ba62e>] __delay+0xe/0x10
[ 3182.818497]  [<c12c150b>] do_raw_spin_lock+0xab/0xf0
[ 3182.819733]  [<c159eefd>] _raw_spin_lock_nested+0x3d/0x50
[ 3182.821081]  [<c14f6f2a>] ? tcp_v4_rcv+0x76a/0xaa0
[ 3182.822287]  [<c14f6f2a>] tcp_v4_rcv+0x76a/0xaa0
[ 3182.823440]  [<c14d4b8f>] ip_local_deliver_finish+0xdf/0x380
[ 3182.824856]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.826314]  [<c14d4fbf>] ip_local_deliver+0x7f/0x90
[ 3182.827549]  [<c14d46be>] ip_rcv_finish+0x16e/0x560
[ 3182.828763]  [<c14d522a>] ip_rcv+0x25a/0x320
[ 3182.829831]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.831143]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.832484]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.833848]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.835108]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.836408]  [<f848e33e>] SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.837899]  [<f848f7ac>] KksNatHandler+0x612/0x86d [kksfilter]
[ 3182.839384]  [<c12ba291>] ? sscanf+0x11/0x14
[ 3182.840453]  [<f84903a4>] ? kks_inet_addr+0x37/0x4f [kksfilter]
[ 3182.841913]  [<f848fad6>] kks_rx+0x4e/0xb9 [kksfilter]
[ 3182.843194]  [<f848fc52>] hook_local_in+0x111/0x1d7 [kksfilter]
[ 3182.844659]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.845954]  [<c14cd663>] nf_iterate+0x63/0x90
[ 3182.847065]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.848361]  [<c14cd722>] nf_hook_slow+0x92/0x150
[ 3182.849531]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.850840]  [<c14d521a>] ip_rcv+0x24a/0x320
[ 3182.851909]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.853209]  [<c14a63d2>] __netif_receive_skb+0x4c2/0x570
[ 3182.854561]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.855921]  [<c14a707b>] netif_receive_skb+0xcb/0xe0
[ 3182.857175]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.858483]  [<c14a71a7>] napi_skb_finish+0x37/0x50
[ 3182.859698]  [<c14a7701>] napi_gro_receive+0xa1/0xb0
[ 3182.860939]  [<f9d41726>] e1000_receive_skb+0xc6/0x170 [e1000e]
[ 3182.862420]  [<c149a19d>] ? __kfree_skb+0x3d/0x90
[ 3182.863594]  [<f9d43166>] e1000_clean_rx_irq+0x206/0x340 [e1000e]
[ 3182.865114]  [<f9d49b74>] e1000e_poll+0x64/0x2d0 [e1000e]
[ 3182.866474]  [<c14a790d>] net_rx_action+0x12d/0x240
[ 3182.867692]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.868950]  [<c1060ec9>] __do_softirq+0x99/0x1d0
[ 3182.870123]  [<c1060e30>] ? local_bh_enable+0xd0/0xd0
[ 3182.871397]  <IRQ>  [<c106123e>] ? irq_exit+0x7e/0xa0
[ 3182.872700]  [<c15a6f2b>] ? do_IRQ+0x4b/0xc0
[ 3182.873767]  [<c15a6d75>] ? common_interrupt+0x35/0x3c
[ 3182.875060]  [<c14f00d8>] ? tcp_connect+0x318/0x490
[ 3182.876276]  [<c14e6841>] ? tcp_validate_incoming+0x71/0x340
[ 3182.877682]  [<c14f8010>] ? tcp_check_req+0x260/0x4b0
[ 3182.878945]  [<c14ecaf7>] ? tcp_rcv_state_process+0x47/0xb80
[ 3182.880351]  [<c14f8093>] ? tcp_check_req+0x2e3/0x4b0
[ 3182.881613]  [<c14f7ced>] ? tcp_child_process+0x8d/0x150
[ 3182.882945]  [<c14f5d8a>] ? tcp_v4_do_rcv+0x2aa/0x3c0
[ 3182.884199]  [<c12c149b>] ? do_raw_spin_lock+0x3b/0xf0
[ 3182.885473]  [<c159eefd>] ? _raw_spin_lock_nested+0x3d/0x50
[ 3182.886867]  [<c14f6f4a>] ? tcp_v4_rcv+0x78a/0xaa0
[ 3182.888062]  [<c14d4b8f>] ? ip_local_deliver_finish+0xdf/0x380
[ 3182.889510]  [<c14d4aec>] ? ip_local_deliver_finish+0x3c/0x380
[ 3182.890966]  [<c14d4fbf>] ? ip_local_deliver+0x7f/0x90
[ 3182.892245]  [<c14d46be>] ? ip_rcv_finish+0x16e/0x560
[ 3182.893504]  [<c14d522a>] ? ip_rcv+0x25a/0x320
[ 3182.894626]  [<c14d4550>] ? inet_del_protocol+0x30/0x30
[ 3182.895925]  [<c14a63d2>] ? __netif_receive_skb+0x4c2/0x570
[ 3182.897308]  [<c14a5fec>] ? __netif_receive_skb+0xdc/0x570
[ 3182.898680]  [<c14a707b>] ? netif_receive_skb+0xcb/0xe0
[ 3182.899980]  [<c14a6fcf>] ? netif_receive_skb+0x1f/0xe0
[ 3182.901281]  [<f848e33e>] ? SendSkb2Middle+0x162/0x176 [kksfilter]
[ 3182.902820]  [<f848f867>] ? KksNatHandler+0x6cd/0x86d [kksfilter]
[ 3182.904330]  [<c1095000>] ? futex_wait_requeue_pi+0x1c0/0x380
[ 3182.905777]  [<f848fa67>] ? kks_tx+0x60/0x81 [kksfilter]
[ 3182.907106]  [<f848fa07>] ? KksNatHandler+0x86d/0x86d [kksfilter]
[ 3182.908614]  [<c1079df8>] ? kthread+0x78/0x80
[ 3182.909703]  [<c1079d80>] ? __init_kthread_worker+0x60/0x60
[ 3182.911102]  [<c15a6d82>] ? kernel_thread_helper+0x6/0x10

2
Posted 2013-10-31 19:00
Any pointers, please.

3
Posted 2013-11-01 08:40
Moderators, please point me in the right direction!

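4
Posted 2013-11-01 09:13
On x86, first enable CONFIG_STACKFRAME in the kernel config, otherwise the backtrace may not be accurate.
The soft lockup in the first line is probably because a spinlock disabled preemption and was never released, so the priority-99 SCHED_FIFO watchdog thread on this core could not be scheduled, which is what triggered the backtrace being printed.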

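5
Posted 2013-11-01 09:30
This looks like a deadlock.
1. Once you have called spin_lock, you must not call any function that may sleep.
2. For data shared between a kernel thread and an interrupt handler, use spin_lock_irq; for data shared with a softirq, use spin_lock_bh.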

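6
Posted 2013-11-01 09:46
FlankerSky posted on 2013-10-31 19:00
Newbie here. I have never analyzed a deadlock in depth. I have run into nested-lock style deadlocks before and could spot those, but this one I simply cannot figure out. My analysis is below ...

My own analysis, for reference:
Cause of the deadlock: the kernel thread (kks_tx) calls netif_receive_skb directly and ends up needing a spinlock in tcp_v4_rcv. After it has acquired the spinlock, an interrupt arrives (possibly the timer interrupt or some other one). Under normal interrupt handling, softirqs run once the interrupt finishes, and the network receive softirq also ends up calling netif_receive_skb. The function is therefore re-entered and ends up waiting forever on the same spinlock. That is the deadlock, and it ultimately produces the soft lockup.
The root cause is that the kernel thread (kks_tx) should not call netif_receive_skb directly (at least not without some form of protection); that function is presumably not safely reentrant.
Also, to address the original poster's questions:
"2. But my code runs inside VMware, and this happens whether I configure VMware with a single core or with multiple cores. According to LDD3, don't spinlocks do nothing at all on non-SMP systems?
LDD3 also says that local CPU interrupts are disabled while a spinlock is held, so this does not add up."
In this situation a single core deadlocks just the same. A spinlock having no effect does not mean that code using it cannot deadlock: when the critical resource lives in kernel space and interrupts are taken into account, mutual exclusion is still needed on a single core.
The statement that "local CPU interrupts are disabled while a spinlock is held" is also inaccurate. Only some spinlock interfaces disable interrupts (spin_lock_irq, for example); the plain spin_lock interface does not (it usually only disables preemption).
The kernel's own softirq mechanism guarantees that it is not reentrant: at the very start of do_softirq it checks whether it is already in interrupt context (softirq included) and, if so, returns immediately. So on its own it would not cause netif_receive_skb to be re-entered.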
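
To make the sequence above concrete, here is a small userspace toy model of a single CPU. It is only a sketch and not kernel code: cpu_model, model_lock, model_hardirq, thread_rx_deadlocks and the other names are made up for illustration. The disable_bh_first flag stands in for the assumption that the thread wraps its call in local_bh_disable()/local_bh_enable(), so that the softirq is postponed until the lock has been released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model of the scenario described above (hypothetical names). */
struct cpu_model {
    bool sock_lock_held;     /* stands in for the spinlock taken in tcp_v4_rcv */
    bool in_softirq;         /* true while the softirq handler is running */
    int  bh_disabled;        /* nesting depth of "bottom halves disabled" */
    bool softirq_pending;    /* a softirq was raised but has not run yet */
    bool would_spin_forever; /* set when the lock is requested while held */
};

/* Non-recursive lock: a second acquire on the same CPU can never succeed.
 * A real CPU would spin here forever; the model only records the fact. */
static void model_lock(struct cpu_model *c)
{
    if (c->sock_lock_held)
        c->would_spin_forever = true;
    else
        c->sock_lock_held = true;
}

static void model_unlock(struct cpu_model *c)
{
    assert(c->sock_lock_held);
    c->sock_lock_held = false;
}

/* Like the check at the start of do_softirq: never nest inside a softirq,
 * and do not run while bottom halves are disabled. */
static void model_run_softirq(struct cpu_model *c)
{
    if (c->in_softirq || c->bh_disabled > 0 || !c->softirq_pending)
        return;
    c->softirq_pending = false;
    c->in_softirq = true;
    model_lock(c);               /* the receive path wants the same lock */
    if (!c->would_spin_forever)
        model_unlock(c);
    c->in_softirq = false;
}

/* A NIC interrupt raises the receive softirq; it runs on interrupt exit. */
static void model_hardirq(struct cpu_model *c)
{
    c->softirq_pending = true;
    model_run_softirq(c);
}

/* Re-enabling bottom halves runs any softirq that was postponed. */
static void model_bh_enable(struct cpu_model *c)
{
    assert(c->bh_disabled > 0);
    if (--c->bh_disabled == 0)
        model_run_softirq(c);
}

/* The kernel thread enters the receive path and an interrupt arrives while
 * it holds the lock. Returns true if the CPU would spin forever. */
bool thread_rx_deadlocks(bool disable_bh_first)
{
    struct cpu_model c = {0};

    if (disable_bh_first)
        c.bh_disabled++;
    model_lock(&c);              /* thread context takes the lock */
    model_hardirq(&c);           /* interrupt lands inside the critical section */
    if (c.would_spin_forever)
        return true;             /* softirq is stuck on top of the thread */
    model_unlock(&c);
    if (disable_bh_first)
        model_bh_enable(&c);     /* the postponed softirq runs safely here */
    return c.would_spin_forever;
}
```

In this model, thread_rx_deadlocks(false) reports the self-deadlock on one CPU, while thread_rx_deadlocks(true) does not, because the softirq only runs after the lock is free. Note that no second core is involved, which matches the observation that the VMware core count makes no difference.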

7
Posted 2013-11-01 10:56
Reply to 4# chenyu105

Thanks. Is the backtrace config option you mean really CONFIG_STACKFRAME? I could not find it in my .config file.

8
Posted 2013-11-01 10:58
Reply to 5# whaaat

Yes, thanks. The lock that fails here is defined by the system, so I think the cause is probably an improper function call on my side.
   

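9
Posted 2013-11-01 11:02
Reply to 6# humjb_1983

Thanks, your reply cleared up a lot for me. I also looked at the implementation of do_softirq.
One more question: following the example of do_softirq, at the point where I call netif_receive_skb myself, could I check whether an interrupt is in progress and, if not, disable all interrupts before calling netif_receive_skb? Would that be enough?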
   

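10
Posted 2013-11-01 11:51
FlankerSky posted on 2013-11-01 11:02
Reply to 6# humjb_1983

Thanks, your reply cleared up a lot for me. I also looked at the implementation of do_softirq.

That should resolve the reentrancy deadlock, but disabling interrupts hurts efficiency, and if they stay disabled for even a little too long it may cause other problems, so this needs careful thought and verification.
You could also try not calling netif_receive_skb directly and achieving the same thing some other way.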
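
One possible reading of "some other way" is to hand the packet over to the softirq instead of running the receive path in the thread. The kernel has netif_rx() for queueing an skb for later softirq processing (with netif_rx_ni() as the process-context variant on kernels of that era); whether that fits this module is an assumption, not something confirmed in the thread. The userspace sketch below only models the idea, and backlog, rx_state, defer_packet and drain_backlog are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_CAP 8

/* Toy model, not kernel code: a fixed-size ring of packet ids. */
struct backlog {
    int    pkts[BACKLOG_CAP];
    size_t head;
    size_t count;
};

struct rx_state {
    struct backlog q;
    bool lock_held;            /* stands in for the lock in the receive path */
    bool nested_lock_attempt;  /* would be the deadlock from the trace */
    int  delivered;            /* number of packets handed to the protocol */
    int  last_delivered;       /* id of the most recent packet delivered */
};

/* Thread context: only queue the packet, never touch the protocol lock.
 * Returns false when the backlog is full and the caller must drop it. */
bool defer_packet(struct rx_state *s, int pkt_id)
{
    if (s->q.count == BACKLOG_CAP)
        return false;
    s->q.pkts[(s->q.head + s->q.count) % BACKLOG_CAP] = pkt_id;
    s->q.count++;
    return true;
}

/* Softirq context: the only place where the lock is taken, so it can never
 * be requested on top of a thread that already holds it. Returns the number
 * of packets delivered in this pass. */
int drain_backlog(struct rx_state *s)
{
    int n = 0;

    while (s->q.count > 0) {
        int pkt = s->q.pkts[s->q.head];

        s->q.head = (s->q.head + 1) % BACKLOG_CAP;
        s->q.count--;
        if (s->lock_held) {            /* cannot happen in this design */
            s->nested_lock_attempt = true;
            break;
        }
        s->lock_held = true;           /* protocol processing happens here */
        s->last_delivered = pkt;
        s->delivered++;
        s->lock_held = false;
        n++;
    }
    return n;
}
```

The design choice is that the thread never enters the locked path at all, so there is nothing for an interrupt to preempt. The cost is extra latency and a bounded queue that can overflow, which would have to be weighed against simply keeping bottom halves disabled around the direct call.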