免费注册 查看新帖 |

Chinaunix

  平台 论坛 博客 文库
12下一页
最近访问板块 发新帖
查看: 5297 | 回复: 11
打印 上一主题 下一主题

[文件系统] bdi 的bug? [复制链接]

论坛徽章:
0
跳转到指定楼层
1 [收藏(0)] [报告]
发表于 2013-02-07 17:37 |只看该作者 |倒序浏览
本帖最后由 tacoee 于 2013-02-22 17:40 编辑

[18818.382402] EXT3-fs (sdc7): I/O error while writing superblock
[18818.430385] EXT3-fs (sdc: I/O error while writing superblock
[18818.490353] EXT3-fs (sdc9): I/O error while writing superblock
[18818.546321] EXT3-fs (sdc10): I/O error while writing superblock
[18818.610497] EXT3-fs (sdc11): I/O error while writing superblock
[18818.674213] EXT3-fs (sdc12): I/O error while writing superblock
[18818.730048] EXT3-fs (sdc13): I/O error while writing superblock
[18818.794154] EXT3-fs (sdc14): I/O error while writing superblock
[18821.855453] BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
[18821.855466] IP: [<ffffffff8104e564>] try_to_wake_up+0x54/0x2a0
[18821.855481] PGD 65270f067 PUD 618ae5067 PMD 0
[18821.855490] Oops: 0000 [#1] SMP
[18821.855508] CPU 1
[18821.855509] Modules linked in: vdl(N-) edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod be2net i2c_i801 i2c_core sg tg3 ipv6 ipv6_lib mptctl wmi mptbase container button rtc_cmos pcspkr iTCO_wdt iTCO_vendor_support ext3 jbd mbcache ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh ahci libahci libata lpfc scsi_transport_fc scsi_tgt mpt2sas scsi_transport_sas raid_class scsi_mod [last unloaded: vdl]
[18821.855543] Supported: Yes
[18821.855545]
[18821.855547] Pid: 88944, comm: bash Tainted: G           N  3.0.13-0.27-default #1 XXXXXXXXXXXXXXXXXXXXXXXXX
[18821.855553] RIP: 0010:[<ffffffff8104e564>]  [<ffffffff8104e564>] try_to_wake_up+0x54/0x2a0
[18821.855572] RSP: 0000:ffff88067e623de0  EFLAGS: 00010006
[18821.855577] RAX: ffffffffffffffff RBX: 000000000000000f RCX: 00000000ffffffff
[18821.855583] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff880613cf2d40
[18821.855589] RBP: ffff88067e623e20 R08: dead000000200200 R09: 0000000000000000
[18821.855595] R10: 0000000000000400 R11: ffffffff8101ff60 R12: ffff880613cf2580
[18821.855600] R13: 0000000000000000 R14: ffffffff8110f9b0 R15: 0000000000000000
[18821.855607] FS:  00007fbb7aa72700(0000) GS:ffff88067e620000(0000) knlGS:0000000000000000
[18821.855614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18821.855619] CR2: 0000000000000017 CR3: 000000061e945000 CR4: 00000000000406e0
[18821.855625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18821.855631] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18821.855637] Process bash (pid: 88944, threadinfo ffff8805f93ba000, task ffff8805cfb4e200)
[18821.855642] Stack:
[18821.855646]  ffff880613cf2d40 0000000000000246 ffff88067e634858 ffff880650277170
[18821.855656]  ffff8806502771f8 ffff880650277068 ffffffff8110f9b0 ffff880650277068
[18821.855665]  ffff880650277068 ffffffff8110fa11 ffff880650277170 ffff880650277170
[18821.855675] Call Trace:
[18821.855693]  [<ffffffff8110fa11>] wakeup_timer_fn+0x61/0xb0
[18821.855705]  [<ffffffff810686cb>] call_timer_fn+0x6b/0x120
[18821.855716]  [<ffffffff81069ad3>] run_timer_softirq+0x173/0x240
[18821.855728]  [<ffffffff81060a1f>] __do_softirq+0xef/0x220
[18821.855740]  [<ffffffff8144a8bc>] call_softirq+0x1c/0x30
[18821.857221] DWARF2 unwinder stuck at call_softirq+0x1c/0x30
[18821.857226]
[18821.857229] Leftover inexact backtrace:
[18821.857231]
[18821.857236]  <IRQ>
[18821.857247]  [<ffffffff810043b5>] ? do_softirq+0x65/0xa0
[18821.857255]  [<ffffffff81060815>] ? irq_exit+0xc5/0xe0
[18821.857264]  [<ffffffff81020d58>] ? smp_apic_timer_interrupt+0x68/0xa0
[18821.857272]  [<ffffffff8144a073>] ? apic_timer_interrupt+0x13/0x20
[18821.857277]  <EOI>
[18821.857280] Code: 00 00 45 31 ed 48 89 c7 48 89 45 c0 e8 56 34 3f 00 48 89 45 c8 49 8b 04 24 85 c3 0f 84 e5 01 00 00 49 8b 44 24 08 41 8b 4c 24 2c <8b> 40 18 85 c9 89 45 d4 74 64 49 c7 c6 00 16 01 00 89 c0 4d 89
[18821.857332] RIP  [<ffffffff8104e564>] try_to_wake_up+0x54/0x2a0
[18821.857341]  RSP <ffff88067e623de0>
[18821.857345] CR2: 0000000000000017

flusher在唤醒的时候被kill 了?

论坛徽章:
0
2 [报告]
发表于 2013-02-07 17:51 |只看该作者
bdi_forker_thread 函数里
428             /*
429              * If there is no work to do and the bdi thread was
430              * inactive long enough - kill it. The wb_lock is taken
431              * to make sure no-one adds more work to this bdi and
432              * wakes the bdi thread up.
433              */
434             if (bdi->wb.task && !have_dirty_io &&   --->还没有脏页需要flush
435                 time_after(jiffies, bdi->wb.last_active +
436                         bdi_longest_inactive())) {
437                 task = bdi->wb.task;                 ---> 这时刚好有脏页需要fush 到硬盘  ---> 别的地方唤醒wb.task 进程。
438                 bdi->wb.task = NULL;      ---> 这时清空, 使得进入try_to_wake_up函数里的p 为空指针
439                 spin_unlock(&bdi->wb_lock);
440                 set_bit(BDI_pending, &bdi->state);
441                 action = KILL_THREAD;
442                 break;
443             }


2839 static int
2840 try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)   这里的p 为 bdi->wb.task
2841 {
2842     unsigned long flags;
2843     int cpu, success = 0;
2844
2845     smp_wmb();
2846     raw_spin_lock_irqsave(&p->pi_lock, flags);
2847     if (!(p->state & state))
2848         goto out;
2849
2850     success = 1; /* we're going to change ->state */
2851     cpu = task_cpu(p);
2852
2853     if (p->on_rq && ttwu_remote(p, wake_flags))
2854         goto stat;
2855

论坛徽章:
0
3 [报告]
发表于 2013-02-07 17:57 |只看该作者
315 static void wakeup_timer_fn(unsigned long data)
316 {
317     struct backing_dev_info *bdi = (struct backing_dev_info *)data;
318
319     spin_lock_bh(&bdi->wb_lock);   这里有个wb_lock 互斥锁 不能保证与bdi_forker_thread 线程的
320     if (bdi->wb.task) {
321         trace_writeback_wake_thread(bdi);
322         wake_up_process(bdi->wb.task);
323     } else {
324         /*
325          * When bdi tasks are inactive for long time, they are killed.
326          * In this case we have to wake-up the forker thread which
327          * should create and run the bdi thread.
328          */
329         trace_writeback_wake_forker_thread(bdi);
330         wake_up_process(default_backing_dev_info.wb.task);
331     }
332     spin_unlock_bh(&bdi->wb_lock);
333 }
334

426             spin_lock(&bdi->wb_lock);  -- 这个互斥锁
427
428             /*
429              * If there is no work to do and the bdi thread was
430              * inactive long enough - kill it. The wb_lock is taken
431              * to make sure no-one adds more work to this bdi and
432              * wakes the bdi thread up.
433              */
434             if (bdi->wb.task && !have_dirty_io &&
435                 time_after(jiffies, bdi->wb.last_active +
436                         bdi_longest_inactive())) {
437                 task = bdi->wb.task;
438                 bdi->wb.task = NULL;
439                 spin_unlock(&bdi->wb_lock);
440                 set_bit(BDI_pending, &bdi->state);
441                 action = KILL_THREAD;
442                 break;
443             }
444             spin_unlock(&bdi->wb_lock);
445         }

论坛徽章:
0
4 [报告]
发表于 2013-02-08 10:32 |只看该作者
spin_unlock_bh  的作用是会禁软件中断,那么在forker线程里获取到该锁后,
timer 实质是一个软件中断,那么如果forker 线程里获取到这个lock后, 
timer[wakeup_timer_fn]是不可能中断执行的,所以2楼里描述的情况不可能出现。

那么是哪里调用了try_to_wake_up 呢?

本来怀疑的路径是:
bdi 的回写定时器->wakeup_timer_fn->wakeup_timer_fn->try_to_wake_up

这个问题出现的场景是在unmount的时候拔盘。
umount的时候要做sync 同步cache到硬盘,即把脏页回写到硬盘? 
但在do_umount函数里没看到相关同步操作的代码。

那么另一个路径是?



论坛徽章:
0
5 [报告]
发表于 2013-02-08 11:22 |只看该作者
拔盘的时候:
invalidate_partition->fsync_bdev->sync_filesystem->__sync_filesystem->sync_inodes_sb->bdi_queue_work

102 static void bdi_queue_work(struct backing_dev_info *bdi,
103                struct wb_writeback_work *work)
104 {
105     trace_writeback_queue(bdi, work);
106
107     spin_lock_bh(&bdi->wb_lock);                    -----> 这里也获取了互斥锁呀,所以这个路径也不会出现这个问题。
108     list_add_tail(&work->list, &bdi->work_list);
109     if (!bdi->wb.task)
110         trace_writeback_nothread(bdi, work);
111       bdi_wakeup_flusher(bdi);
112     spin_unlock_bh(&bdi->wb_lock);
113 }

  88 /* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */        ---〉 这里说明了一定要获取这个锁的。
  89 static void bdi_wakeup_flusher(struct backing_dev_info *bdi)
  90 {
  91     if (bdi->wb.task) {
  92         wake_up_process(bdi->wb.task);
  93     } else {
  94         /*
  95          * The bdi thread isn't there, wake up the forker thread which
  96          * will create and run it.
  97          */
  98         wake_up_process(default_backing_dev_info.wb.task);
  99     }
100 }

还有一个函数调用了上面这个函数:

bdi_start_background_writeback-〉bdi_wakeup_flusher
157 /**
158  * bdi_start_background_writeback - start background writeback
159  * @bdi: the backing device to write from
160  *
161  * Description:
162  *   This makes sure WB_SYNC_NONE background writeback happens. When
163  *   this function returns, it is only guaranteed that for given BDI
164  *   some IO is happening if we are over background dirty threshold.
165  *   Caller need not hold sb s_umount semaphore.
166  */
167 void bdi_start_background_writeback(struct backing_dev_info *bdi)
168 {
169     /*
170      * We just wake up the flusher thread. It will perform background
171      * writeback as soon as there is no other work to do.
172      */
173     trace_writeback_wake_background(bdi);
174     spin_lock_bh(&bdi->wb_lock);  ---〉 很明显,这里也需要获取这个锁。
175     bdi_wakeup_flusher(bdi);
176     spin_unlock_bh(&bdi->wb_lock);
177 }

那么。。 

try_to_wake_up 的线程不是bdi->wb->task ??

call trace 里的:
[18821.855547] Pid: 88944, comm: bash

表示的是什么意思? 当前正在运行的线程,try_to_wake_up另一线程的caller?  



论坛徽章:
0
6 [报告]
发表于 2013-02-08 13:37 |只看该作者
PID        PPID  CPU       TASK                     ST  %MEM     VSZ      RSS   COMM
88944   5548   1           ffff8805cfb4e200  RU   0.0       17584   4896  bash
3543     3541   1           ffff880651282180  IN   0.0   14424    740  audispd
5128     4056   6           ffff8806537665c0  IN   0.0   91292   3636  sshd
5131     5128   1           ffff880655aaa680  IN   0.0   13496   2508  bash
5548     5128   0           ffff88065681c0c0  IN   0.0   17584   6480   bash
101613   5128   1         ffff880650be4540  IN   0.0   13492   2500  bash

论坛徽章:
0
7 [报告]
发表于 2013-02-17 15:39 |只看该作者
本帖最后由 25641463 于 2013-02-17 15:40 编辑

额。。华为存储果然辛苦。。年前一天还在搞bug。作为同行,如果我遇到这个问题,可以先用objdump 看一下 try_to_wake_up+0x54这个地方,对照源码多少行。然后再 try_to_wake_up函数中对应的代码前加一句断言函数。接着复现一下问题,看一下堆栈信息是否存在有用信息。因为看你在5楼的描述,还没搞清楚唤醒的是不是bdi->wb->task 线程,这个确定了,才能进行下一步分析。。也不一定是p,没准是p的成员引用为null或者唤醒其他线程出的问题

论坛徽章:
0
8 [报告]
发表于 2013-02-21 20:20 |只看该作者
这个看过了的。



0xffffffff8104e554 <try_to_wake_up+68>:        je     0xffffffff8104e73f <try_to_wake_up+559> //out
0xffffffff8104e55a <try_to_wake_up+74>:        mov    0x8(%r12),%rax
0xffffffff8104e55f <try_to_wake_up+79>:        mov    0x2c(%r12),%ecx
/usr/src/linux-3.0.13-0.27/include/linux/sched.h: 2599
0xffffffff8104e564 <try_to_wake_up+84>:        mov    0x18(%rax),%eax    //failed here
0xffffffff8104e567 <try_to_wake_up+87>:        test   %ecx,%ecx
0xffffffff8104e569 <try_to_wake_up+89>:        mov    %eax,-0x2c(%rbp)
0xffffffff8104e56c <try_to_wake_up+92>:        je     0xffffffff8104e5d2 <try_to_wake_up+194> //stat
0xffffffff8104e56e <try_to_wake_up+94>:        mov    $0x11600,%r14


2597 static inline unsigned int task_cpu(const struct task_struct *p)
2598 {
2599     return task_thread_info(p)->cpu;
2600 }

2405 #define task_thread_info(task)  ((struct thread_info *)(task)->stack)

线程栈被破坏?
什么情况下线程栈会被破坏?

复现过两次。
两次的线程不一样

PID: 88944  TASK: ffff8805cfb4e200  CPU: 1   COMMAND: "bash"
#0 [ffff88067e623a30] machine_kexec at ffffffff810265ce
#1 [ffff88067e623a80] crash_kexec at ffffffff810a3b5a
#2 [ffff88067e623b50] oops_begin at ffffffff81442de8
#3 [ffff88067e623b70] __bad_area_nosemaphore at ffffffff81032995
#4 [ffff88067e623c30] do_page_fault at ffffffff8144549e
#5 [ffff88067e623d30] paranoid_exit at ffffffff81442065
#6 [ffff88067e623d40] wakeup_timer_fn at ffffffff8110f9b0
#7 [ffff88067e623db8] try_to_wake_up at ffffffff8104e564
#8 [ffff88067e623e10] wakeup_timer_fn at ffffffff8110f9b0
#9 [ffff88067e623e28] bdi_lock_two at ffffffff8110fa11
#10 [ffff88067e623e48] call_timer_fn at ffffffff810686cb
#11 [ffff88067e623e78] run_timer_softirq at ffffffff81069ad3
#12 [ffff88067e623ee8] __do_softirq at ffffffff81060a1f
#13 [ffff88067e623f48] arch_unwind_init_running at ffffffff8144a8bc
#14 [ffff88067e623f60] do_softirq at ffffffff810043b5
#15 [ffff88067e623f90] smp_apic_timer_interrupt at ffffffff81020d58
#16 [ffff88067e623fb0] invalidate_interrupt2 at ffffffff8144a073
--- <IRQ stack> ---
#17 [ffff8805f93bbf50] invalidate_interrupt2 at ffffffff8144a073
    RIP: ffffffffffffff10  RSP: 0000000000010216  RFLAGS: 00000033
    RAX: 00007fbb7a0d2000  RBX: 0000000000000003  RCX: 000000002ff998a2
    RDX: 00000000000003f3  RSI: 0000000000000000  RDI: 00007fbb7aa74000
    RBP: ffffffff8144a06e   R8: 00007fbb7a0d2c40   R9: 00007fbb7aaa91a8
    R10: 00000000fc2c9fc5  R11: 00007fbb7aa74aa8  R12: 0000000000000000
    R13: 0000000003f0b27f  R14: 0000000000000005  R15: ffffffff81442065
    ORIG_RAX: 00000000000003f3  CS: 7fbb7a8929cf  SS: 7fff83f54f30
WARNING: possibly bogus exception frame


另一次是

PID: 0      TASK: ffff8806588e0040  CPU: 4   COMMAND: "kworker/0:1"
#0 [ffff88067e683bf0] machine_kexec at ffffffff810265ce
#1 [ffff88067e683c40] crash_kexec at ffffffff810a3b5a
#2 [ffff88067e683d10] oops_begin at ffffffff81442de8
#3 [ffff88067e683d30] machine_check at ffffffff81442035
#4 [ffff88067e683d40] wakeup_timer_fn at ffffffff8110f9b0
#5 [ffff88067e683db8] try_to_wake_up at ffffffff8104e564
#6 [ffff88067e683e10] wakeup_timer_fn at ffffffff8110f9b0
#7 [ffff88067e683e28] bdi_lock_two at ffffffff8110fa11
#8 [ffff88067e683e48] call_timer_fn at ffffffff810686cb
#9 [ffff88067e683e78] run_timer_softirq at ffffffff81069ad3
#10 [ffff88067e683ee8] __do_softirq at ffffffff81060a1f
#11 [ffff88067e683f48] arch_unwind_init_running at ffffffff8144a8bc
#12 [ffff88067e683f60] do_softirq at ffffffff810043b5
#13 [ffff88067e683f90] smp_apic_timer_interrupt at ffffffff81020d58
#14 [ffff88067e683fb0] invalidate_interrupt2 at ffffffff8144a073
--- <IRQ stack> ---
#15 [ffff8806588e3df0] invalidate_interrupt2 at ffffffff8144a073
    RIP: ffffffffffffff10  RSP: 0000000000000202  RFLAGS: 00000010
    RAX: 0000000000000305  RBX: 0000000000000d67  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000d67  RDI: 0000000000000000
    RBP: ffffffff8144a06e   R8: 000000000000000b   R9: 0000000000000733
    R10: ffffffff81048560  R11: ffff8806588e3fd8  R12: 0000000000011600
    R13: ffff8806588e2010  R14: 0000000000000000  R15: ffffffff8143f682
    ORIG_RAX: 0000000000345c8b  CS: ffffffff812a8dee  SS: ffff8806588e3ea8
WARNING: possibly bogus exception frame



问题很难复现。




回复 7# 25641463


   

论坛徽章:
0
9 [报告]
发表于 2013-02-22 17:03 |只看该作者
根据两次的call trace 来看唤醒的应该是  bdi->wb->task

只是(task)->stack 已经是非法的地址





回复 7# 25641463


   

论坛徽章:
0
10 [报告]
发表于 2013-03-01 09:41 |只看该作者

http://git.kernel.org/?p=linux/k ... 2f6d734d38770324dc4

终于找到原因。。

dev 已经空了。。 

回复 9# tacoee


   
您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

  

北京盛拓优讯信息技术有限公司. 版权所有 京ICP备16024965号-6 北京市公安局海淀分局网监中心备案编号:11010802020122 niuxiaotong@pcpop.com 17352615567
未成年举报专区
中国互联网协会会员  联系我们:huangweiwei@itpub.net
感谢所有关心和支持过ChinaUnix的朋友们 转载本站内容请注明原作者名及出处

清除 Cookies - ChinaUnix - Archiver - WAP - TOP