Last edited by tqyou85 on 2013-08-27 09:28
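While debugging recently I ran into a problem: when the board is running its workload, the CPU hangs. The platform is PPC, 32-bit. The symptom is that sshd takes 100% of the CPU and no one can log in to the board over ssh.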
Inspecting the kernel with xmon shows that it is stuck in an infinite loop:

```
7:mon> t
[ef37b9d0] c00644f4 lock_timer_base+0x34/0x70 (unreliable)
[ef37b9f0] c006454c try_to_del_timer_sync+0x1c/0x150
[ef37ba20] c0065c00 del_timer_sync+0x30/0x50
[ef37ba30] c00727dc flush_delayed_work+0x1c/0x90
[ef37ba40] c024b1c4 tty_flush_to_ldisc+0x14/0x30
[ef37ba50] c0244f34 n_tty_poll+0x74/0x1a0
[ef37ba70] c02402f0 tty_poll+0x90/0xc0
[ef37ba90] c010d1e8 do_select+0x2e8/0x5e0
[ef37bda0] c010d68c core_sys_select+0x1ac/0x3f0
[ef37bf00] c010dc88 sys_select+0x38/0x150
[ef37bf40] c00124b0 ret_from_syscall+0x0/0x4
--- Exception: c01 (System Call) at 0fc8c4c0
SP (bfcb3010) is in userspace
7:mon>
```
Further debugging showed that this timer loops forever inside lock_timer_base:

```c
static struct tvec_base *lock_timer_base(struct timer_list *timer,
					unsigned long *flags)
	__acquires(timer->base->lock)
{
	struct tvec_base *base;

	for (;;) {
		struct tvec_base *prelock_base = timer->base;
		base = tbase_get_base(prelock_base);
		if (likely(base != NULL)) {
			spin_lock_irqsave(&base->lock, *flags);
			if (likely(prelock_base == timer->base))
				return base;
			/* The timer has migrated to another CPU */
			spin_unlock_irqrestore(&base->lock, *flags);
		}
		cpu_relax();
	}
}
```
复制代码 此时base = NULL,导致内核一直在该循环中无法退出。、
Using xmon, I found that the function of the offending timer is:

```
7:mon> la c0071b20
c0071b20: delayed_work_timer_fn+0x0/0x50
```
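Comparing this timer against the layout of struct timer_list, its list_head is 00000000 00200200 (next = NULL, prev = 0x00200200, which matches LIST_POISON2), so the timer should already have been removed from the kernel's timer list. Why, then, does it still go through del_timer_sync?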
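The reasoning about the list_head values can be written down as a small check. It assumes a 32-bit kernel where LIST_POISON2 is 0x00200200 (POISON_POINTER_DELTA of 0), that the two dumped words are entry.next followed by entry.prev, and that detach_timer() with clear_pending set leaves next = NULL and prev = LIST_POISON2. classify_timer_entry and the enum names are hypothetical helpers, not kernel APIs.

```c
#include <stdint.h>

/* Assumption: 32-bit kernel with POISON_POINTER_DELTA == 0, so
 * LIST_POISON2 == 0x00200200. */
#define MODEL_LIST_POISON2 0x00200200u

enum timer_entry_state {
	ENTRY_LINKED,      /* entry.next != NULL: timer_pending() would be true */
	ENTRY_DETACHED,    /* next == NULL, prev == LIST_POISON2 */
	ENTRY_NOT_PENDING  /* next == NULL, prev something else (e.g. never queued) */
};

/* Hypothetical helper: classify the two 32-bit words dumped from the
 * start of struct timer_list (entry.next, then entry.prev). */
static enum timer_entry_state classify_timer_entry(uint32_t next_word,
						   uint32_t prev_word)
{
	if (next_word != 0)
		return ENTRY_LINKED;
	if (prev_word == MODEL_LIST_POISON2)
		return ENTRY_DETACHED;
	return ENTRY_NOT_PENDING;
}
```

For the dump in this post, classify_timer_entry(0x00000000, 0x00200200) returns ENTRY_DETACHED, consistent with the timer having already been taken off the timer list.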