It reproduced again. Output at the time of the failure:
- [48373.310000] partial.next:df913000,free.next:df802df0
- [48373.310000] slabs_partial pointer:df802de0,slabs_free pointer:df802df0
- [48373.310000] We found error condition: cache 'task_struct'(5), slabp df913000(inuse:5,free:-257).
- [48373.310000] get from partial
- [48373.310000] slab: Internal list corruption detected in cache 'task_struct'(5), slabp df913000(5), entries 1, kmem_map vaddr:0xdf91301c
- [48373.310000] Hexdump:
- [48373.310000]
- [48373.310000] 000: 00 50 90 df e0 2d 80 df 40 00 00 00 40 30 91 df
- [48373.310000] 010: 05 00 00 00 ff fe ff ff 00 00 ad de 03 00 00 00
- [48373.310000] 020: ff ff ff ff 00 00 00 00 ff fe ff ff ff ff ff ff
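The output above shows that a slab was taken from the partial list of the task_struct cache, and that this slab's free field is -257 (0xfffffeff), which causes the series of problems that follow.
The first 0x1c bytes of the hexdump are the slab descriptor; the bytes after it are the slab's per-object data (the kmem_bufctl_t array, one entry per object). The descriptor fields, annotated with the dumped values:
- struct slab {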
-     union {
-         struct {
-             struct list_head list;    /* next 0xdf905000, prev 0xdf802de0 */
-             unsigned long colouroff;  /* 0x40 */
-             void *s_mem;              /* including colour offset: 0xdf913040 */
-             unsigned int inuse;       /* num of objs active in slab: 0x5 */
-             kmem_bufctl_t free;       /* 0xfffffeff; typedef unsigned int kmem_bufctl_t; */
-             unsigned short nodeid;
-         };
-         struct slab_rcu __slab_cover_slab_rcu;
-     };
- };
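To double-check the annotations against the raw bytes, the hexdump can be decoded mechanically. The following is only a user-space sketch: it assumes a 32-bit little-endian target (which the 4-byte pointers and the byte order in the dump suggest), and `slab_view`, `le32`, `decode_slab` and `print_slab_view` are hypothetical helper names, not kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* The 0x30 bytes from the hexdump in the post (offsets 0x00..0x2f). */
static const uint8_t dump[0x30] = {
    0x00, 0x50, 0x90, 0xdf, 0xe0, 0x2d, 0x80, 0xdf,
    0x40, 0x00, 0x00, 0x00, 0x40, 0x30, 0x91, 0xdf,
    0x05, 0x00, 0x00, 0x00, 0xff, 0xfe, 0xff, 0xff,
    0x00, 0x00, 0xad, 0xde, 0x03, 0x00, 0x00, 0x00,
    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
    0xff, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};

/* Read a 32-bit little-endian word at byte offset off. */
static uint32_t le32(const uint8_t *buf, unsigned off)
{
    return (uint32_t)buf[off]
         | (uint32_t)buf[off + 1] << 8
         | (uint32_t)buf[off + 2] << 16
         | (uint32_t)buf[off + 3] << 24;
}

/* Hypothetical user-space view of the descriptor (assumed 32-bit layout). */
struct slab_view {
    uint32_t list_next;
    uint32_t list_prev;
    uint32_t colouroff;
    uint32_t s_mem;
    uint32_t inuse;
    uint32_t free;
    uint16_t nodeid;
};

/* Decode the first 0x1c bytes of buf as a slab descriptor. */
static struct slab_view decode_slab(const uint8_t *buf)
{
    struct slab_view v;
    v.list_next = le32(buf, 0x00);
    v.list_prev = le32(buf, 0x04);
    v.colouroff = le32(buf, 0x08);
    v.s_mem     = le32(buf, 0x0c);
    v.inuse     = le32(buf, 0x10);
    v.free      = le32(buf, 0x14);
    v.nodeid    = (uint16_t)(buf[0x18] | buf[0x19] << 8);
    return v;
}

/* Print the decoded fields, showing free both as hex and as a signed value. */
static void print_slab_view(const struct slab_view *v)
{
    printf("list.next=0x%08x list.prev=0x%08x\n",
           (unsigned)v->list_next, (unsigned)v->list_prev);
    printf("colouroff=0x%x s_mem=0x%08x inuse=%u\n",
           (unsigned)v->colouroff, (unsigned)v->s_mem, (unsigned)v->inuse);
    printf("free=0x%08x (%d as signed) nodeid=%u\n",
           (unsigned)v->free, (int)(int32_t)v->free, (unsigned)v->nodeid);
}
```

Calling `decode_slab(dump)` and passing the result to `print_slab_view` reproduces the values annotated in the struct. One detail visible in the same dump: the word at offset 0x28, inside the data that follows the descriptor, also reads 0xfffffeff.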
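The question is: where did this 0xfffffeff come from? The surrounding values in the slab descriptor all look normal.
I suspect a memory bit flip. The slab was actually full, but because slabp->free changed from 0xffffffff to 0xfffffeff, it was added to the partial list instead of the full list. On the next allocation, this invalid slab was taken from the partial list, which triggered the BUG_ON in cache_alloc_refill:
- /* move slabp to correct slabp list: */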
- list_del(&slabp->list);
- if (slabp->free == BUFCTL_END)
-     list_add(&slabp->list, &l3->slabs_full);
- else
-     list_add(&slabp->list, &l3->slabs_partial);
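As a sanity check on the bit-flip theory, here is a small user-space sketch. It counts how many bits differ between the expected and observed values and mirrors the list choice from the snippet above. It assumes BUFCTL_END is 0xffffffff, as the description of the expected value implies; `bit_distance`, `pick_list` and the `slab_list` enum are hypothetical names used for illustration only.

```c
#include <stdint.h>

typedef uint32_t kmem_bufctl_t;

/* Assumed value of the end marker, matching the expected 0xffffffff. */
#define BUFCTL_END ((kmem_bufctl_t)~0U)

/* Count how many bit positions differ between a and b. */
static int bit_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int n = 0;

    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

enum slab_list { SLABS_FULL, SLABS_PARTIAL };

/* Mirror of the list selection in the quoted snippet. */
static enum slab_list pick_list(kmem_bufctl_t free)
{
    return free == BUFCTL_END ? SLABS_FULL : SLABS_PARTIAL;
}
```

With these helpers, `bit_distance(0xffffffff, 0xfffffeff)` is 1 (only bit 8, mask 0x100, differs), and `pick_list(0xfffffeff)` yields `SLABS_PARTIAL` where a full slab should have gone to `SLABS_FULL`. A one-bit difference is consistent with a bit flip but does not prove one; a stray software write clearing that bit would look the same.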
Next step: I plan to have the hardware side run an ECC check.