Building page tables on Linux ARM.
// addr is the virtual address to be mapped. File: ./arch/arm/mm/mmu.c
Function: create_mapping
pgd = pgd_offset_k(addr);
/arch/arm/include/asm/pgtable.h
/* to find an entry in a page-table-directory */
#define pgd_index(addr) ((addr) >> PGDIR_SHIFT)
#define pgd_offset(mm, addr) ((mm)->pgd + pgd_index(addr))
/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
/arch/arm/include/asm/pgtable-2level.h
#define PGDIR_SHIFT 21
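Why is the shift 21 bits?
Does the first-level descriptor address computed this way still come out right?
The resulting byte address is: (char *)mm->pgd + ((addr) >> PGDIR_SHIFT) * sizeof(pgd_t)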
/*******************************************/
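Then I suddenly realized that the address mm->pgd + pgd_index(addr) points to also depends on the data type of pgd.
So I went and looked up its type.
/arch/arm/include/asm/pgtable-2level-types.h
typedef struct { pmdval_t pgd[2]; } pgd_t;
/arch/arm/include/asm/pgtable-3level-types.h
typedef pgdval_t pgd_t;
In the 2-level case pmdval_t is a 32-bit word (the same size as long on 32-bit ARM).
So with 2 levels, even though the pgd is computed with PGDIR_SHIFT = 21, the result is effectively the same as a shift of 20,
because the 2-level pgd_t holds pmdval_t pgd[2], which is twice as long (8 bytes instead of 4).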
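To make the pointer arithmetic concrete, here is a minimal user-space sketch (not kernel code). It assumes a 32-bit pmdval_t and the two-word pgd_t shown above, and compares the byte offset reached by "base + pgd_index(addr)" with the offset the hardware uses for its 4096 four-byte entries. The helper names pgd_byte_offset and hw_l1_byte_offset are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT 21                      /* value from pgtable-2level.h */

typedef uint32_t pmdval_t;                  /* one 32-bit first-level descriptor */
typedef struct { pmdval_t pgd[2]; } pgd_t;  /* a pair of descriptors: 8 bytes */

/* Mirrors pgd_index(): one index per 2 MiB of virtual address space. */
static inline uint32_t pgd_index(uint32_t addr)
{
    return addr >> PGDIR_SHIFT;
}

/* Byte offset reached by "base + pgd_index(addr)": pointer arithmetic on
 * a pgd_t pointer scales the index by sizeof(pgd_t), which is 8 here. */
static inline size_t pgd_byte_offset(uint32_t addr)
{
    return (size_t)pgd_index(addr) * sizeof(pgd_t);
}

/* Byte offset in the hardware view: 4096 entries of 4 bytes, one per 1 MiB. */
static inline size_t hw_l1_byte_offset(uint32_t addr)
{
    return (size_t)(addr >> 20) * sizeof(pmdval_t);
}
```

For an address in an even 1 MiB section the two offsets are identical; for an odd section the pgd offset points at the even entry of the pair, 4 bytes before the hardware entry.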
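I was going to ask the experts here why the shift is 21 bits, but now I seem to have found the answer myself.
Is my understanding correct?

Your understanding is correct: the pgd you get back is aligned to 2M.
If addr itself falls in an odd-numbered section, __map_init_section does pmd++, so in the end there is no need to worry about picking the wrong entry.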
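A small sketch of that odd-section adjustment, written as plain user-space C rather than the kernel's own code. The function name section_entry_index is hypothetical; it only models "start at the even entry of the pair, then step to the odd one if bit 20 of addr is set".

```c
#include <stdint.h>

#define PGDIR_SHIFT  21
#define SECTION_SIZE (1u << 20)   /* 1 MiB, the range one hardware entry maps */

/* Index, counted in 32-bit hardware entries, of the first-level descriptor
 * that covers addr. */
static inline uint32_t section_entry_index(uint32_t addr)
{
    uint32_t pmd = (addr >> PGDIR_SHIFT) * 2;  /* even entry of the 2 MiB pair */

    if (addr & SECTION_SIZE)                   /* addr is in the odd 1 MiB half */
        pmd++;
    return pmd;
}
```

The result always equals addr >> 20, which is why the 21-bit shift never selects the wrong hardware entry.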
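Why it was designed as 21 bits:
When a second-level page table is set up, one whole page is allocated each time (see early_pte_alloc in alloc_init_pte), and one page has room for four hardware second-level tables (256 entries x 4 bytes = 1K each).
However, because each page table exists in both a hw and a sw version, one page can only hold two such pairs.
If you only consider the mapping for a single first-level entry, 2K of memory is enough (sw plus hw), so 2K of the allocated page would go unused.
To make full use of that memory, two adjacent first-level entries are mapped at once; each gets its own second-level table (both in the same page), so the whole 4K is used.
So, for convenient addressing, the shift was made 21 bits: whether addr falls in an odd or an even section, you always land on the even-numbered entry of the adjacent pair of first-level entries.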
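The memory budget in this explanation can be checked with a few lines of arithmetic. This is only a sketch; the constant and function names are invented for the illustration, and the sizes (4K pages, 256 entries per hardware table, 4 bytes per entry) are the ones quoted in this thread.

```c
#include <stdint.h>

enum {
    PAGE_SIZE_BYTES   = 4096,
    HW_PTES_PER_TABLE = 256,   /* entries in one hardware second-level table */
    PTE_BYTES         = 4,     /* each entry is one 32-bit word */
    HW_TABLE_BYTES    = HW_PTES_PER_TABLE * PTE_BYTES,  /* 1 KiB */
    PAIR_BYTES        = 2 * HW_TABLE_BYTES              /* hw table + Linux copy */
};

/* How many hardware-only tables would fit in one page. */
static inline int hw_tables_per_page(void)
{
    return PAGE_SIZE_BYTES / HW_TABLE_BYTES;
}

/* How many (Linux, hardware) table pairs fit in one page. */
static inline int table_pairs_per_page(void)
{
    return PAGE_SIZE_BYTES / PAIR_BYTES;
}

/* Virtual address range covered by one fully used page of PTEs. */
static inline uint32_t bytes_mapped_per_pte_page(void)
{
    return (uint32_t)table_pairs_per_page() * HW_PTES_PER_TABLE * PAGE_SIZE_BYTES;
}
```

Two pairs per page times 256 entries times 4K per entry gives 2 MiB, which matches a shift of 21 bits.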
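Reply to #2 arm-linux-gcc
What does it mean that page tables are split into hw and sw? Thanks.
This post was last edited by arm-linux-gcc on 2015-01-06 11:10
Reply to #3 super皮波
ARM page table entries have no dirty bit, and that bit is required to implement page swapping.
So Linux emulates the dirty bit. You can see there are two sets of definitions, pgtable-2level.h and pgtable-2level-hwdef.h: the former is sw (the Linux page table, which emulates dirty, young, and so on) and the latter is hw (the ARM page table).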
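As a rough illustration of the emulation idea, here is a toy model in user-space C, following the rule stated in the pgtable-2level.h header comment: hardware write permission is granted only when the Linux PTE is both writable and dirty, and accesses are allowed only when it is young. The SK_* flag values and function names are hypothetical and do not match the real L_PTE_* bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch only; the real L_PTE_* values differ. */
#define SK_PRESENT  (1u << 0)
#define SK_YOUNG    (1u << 1)
#define SK_DIRTY    (1u << 2)
#define SK_WRITABLE (1u << 3)

/* The hardware entry permits access only if the page is present and young. */
static inline bool hw_allows_read(uint32_t linux_pte)
{
    uint32_t need = SK_PRESENT | SK_YOUNG;
    return (linux_pte & need) == need;
}

/* Hardware write permission additionally requires writable AND dirty. */
static inline bool hw_allows_write(uint32_t linux_pte)
{
    uint32_t need = SK_WRITABLE | SK_DIRTY;
    return hw_allows_read(linux_pte) && (linux_pte & need) == need;
}

/* Models the fault path: a write to a clean but writable page traps, and
 * the handler marks the Linux PTE dirty and young. A read-only page is
 * returned unchanged. */
static inline uint32_t sk_handle_write_fault(uint32_t linux_pte)
{
    if ((linux_pte & SK_PRESENT) && (linux_pte & SK_WRITABLE))
        linux_pte |= SK_DIRTY | SK_YOUNG;
    return linux_pte;
}
```

In this model the first write to a clean page is denied by the "hardware" check, and only after the fault handler has set the dirty flag does the same write succeed.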
Reply to #4 arm-linux-gcc
So it's as if two PTEs point to the same address, except one is for sw and the other for hw. Is that right?
Reply to #5 super皮波
/*
 * Hardware-wise, we have a two level page table structure, where the first
 * level has 4096 entries, and the second level has 256 entries.  Each entry
 * is one 32-bit word.  Most of the bits in the second level entry are used
 * by hardware, and there aren't any "accessed" and "dirty" bits.
 *
 * Linux on the other hand has a three level page table structure, which can
 * be wrapped to fit a two level page table structure easily - using the PGD
 * and PTE only.  However, Linux also expects one "PTE" table per page, and
 * at least a "dirty" bit.
 *
 * Therefore, we tweak the implementation slightly - we tell Linux that we
 * have 2048 entries in the first level, each of which is 8 bytes (iow, two
 * hardware pointers to the second level.)  The second level contains two
 * hardware PTE tables arranged contiguously, preceded by Linux versions
 * which contain the state information Linux needs.  We, therefore, end up
 * with 512 entries in the "PTE" level.
 *
 * This leads to the page tables having the following layout:
 *
 *    pgd             pte
 * |        |
 * +--------+
 * |        |       +------------+ +0
 * +- - - - +       | Linux pt 0 |
 * |        |       +------------+ +1024
 * +--------+ +0    | Linux pt 1 |
 * |        |-----> +------------+ +2048
 * +- - - - + +4    |  h/w pt 0  |
 * |        |-----> +------------+ +3072
 * +--------+ +8    |  h/w pt 1  |
 * |        |       +------------+ +4096
 *
 * See L_PTE_xxx below for definitions of bits in the "Linux pt", and
 * PTE_xxx for definitions of bits appearing in the "h/w pt".
 *
 * PMD_xxx definitions refer to bits in the first level page table.
 *
 * The "dirty" bit is emulated by only granting hardware write permission
 * iff the page is marked "writable" and "dirty" in the Linux PTE.  This
 * means that a write to a clean page will cause a permission fault, and
 * the Linux MM layer will mark the page dirty via handle_pte_fault().
 * For the hardware to notice the permission change, the TLB entry must
 * be flushed, and ptep_set_access_flags() does that for us.
 *
 * The "accessed" or "young" bit is emulated by a similar method; we only
 * allow accesses to the page if the "young" bit is set.  Accesses to the
 * page will cause a fault, and handle_pte_fault() will set the young bit
 * for us as long as the page is marked present in the corresponding Linux
 * PTE entry.  Again, ptep_set_access_flags() will ensure that the TLB is
 * up to date.
 *
 * However, when the "young" bit is cleared, we deny access to the page
 * by clearing the hardware PTE.  Currently Linux does not flush the TLB
 * for us in this case, which means the TLB will retain the translation
 * until either the TLB entry is evicted under pressure, or a context
 * switch which changes the user space mapping occurs.
 */
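This is the comment in pgtable-2level.h.
Reply to #2 arm-linux-gcc
Yes, I understand it now.
One page holds two page tables, which together can map 2M.
The first-level table has 4096 entries, each mapping 1M, but Linux presents it as 2048 entries, each mapping 2M.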
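To tie this summary to the layout diagram in the header comment, here is a hedged sketch of how an address selects its slot inside the 4K PTE page. The names PTRS_PER_PTE and PTE_HWTABLE_OFF follow the kernel header as I remember it; treat the exact definitions as assumptions. The helper functions are made up for the example.

```c
#include <stdint.h>

#define PAGE_SHIFT       12
#define PTRS_PER_PTE     512                 /* Linux view: 2 x 256 entries */
#define PTE_HWTABLE_OFF  (PTRS_PER_PTE * 4)  /* h/w tables start 2048 bytes in */

/* Index of the PTE for addr within the 2 MiB covered by one PTE page. */
static inline uint32_t pte_index(uint32_t addr)
{
    return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Byte offset of the Linux (software) PTE inside the page. */
static inline uint32_t linux_pte_offset(uint32_t addr)
{
    return pte_index(addr) * 4;
}

/* Byte offset of the matching hardware PTE: same slot, 2048 bytes later. */
static inline uint32_t hw_pte_offset(uint32_t addr)
{
    return linux_pte_offset(addr) + PTE_HWTABLE_OFF;
}
```

An address at the start of an odd 1M section lands at offset 1024 (start of "Linux pt 1") and its hardware twin at 3072 (start of "h/w pt 1"), matching the diagram.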
One more thing I'd like to ask.
I kept reading the code today, but no matter how I search I can't find the definition of struct processor.
It is referenced in:
struct proc_info_list {
    unsigned int cpu_val;
    unsigned int cpu_mask;
    unsigned long __cpu_mm_mmu_flags; /* used by head.S */
    unsigned long __cpu_io_mmu_flags; /* used by head.S */
    unsigned long __cpu_flush; /* used by head.S */
    const char *arch_name;
    const char *elf_name;
    unsigned int elf_hwcap;
    const char *cpu_name;
    struct processor *proc;
    struct cpu_tlb_fns *tlb;
    struct cpu_user_fns *user;
    struct cpu_cache_fns *cache;
};
In the setup_processor function:
#ifdef MULTI_CPU
processor = *list->proc;
#endif
Also, when setting a PTE:
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
#ifndef MULTI_CPU
....
extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
...
#else
...
#define cpu_set_pte_ext processor.set_pte_ext
...
When MULTI_CPU is not defined:
#define cpu_set_pte_ext __glue(CPU_NAME,_set_pte_ext)
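In fact, processor.set_pte_ext and __glue(CPU_NAME,_set_pte_ext) end up calling the same function.
For example, on the arm920 architecture both resolve to (proc-arm920.S)
cpu_arm920_set_pte_ext.
So I'm wondering what the MULTI_CPU macro actually means.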
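This post was last edited by arm-linux-gcc on 2015-01-06 21:54
Reply to #7 darling54454
The struct processor type is defined in arch/arm/include/asm/proc-fns.h.
The processor variable itself is defined in arch/arm/kernel/setup.c: struct processor processor __read_mostly;
MULTI_CPU means that a single vmlinux supports several ARM architectures at once, for example when make menuconfig enables both the ARMv6 s3c6410 and the ARMv7 exynos4420.
If one vmlinux supports several ARM architectures, cpu_set_pte_ext has to resolve to the implementation for whichever ARM architecture is detected at run time, so it is defined as processor.set_pte_ext.
If one vmlinux supports only a single ARM architecture, cpu_set_pte_ext can be defined directly as the corresponding function name. That saves one level of indirection compared with processor.set_pte_ext, which first has to locate processor and then load the set_pte_ext field to find the function, so the GLUE approach is faster.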
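The two dispatch styles can be compared side by side in a small user-space sketch. Everything here is a stand-in: the function bodies are invented, the PTE is just a uint32_t, and the macros are spelled glue/glue_ instead of the kernel's __glue to avoid reserved identifiers. It only shows the mechanism, a function-pointer table filled in at run time versus a name pasted together at compile time.

```c
#include <stdint.h>

/* Toy stand-ins for the per-CPU assembly routines; the bodies are invented. */
static void cpu_v6_set_pte_ext(uint32_t *ptep, uint32_t pte, unsigned int ext)
{
    *ptep = pte | ext | 0x6;
}

static void cpu_v7_set_pte_ext(uint32_t *ptep, uint32_t pte, unsigned int ext)
{
    *ptep = pte | ext | 0x7;
}

struct processor {
    void (*set_pte_ext)(uint32_t *ptep, uint32_t pte, unsigned int ext);
};

static const struct processor v6_processor_functions = { .set_pte_ext = cpu_v6_set_pte_ext };
static const struct processor v7_processor_functions = { .set_pte_ext = cpu_v7_set_pte_ext };

/* MULTI_CPU style: one global table, copied from the detected CPU's entry. */
static struct processor processor;
#define cpu_set_pte_ext_multi processor.set_pte_ext

/* Single-CPU style: the function name is pasted together at compile time. */
#define glue_(name, fn) name##fn
#define glue(name, fn)  glue_(name, fn)
#define CPU_NAME cpu_v7
#define cpu_set_pte_ext_single glue(CPU_NAME, _set_pte_ext)

static uint32_t write_via_table(const struct processor *detected,
                                uint32_t pte, unsigned int ext)
{
    uint32_t slot = 0;
    processor = *detected;                    /* like "processor = *list->proc;" */
    cpu_set_pte_ext_multi(&slot, pte, ext);   /* load pointer, then call */
    return slot;
}

static uint32_t write_via_glue(uint32_t pte, unsigned int ext)
{
    uint32_t slot = 0;
    cpu_set_pte_ext_single(&slot, pte, ext);  /* direct call to cpu_v7_set_pte_ext */
    return slot;
}
```

With the v7 table selected, both paths run the same function; the table path simply pays for one extra load, while also being able to switch to the v6 routine without rebuilding.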
arch/arm/include/asm/glue-proc.h contains:
#ifdef CONFIG_CPU_ARM920T
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_arm920
# endif
#endif
#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v6
# endif
#endif
#ifdef CONFIG_CPU_V7
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7
# endif
#endif
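For example, if several ARM architectures were selected in make menuconfig when building vmlinux, then CONFIG_CPU_V7, CONFIG_CPU_V6 (or CONFIG_CPU_V6K) and CONFIG_CPU_ARM920T are all defined here.
The preprocessing result is then that CPU_NAME gets defined as cpu_arm920 by the first block, and every later block that matches finds CPU_NAME already defined and defines MULTI_CPU instead (the last one to do so being the CONFIG_CPU_V7 block). The part inside #ifndef MULTI_CPU at the end of the file is therefore not used, and cpu_set_pte_ext ends up defined as processor.set_pte_ext.
If only CONFIG_CPU_V7 is selected, CPU_NAME is defined as cpu_v7 and MULTI_CPU is never defined, so cpu_set_pte_ext ends up defined as __glue(CPU_NAME,_set_pte_ext), that is, cpu_v7_set_pte_ext.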
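The preprocessor walk-through can be reproduced outside the kernel. The sketch below assumes a configuration with both CONFIG_CPU_V6 and CONFIG_CPU_V7 enabled (defined by hand here, where a real build would take them from the generated config) and then reports what CPU_NAME and MULTI_CPU ended up as. The sk_* helpers are invented for the demonstration.

```c
#include <string.h>

/* Assumption for this sketch: menuconfig enabled both ARMv6 and ARMv7. */
#define CONFIG_CPU_V6 1
#define CONFIG_CPU_V7 1

#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v6
# endif
#endif

#ifdef CONFIG_CPU_V7
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7
# endif
#endif

/* Two-step stringification so CPU_NAME is expanded before being quoted. */
#define SK_STR_(x) #x
#define SK_STR(x)  SK_STR_(x)

/* Returns 1 if the blocks above ended up defining MULTI_CPU. */
static inline int sk_multi_cpu(void)
{
#ifdef MULTI_CPU
    return 1;
#else
    return 0;
#endif
}

/* Returns nonzero if CPU_NAME expanded to the given identifier. */
static inline int sk_cpu_name_is(const char *expected)
{
    return strcmp(SK_STR(CPU_NAME), expected) == 0;
}
```

Removing either CONFIG_ line leaves MULTI_CPU undefined and CPU_NAME set to the remaining architecture, which is the single-CPU case described above.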
Files that need to be read together:
arch/arm/mm/proc-macro.S define_processor_functions
arch/arm/mm/proc-v7.S define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, suspend=1 and .macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
arch/arm/mm/proc-v7-2level.S cpu_v7_set_pte_ext
arch/arm/include/asm/glue-proc.h
arch/arm/include/asm/proc-fns.h
arch/arm/include/asm/procinfo.h
There is another MULTI option, CONFIG_MULTI_IRQ_HANDLER, which has nothing to do with MULTI_CPU. CONFIG_MULTI_IRQ_HANDLER means the interrupt controller driver can supply its own callback instead of everything going through arch_irq_handler_default.
CONFIG_MULTI_IRQ_HANDLER is usually enabled.
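Reply to #8 arm-linux-gcc
I'm in awe of your answer. I still have a lot to learn; thank you very much.
I'd like to ask about a fairly detailed point that has been puzzling me.
int cpu = smp_processor_id();
This reads the ID of the current CPU.
In start_kernel, boot_cpu_init calls smp_processor_id (probably the first time the ID is obtained this way; earlier code reads c0 of CP15 directly), and it returns thread_info.cpu.
That means thread_info.cpu must have been assigned earlier, but I can't find where.
start_kernel calls smp_setup_processor_id():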
int __cpu_logical_map[NR_CPUS];
void __init smp_setup_processor_id(void)
{
    int i;
    /* read_cpuid_mpidr(): mrc p15, 0, r0, c0, c0, 5 */
    u32 cpu = is_smp() ? read_cpuid_mpidr() & 0xff : 0;
    for (i = 1; i < NR_CPUS; ++i)
        cpu_logical_map(i) = i == cpu ? 0 : i;
    printk(KERN_INFO "Booting Linux on physical CPU %d\n", cpu);
}
#define cpu_logical_map(cpu) __cpu_logical_map[cpu]
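This does not say where thread_info.cpu is assigned, but it may help to see what the loop above computes. Below is a user-space model with a hypothetical CPU count of 4. The assignment of slot 0 is an assumption on my part: it is not in the snippet as posted, though I believe the kernel source has an equivalent line just before the loop.

```c
#include <stdint.h>

#define SK_NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Fills map[] (logical index -> physical id) the way the loop does:
 * logical CPU 0 is the booting CPU, and physical id 0 takes the logical
 * slot the boot CPU would otherwise have had. */
static inline void sk_fill_logical_map(uint32_t boot_cpu, uint32_t map[SK_NR_CPUS])
{
    map[0] = boot_cpu;   /* assumption: not shown in the posted snippet */
    for (uint32_t i = 1; i < SK_NR_CPUS; ++i)
        map[i] = (i == boot_cpu) ? 0 : i;
}

/* Convenience wrapper: physical id for one logical CPU number. */
static inline uint32_t sk_logical_to_physical(uint32_t boot_cpu, uint32_t logical)
{
    uint32_t map[SK_NR_CPUS];

    sk_fill_logical_map(boot_cpu, map);
    return map[logical];
}
```

If the system boots on physical CPU 2, the map becomes {2, 1, 0, 3}: the boot CPU is always logical CPU 0, whatever its physical id.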
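I just want to know where thread_info.cpu gets set. (It's frustrating not to have a clear answer, even though it doesn't seem to matter much for the big picture. All my searching on Baidu only says, roughly, that it returns the CPU ID.)
Reply to #8 arm-linux-gcc
I'd also like to ask: how do you experts debug kernel code?
If I only analyze the code by reading it, it feels abstract and not quite real. I could flash it onto a development board and print things out to observe, but recompiling the kernel over and over seems like a big waste of time and energy.
I recently came across QEMU, which can emulate ARM. Could it be used for debugging?
Sigh, I'm just a senior in college and learning this is really hard. (And I haven't landed a job yet.)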