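A piece I wrote during the National Day holiday: http://blog.chinaunix.net/u/355/
An Analysis of Page Table Setup During FreeBSD Boot
A few days ago, while reading the memory management code in FreeBSD, I noticed that the top of user space is not where kernel space begins, as I had assumed: on i386 there is a 4M gap between the two. I asked about it in a post at http://www.freebsdchina.org/forum/viewtopic.php?t=38102 . Later, comparing against the NetBSD code, I found that pmap.h in NetBSD has a comment describing how the address space is laid out: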
/*
* see pte.h for a description of i386 MMU terminology and hardware
* interface.
*
* a pmap describes a processes' 4GB virtual address space. this
* virtual address space can be broken up into 1024 4MB regions which
* are described by PDEs in the PDP. the PDEs are defined as follows:
*
* (ranges are inclusive -> exclusive, just like vm_map_entry start/end)
* (the following assumes that KERNBASE is 0xc0000000)
*
* PDE#s      VA range          usage
* 0->766     0x0 -> 0xbfc00000 user address space
* 767        0xbfc00000->      recursive mapping of PDP (used for
*              0xc0000000      linear mapping of PTPs)
* 768->1023  0xc0000000->      kernel address space (constant
*              0xffc00000      across all pmap's/processes)
* 1023       0xffc00000->      "alternate" recursive PDP mapping
*              <end>           (for other pmaps)
*
*
* note: a recursive PDP mapping provides a way to map all the PTEs for
* a 4GB address space into a linear chunk of virtual memory. in other
* words, the PTE for page 0 is the first int mapped into the 4MB recursive
* area. the PTE for page 1 is the second int. the very last int in the
* 4MB range is the PTE that maps VA 0xfffff000 (the last page in a 4GB
* address).
*
* all pmap's PD's must have the same values in slots 768->1023 so that
* the kernel is always mapped in every process. these values are loaded
* into the PD at pmap creation time.
*
* at any one time only one pmap can be active on a processor. this is
* the pmap whose PDP is pointed to by processor register %cr3. this pmap
* will have all its PTEs mapped into memory at the recursive mapping
* point (slot #767 as show above). when the pmap code wants to find the
* PTE for a virtual address, all it has to do is the following:
*
* address of PTE = (767 * 4MB) + (VA / PAGE_SIZE) * sizeof(pt_entry_t)
* = 0xbfc00000 + (VA / 4096) * 4
*
* what happens if the pmap layer is asked to perform an operation
* on a pmap that is not the one which is currently active? in that
* case we take the PA of the PDP of non-active pmap and put it in
* slot 1023 of the active pmap. this causes the non-active pmap's
* PTEs to get mapped in the final 4MB of the 4GB address space
* (e.g. starting at 0xffc00000).
*
* the following figure shows the effects of the recursive PDP mapping:
*
* PDP (%cr3)
* +----+
* | 0| -> PTP#0 that maps VA 0x0 -> 0x400000
* | |
* | |
* | 767| -> points back to PDP (%cr3) mapping VA 0xbfc00000 -> 0xc0000000
* | 768| -> first kernel PTP (maps 0xc0000000 -> 0xc0400000)
* | |
* |1023| -> points to alternate pmap's PDP (maps 0xffc00000 -> end)
* +----+
*
* note that the PDE#767 VA (0xbfc00000) is defined as "PTE_BASE"
* note that the PDE#1023 VA (0xffc00000) is defined as "APTE_BASE"
*
* starting at VA 0xbfc00000 the current active PDP (%cr3) acts as a
* PTP:
*
* PTP#767 == PDP(%cr3) => maps VA 0xbfc00000 -> 0xc0000000
* +----+
* | 0| -> maps the contents of PTP#0 at VA 0xbfc00000->0xbfc01000
* | |
* | |
* | 767| -> maps contents of PTP#767 (the PDP) at VA 0xbfeff000
* | 768| -> maps contents of first kernel PTP
* | |
* |1023|
* +----+
*
* note that mapping of the PDP at PTP#767's VA (0xbfeff000) is
* defined as "PDP_BASE".... within that mapping there are two
* defines:
* "PDP_PDE" (0xbfeffbfc) is the VA of the PDE in the PDP
* which points back to itself.
* "APDP_PDE" (0xbfeffffc) is the VA of the PDE in the PDP which
* establishes the recursive mapping of the alternate pmap.
* to set the alternate PDP, one just has to put the correct
* PA info in *APDP_PDE.
*
* note that in the APTE_BASE space, the APDP appears at VA
* "APDP_BASE" (0xfffff000).
*/
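From the comment above, within the 4G linear address space, user space runs from 0x00000000 to 0xbfc00000, that is, 3G-4M. The 4M in between is actually used to hold the page tables. The code responsible lives partly in locore.S and partly in pmap.c.
Now let's look at locore.S. It is the first kernel code to execute: once the bootloader has loaded the kernel, it jumps into locore.S. The part I'm interested in is the code starting at create_pagetables (really a function implemented in assembly). After detecting the CPU type, the kernel starts building page tables. Judging from the source analysis that follows, at this point the bootloader has already put the CPU into protected mode, but paging is not yet enabled. Once the page tables are built, paging can be turned on. To keep things simple, I have skipped the SMP and PAE related code.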
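To make the "address of PTE" formula from the NetBSD comment concrete, here is a minimal sketch in C. The names pte_va and pdp_va are mine (hypothetical, not from either kernel); the slot number 767 and the 4-byte entry size are taken from the comment, and KERNBASE is assumed to be 0xc0000000 as the comment states.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE      4096u
#define PDE_SPAN       (4u * 1024u * 1024u)        /* VA covered by one PDE: 4MB */
#define RECURSIVE_SLOT 767u                        /* slot number from the comment */
#define PTE_BASE       (RECURSIVE_SLOT * PDE_SPAN) /* 0xbfc00000 */

/* VA at which the PTE for `va` becomes visible through the recursive slot. */
static uint32_t pte_va(uint32_t va)
{
    return PTE_BASE + (va / PAGE_SIZE) * (uint32_t)sizeof(uint32_t);
}

/*
 * VA of the page directory itself: it is the "page table" that maps the
 * 4MB PTE window, so it is found by asking for the PTE of PTE_BASE.
 */
static uint32_t pdp_va(void)
{
    uint32_t va = pte_va(PTE_BASE);
    assert(va % PAGE_SIZE == 0);    /* a whole page, so it must be aligned */
    return va;
}
```

With these definitions pte_va(0) is 0xbfc00000, pte_va(0xfffff000) is 0xbffffffc (the very last int in the window), and pdp_va() is 0xbfeff000, which matches the "PDP_BASE" value in the comment.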
/**********************************************************************
 *
 * Create the first page directory and its page tables.
 *
 */

create_pagetables:

/* Find end of kernel image (rounded up to a page boundary). */
	movl	$R(_end),%esi

/* Include symbols, if any. */
	movl	R(bootinfo+BI_ESYMTAB),%edi
	testl	%edi,%edi
	je	over_symalloc
	movl	%edi,%esi
	movl	$KERNBASE,%edi
	addl	%edi,R(bootinfo+BI_SYMTAB)
	addl	%edi,R(bootinfo+BI_ESYMTAB)
over_symalloc:

/* If we are told where the end of the kernel space is, believe it. */
	movl	R(bootinfo+BI_KERNEND),%edi
	testl	%edi,%edi
	je	no_kernend
	movl	%edi,%esi
no_kernend:

	addl	$PDRMASK,%esi		/* Play conservative for now, and */
	andl	$~PDRMASK,%esi		/* ... wrap to next 4M. */
	movl	%esi,R(KERNend)		/* save end of kernel */
	movl	%esi,R(physfree)	/* next free page is at end of kernel */
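First the code works out where the end of the kernel is once it has been loaded. It then rounds that address up to a 4M boundary and writes it to KERNend and physfree. The point of the 4M alignment is that, when the page directory is built, the kernel image fills a whole number of page directory entries (each entry covers 4M: 1024 page table entries * 4096 bytes per page). The kernel's end address is now stored in KERNend, while physfree holds the start of free physical memory; initially the two are equal.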
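The addl/andl pair is the usual "round up to a power-of-two boundary" trick. A small sketch in C, assuming the non-PAE i386 case where one page directory entry covers 4M (PDRSHIFT = 22); round_up_4m and pde_count are illustrative names of mine, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

#define PDRSHIFT 22u                /* assumed: non-PAE i386, 4MB per PDE */
#define NBPDR    (1u << PDRSHIFT)   /* bytes covered by one PDE */
#define PDRMASK  (NBPDR - 1u)       /* low 22 bits */

/* Same arithmetic as: addl $PDRMASK,%esi ; andl $~PDRMASK,%esi */
static uint32_t round_up_4m(uint32_t addr)
{
    uint32_t end = (addr + PDRMASK) & ~PDRMASK;
    /* Result is 4MB aligned and not below the input (no 32-bit wraparound). */
    assert((end & PDRMASK) == 0 && end >= addr);
    return end;
}

/* Number of page directory entries needed to cover [0, kern_end). */
static uint32_t pde_count(uint32_t kern_end)
{
    return kern_end >> PDRSHIFT;
}
```

For example, a kernel ending at 0x00512345 is rounded up to 0x00800000, which takes exactly 2 page directory entries; an address that is already aligned is left unchanged.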
Before going further, let's look at two macro definitions.
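/*
 * Converts a linear address into a physical address. The kernel is linked
 * with KERNBASE as the base address of the ELF file, so the addresses of all
 * global variables and functions in the code are relative to KERNBASE, while
 * the bootloader loads the kernel into the space starting at physical
 * address 0. This macro turns a linear address into the actual physical
 * address of the kernel in memory.
 */
#define R(foo) ((foo)-KERNBASE)

#define ALLOCPAGES(foo) \
	movl	R(physfree), %esi ;	/* esi = address of the free pages */ \
	movl	$((foo)*PAGE_SIZE), %eax ;	/* number of bytes needed */ \
	addl	%esi, %eax ; \
	movl	%eax, R(physfree) ;	/* update the free page address */ \
	movl	%esi, %edi ;		/* start of the allocated space */ \
	movl	$((foo)*PAGE_SIZE),%ecx ; \
	xorl	%eax,%eax ; \
	cld ; \
	rep ;				/* zero the allocated memory */ \
	stosb
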
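From the definition of ALLOCPAGES we can see that its argument is the number of pages to allocate, and that the start address of the allocated memory is left in the esi register.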
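In other words ALLOCPAGES is a simple bump allocator. Here is a rough C model of it over a simulated block of physical memory; struct boot_mem and alloc_pages are hypothetical names of mine, and the bounds check has no counterpart in the macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Simulated machine state (hypothetical, for illustration only). */
struct boot_mem {
    uint8_t  *ram;       /* simulated physical memory, indexed by PA */
    uint32_t  size;      /* size of ram in bytes */
    uint32_t  physfree;  /* next free physical address */
};

/* Model of ALLOCPAGES(npages): the return value is what ends up in %esi. */
static uint32_t alloc_pages(struct boot_mem *m, uint32_t npages)
{
    uint32_t start = m->physfree;
    uint32_t bytes = npages * PAGE_SIZE;

    assert(start + bytes <= m->size);   /* not checked by the real macro */
    m->physfree = start + bytes;        /* bump the free pointer */
    memset(m->ram + start, 0, bytes);   /* rep stosb with %eax = 0 */
    return start;
}
```

Successive calls hand out adjacent, zeroed regions, which is why KPTphys, IdlePTD and the rest end up laid out one after another right behind the kernel image.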
OK, on with the analysis.
/* Allocate Kernel Page Tables */
	ALLOCPAGES(NKPT)
	movl	%esi,R(KPTphys)

.
.
.
.
	ALLOCPAGES(NPGPTD)
	movl	%esi,R(IdlePTD)

/* Allocate KSTACK */
	ALLOCPAGES(KSTACK_PAGES)
	movl	%esi,R(p0kpa)
	addl	$KERNBASE, %esi
	movl	%esi, R(proc0kstack)

	ALLOCPAGES(1)			/* vm86/bios stack */
	movl	%esi,R(vm86phystk)

	ALLOCPAGES(3)			/* pgtable + ext + IOPAGES */
	movl	%esi,R(vm86pa)
	addl	$KERNBASE, %esi
	movl	%esi, R(vm86paddr)
.
.
.
.
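According to the comments, NKPT pages are allocated for KPTphys, which will later hold the kernel page tables, and NPGPTD pages are allocated for IdlePTD, which holds the page directory. The remaining allocations have nothing to do with what I want to find out, so I'll simply ignore them.
Two more macros before we continue.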
/*
 * fillkpt
 *	eax = page frame address
 *	ebx = index into page table
 *	ecx = how many pages to map
 *	base = base address of page dir/table
 *	prot = protection bits
 */
#define fillkpt(base, prot) \
	shll	$PTESHIFT,%ebx ; \
	addl	base,%ebx ; \
	orl	$PG_V,%eax ; \
	orl	prot,%eax ; \
1:	movl	%eax,(%ebx) ; \
	addl	$PAGE_SIZE,%eax ;	/* increment physical address */ \
	addl	$PTESIZE,%ebx ;		/* next pte */ \
	loop	1b

/*
 * fillkptphys(prot)
 *	eax = physical address
 *	ecx = how many pages to map
 *	prot = protection bits
 */
#define fillkptphys(prot) \
	movl	%eax, %ebx ; \
	shrl	$PAGE_SHIFT, %ebx ; \
	fillkpt(R(KPTphys), prot)
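fillkpt builds page table entries for ecx pages starting at physical address eax; the entries are written starting at entry ebx of the table at base (that is, at base+ebx*4).
fillkptphys builds page table entries in the KPTphys area for ecx pages starting at eax; the index into the table is also derived from eax, as its page offset from physical address 0.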
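Written as C, the two macros look roughly like this. It is only a model: fill_kpt and fill_kpt_phys are my own names, the table is passed as an ordinary pointer instead of a physical address, and 4-byte entries (non-PAE) are assumed. PG_V and PG_RW are the standard i386 present and writable bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define PAGE_SHIFT 12u
#define PG_V       0x001u   /* entry is valid (present) */
#define PG_RW      0x002u   /* page is writable */

/* Model of fillkpt: write `count` consecutive entries at table[index...]. */
static void fill_kpt(uint32_t *table, uint32_t index, uint32_t pa,
                     uint32_t count, uint32_t prot)
{
    /* The assembly uses `loop`, which only behaves sensibly for count >= 1. */
    assert(count >= 1 && pa % PAGE_SIZE == 0);

    uint32_t entry = pa | PG_V | prot;      /* orl $PG_V / orl prot */
    for (uint32_t i = 0; i < count; i++) {
        table[index + i] = entry;           /* movl %eax,(%ebx) */
        entry += PAGE_SIZE;                 /* next physical page */
    }
}

/* Model of fillkptphys: the index is the page number of `pa` itself. */
static void fill_kpt_phys(uint32_t *kpt, uint32_t pa, uint32_t count,
                          uint32_t prot)
{
    fill_kpt(kpt, pa >> PAGE_SHIFT, pa, count, prot);
}
```

So fill_kpt_phys(kpt, 0x3000, 2, PG_RW) stores 0x3003 in kpt[3] and 0x4003 in kpt[4]: entry number n always describes physical page n, which is what makes the KPTphys tables a straight one-to-one map of low physical memory.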
OK, moving on. Now it is time to build a bunch of page tables.
	xorl	%eax, %eax
	movl	R(KERNend),%ecx
	shrl	$PAGE_SHIFT,%ecx
	fillkptphys($PG_RW)
/* Map page directory. */
.
.
.
.

	movl	R(IdlePTD), %eax
	movl	$NPGPTD, %ecx
	fillkptphys($PG_RW)		/* Map page directory. */
.
.
.
.
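The two pieces of code above build, in KPTphys, the page table entries for the kernel image and for the pages that hold the page directory (IdlePTD).
Now that the page tables are built, all that remains is to store the addresses of the pages holding them into the page directory, and then paging can be enabled.
The last piece: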
901 movl R(KPTphys), %eax
902 xorl %ebx, %ebx
903 movl $NKPT, %ecx
904 fillkpt(R(IdlePTD), $PG_RW)
.
.
.
.
920 /*
921 * For the non-PSE case, install PDEs for PTs covering the KVA.
922 * For the PSE case, do the same, but clobber the ones corresponding
923 * to the kernel (from btext to KERNend) with 4M (2M for PAE) ('PS')
924 * PDEs immediately after.
925 */
926 movl R(KPTphys), %eax
927 movl $KPTDI, %ebx
928 movl $NKPT, %ecx
929 fillkpt(R(IdlePTD), $PG_RW)
.
.
. (handling for the PSE-enabled case omitted)
.
946 done_pde:
947 /* install a pde recursively mapping page directory as a page table */
948 movl R(IdlePTD), %eax
949 movl $PTDPTDI, %ebx
950 movl $NPGPTD,%ecx
951 fillkpt(R(IdlePTD), $PG_RW)
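Lines 901~904 create page directory entries for the pages of KPTphys, the area holding the page tables; note that they are placed at the very beginning of IdlePTD (starting at index 0). Lines 926~929 then install an identical set of page directory entries at IdlePTD+KPTDI*4. Lines 948~951 install, at IdlePTD+PTDPTDI*4, a page directory entry for IdlePTD itself. The reason for giving the page directory an entry that points to itself is to gather all the pages holding page tables into one contiguous range of linear addresses: the page directory also just contains physical addresses, so it can double as a page table.
The code after the tables are built loads IdlePTD into the CR3 register, and with that we can work out the layout of the linear address space, since that layout is determined mainly by the index within the page directory. Let's see what the page directory contains now. On i386, KPTDI=768 and PTDPTDI=767 at present, so the pages mapped by the page tables in KPTphys can be reached either directly by their physical address (because the page directory entries from index 0 onward point at the KPTphys page tables) or at (KPTDI<<22)+physical address (because the entries from KPTDI onward also point at KPTphys). In other words, physical addresses 0~KERNend are mapped at KERNBASE~KERNBASE+KERNend (the value stored in KERNend is a physical address, since it starts out as R(_end)). The entries at the front of IdlePTD are only temporary, however, and are cleared in pmap_bootstrap. And because entry PTDPTDI points at the page directory itself, the 4M just below KPTDI<<22 holds the page tables describing the whole 4G space: 1M entries of 4 bytes each, 4M in total. That is why the 4M below 0xc0000000 is used for page tables.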
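To check this reasoning I find it helpful to replay the three fillkpt calls on a simulated memory and then walk the tables in software, the way the MMU would. The sketch below is a toy model, not kernel code: NKPT = 2 and a kernel end of 4M are made-up values to keep the array small, fill/build_boot_tables/walk are hypothetical names, and only KPTDI = 768 and PTDPTDI = 767 come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define PAGE_SHIFT 12u
#define PDRSHIFT   22u
#define PG_V       0x001u
#define PG_RW      0x002u
#define PG_FRAME   0xfffff000u
#define KPTDI      768u          /* values quoted in the post */
#define PTDPTDI    767u
#define NKPT       2u            /* hypothetical: small, for the demo */
#define KERN_END   0x00400000u   /* hypothetical 4MB-aligned kernel end */

/* Simulated physical memory: kernel image, NKPT table pages, then the PD. */
static uint32_t ram[(KERN_END + (NKPT + 1u) * PAGE_SIZE) / 4u];
static const uint32_t kptphys = KERN_END;
static const uint32_t idleptd = KERN_END + NKPT * PAGE_SIZE;

/* fillkpt equivalent: `count` entries at table_pa[index...], PG_V|PG_RW. */
static void fill(uint32_t table_pa, uint32_t index, uint32_t pa, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        ram[table_pa / 4u + index + i] = (pa + i * PAGE_SIZE) | PG_V | PG_RW;
}

static void build_boot_tables(void)
{
    assert(KERN_END % (1u << PDRSHIFT) == 0);          /* 4MB aligned */
    fill(kptphys, 0, 0, KERN_END >> PAGE_SHIFT);       /* PTEs: kernel image */
    fill(kptphys, idleptd >> PAGE_SHIFT, idleptd, 1);  /* PTE: page directory */
    fill(idleptd, 0, kptphys, NKPT);                   /* temporary low PDEs */
    fill(idleptd, KPTDI, kptphys, NKPT);               /* kernel PDEs */
    fill(idleptd, PTDPTDI, idleptd, 1);                /* recursive PDE */
}

/* Two-level walk with %cr3 = idleptd; returns false on a "page fault". */
static bool walk(uint32_t va, uint32_t *pa)
{
    uint32_t pde = ram[idleptd / 4u + (va >> PDRSHIFT)];
    if (!(pde & PG_V))
        return false;
    uint32_t pte = ram[(pde & PG_FRAME) / 4u + ((va >> PAGE_SHIFT) & 0x3ffu)];
    if (!(pte & PG_V))
        return false;
    *pa = (pte & PG_FRAME) | (va & (PAGE_SIZE - 1u));
    return true;
}
```

After build_boot_tables(), both 0x00001234 and 0xc0001234 translate to physical 0x1234; 0xbfeff000 (that is, (PTDPTDI<<22) + PTDPTDI*4096) translates to the page directory itself, and 0xbff00000 translates to the first KPTphys page. An address such as 0x80000000, which has no page directory entry, faults.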
[ Last edited by lllaaa on 2007-10-14 19:36 ]