Linux ARM page table setup.

darling54454 posted on 2015-01-04 20:29

// addr is the virtual address to be mapped.
/arch/arm/mm/mmu.c
Function: create_mapping
pgd = pgd_offset_k(addr);

/arch/arm/include/asm/pgtable.h
/* to find an entry in a page-table-directory */
#define pgd_index(addr) ((addr) >> PGDIR_SHIFT)
#define pgd_offset(mm, addr) ((mm)->pgd + pgd_index(addr))

/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)

/arch/arm/include/asm/pgtable-2level.h
#define PGDIR_SHIFT 21

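Why is the shift 21 bits?

Will the first-level descriptor address obtained this way still be correct?
The resulting address (in bytes) is: (char *)mm->pgd + ((addr) >> PGDIR_SHIFT) * sizeof(pgd_t)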
/*******************************************/

Then I suddenly realized that the address mm->pgd + pgd_index(addr) points to also depends on the data type of pgd.
So I went and looked up its type.
/arch/arm/include/asm/pgtable-2level-types.h
typedef struct { pmdval_t pgd[2]; } pgd_t;

/arch/arm/include/asm/pgtable-3level-types.h
typedef pgdval_t pgd_t;

Here pgdval_t is a 64-bit type (u64), i.e. 8 bytes.

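So with the 2-level layout, although the pgd index is computed with PGDIR_SHIFT = 21, the resulting byte address is effectively the same as shifting by 20,
because the 2-level pgd_t holds pmdval_t pgd[2], i.e. it is twice the size of one hardware entry.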
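A small user-space sketch of that arithmetic, assuming PGDIR_SHIFT = 21, an 8-byte Linux pgd_t made of two 32-bit words, and a hardware first-level table indexed by addr >> 20 with 4-byte entries. The helper names are hypothetical; only the shift values and the pgd_t shape come from the headers quoted above.

```c
#include <stdint.h>

/* Values from the 2-level ARM headers discussed in the post. */
#define PGDIR_SHIFT   21
#define SECTION_SHIFT 20

typedef uint32_t pmdval_t;
typedef struct { pmdval_t pgd[2]; } pgd_t;   /* 8 bytes per Linux entry */

/* Byte offset of the Linux pgd entry for addr (what pgd_offset computes). */
static uint32_t linux_pgd_byte_offset(uint32_t addr)
{
    return (addr >> PGDIR_SHIFT) * (uint32_t)sizeof(pgd_t);
}

/* Byte offset of the hardware first-level descriptor for addr. */
static uint32_t hw_l1_byte_offset(uint32_t addr)
{
    return (addr >> SECTION_SHIFT) * (uint32_t)sizeof(pmdval_t);
}
```

For an address in an even 1 MB section the two offsets coincide; for an odd section the Linux offset is one hardware word (4 bytes) lower, i.e. it is rounded down to the 2 MB boundary.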

I originally wanted to ask the experts why the shift is 21 bits, but I think I've found the answer myself.
Is my understanding correct?

arm-linux-gcc posted on 2015-01-05 21:57

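Your understanding is correct: the pgd entry you get is 2 MB aligned.
If addr itself lies in an odd-numbered section, __map_init_section does pmd++, so in the end there is no need to worry about picking the wrong entry.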
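A simplified model of that odd-section adjustment, written as plain C rather than the kernel function: pmd points at the even hardware word of a Linux pgd entry and is advanced by one word when addr is in an odd 1 MB section. SECTION_SIZE matches the 2-level ARM value; map_sections and the flat uint32_t table are hypothetical stand-ins.

```c
#include <stdint.h>

#define SECTION_SIZE (1u << 20)   /* 1 MB hardware section */

/* Fill section descriptors for [addr, end), starting from the even word. */
static void map_sections(uint32_t *pmd, uint32_t addr, uint32_t end,
                         uint32_t phys, uint32_t prot_sect)
{
    if (addr & SECTION_SIZE)       /* odd section: use the second word */
        pmd++;
    do {
        *pmd = phys | prot_sect;   /* one section descriptor per 1 MB */
        phys += SECTION_SIZE;
    } while (pmd++, addr += SECTION_SIZE, addr != end);
}
```

Mapping only the odd section 0x00100000 leaves word 0 untouched and writes word 1, which is why starting from the 2 MB-aligned entry is safe.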

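Why it was designed as 21 bits:
When building second-level page tables, a whole page is allocated each time (see early_pte_alloc in alloc_init_pte), and one 4 KB page can hold four second-level tables (256 entries × 4 bytes = 1 KB each).
However, because page tables are split into hw and sw versions, one page can only hold the tables for two first-level entries.
If you only considered the mapping for a single first-level entry, 2 KB would be enough (sw plus hw), and the other 2 KB of the allocated page would go unused.
So to make full use of that memory, two adjacent first-level entries are mapped at once, each pointing to its own second-level table (both in the same page), and the whole 4 KB is used.
For convenient addressing the shift was therefore made 21 bits: whether addr lies in an odd or an even section, the lookup lands on the even-numbered one of the two adjacent first-level entries.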
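A sketch of how that one 4 KB page is carved up, following the offsets in the pgtable-2level.h layout (two Linux tables, then two hardware tables). The helper names are hypothetical, and the descriptor type bit 0x1 (coarse page table) is used only for illustration.

```c
#include <stdint.h>

#define PTRS_PER_HW_PTE 256u
#define HW_TABLE_BYTES  (PTRS_PER_HW_PTE * 4u)   /* 1 KB per table */

/* which: 0 = even first-level entry, 1 = odd one */
static uint32_t linux_pt_offset(unsigned which) { return which * HW_TABLE_BYTES; }
static uint32_t hw_pt_offset(unsigned which)
{
    return 2u * HW_TABLE_BYTES + which * HW_TABLE_BYTES;
}

/* Both hardware first-level descriptors point into the same page. */
static void fill_l1_pair(uint32_t l1[2], uint32_t page_phys, uint32_t prot)
{
    l1[0] = (page_phys + hw_pt_offset(0)) | prot;
    l1[1] = (page_phys + hw_pt_offset(1)) | prot;
}
```

The even descriptor points at +2048 and the odd one at +3072; the Linux copies at +0 and +1024 are never seen by the MMU.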

super皮波 posted on 2015-01-06 10:44

Reply to #2 arm-linux-gcc
What do you mean by page tables being split into hw and sw? Thanks.

arm-linux-gcc posted on 2015-01-06 11:08

Last edited by arm-linux-gcc on 2015-01-06 11:10

Reply to #3 super皮波

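ARM page tables have no dirty bit, and that bit is required to implement page swapping.
So Linux emulates the dirty bit. You can see two sets of definitions, in pgtable-2level.h and pgtable-2level-hwdef.h: the former is sw (the Linux page table, which emulates dirty, young and so on), the latter is hw (the ARM hardware page table).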
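A simplified model of how the two copies stay together: a PTE pointer always refers to the Linux copy, and the hardware copy sits 2048 bytes (512 words) further on in the same page, which is where cpu_v7_set_pte_ext stores its converted value. The bit values and translate_to_hw() are hypothetical; the real conversion is done in assembly in proc-v7-2level.S.

```c
#include <stdint.h>

#define HW_COPY_OFFSET_WORDS (2048u / 4u)   /* h/w copy is 2 KB after sw */

#define SW_PRESENT 0x001u   /* hypothetical Linux "present" bit */
#define SW_STATE   0x040u   /* hypothetical Linux-only state bit */
#define HW_SMALL   0x002u   /* hypothetical hardware "small page" type */

/* Derive the hardware entry from the Linux entry. */
static uint32_t translate_to_hw(uint32_t linux_pte)
{
    if (!(linux_pte & SW_PRESENT))
        return 0;                               /* not mapped: faults */
    return (linux_pte & ~0xfffu) | HW_SMALL;    /* keep frame, set type */
}

/* Store both versions, like set_pte_ext does. */
static void set_pte_pair(uint32_t *ptep, uint32_t linux_pte)
{
    ptep[0] = linux_pte;                                     /* Linux copy */
    ptep[HW_COPY_OFFSET_WORDS] = translate_to_hw(linux_pte); /* h/w copy */
}
```

Both entries describe the same physical frame; only the Linux copy carries the extra software state.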

super皮波 posted on 2015-01-06 11:32

Reply to #4 arm-linux-gcc
So it's like two PTEs pointing to the same address, except that one is used by sw and the other by hw, is that right?

arm-linux-gcc posted on 2015-01-06 11:55

Reply to #5 super皮波

/*
* Hardware-wise, we have a two level page table structure, where the first
* level has 4096 entries, and the second level has 256 entries. Each entry
* is one 32-bit word. Most of the bits in the second level entry are used
* by hardware, and there aren't any "accessed" and "dirty" bits.
*
* Linux on the other hand has a three level page table structure, which can
* be wrapped to fit a two level page table structure easily - using the PGD
* and PTE only. However, Linux also expects one "PTE" table per page, and
* at least a "dirty" bit.
*
* Therefore, we tweak the implementation slightly - we tell Linux that we
* have 2048 entries in the first level, each of which is 8 bytes (iow, two
* hardware pointers to the second level.) The second level contains two
* hardware PTE tables arranged contiguously, preceded by Linux versions
* which contain the state information Linux needs. We, therefore, end up
* with 512 entries in the "PTE" level.
*
* This leads to the page tables having the following layout:
*
*    pgd             pte
* |        |
* +--------+
* |        |       +------------+ +0
* +- - - - +       | Linux pt 0 |
* |        |       +------------+ +1024
* +--------+ +0    | Linux pt 1 |
* |        |-----> +------------+ +2048
* +- - - - + +4    |  h/w pt 0  |
* |        |-----> +------------+ +3072
* +--------+ +8    |  h/w pt 1  |
* |        |       +------------+ +4096
*
* See L_PTE_xxx below for definitions of bits in the "Linux pt", and
* PTE_xxx for definitions of bits appearing in the "h/w pt".
*
* PMD_xxx definitions refer to bits in the first level page table.
*
* The "dirty" bit is emulated by only granting hardware write permission
* iff the page is marked "writable" and "dirty" in the Linux PTE. This
* means that a write to a clean page will cause a permission fault, and
* the Linux MM layer will mark the page dirty via handle_pte_fault().
* For the hardware to notice the permission change, the TLB entry must
* be flushed, and ptep_set_access_flags() does that for us.
*
* The "accessed" or "young" bit is emulated by a similar method; we only
* allow accesses to the page if the "young" bit is set. Accesses to the
* page will cause a fault, and handle_pte_fault() will set the young bit
* for us as long as the page is marked present in the corresponding Linux
* PTE entry. Again, ptep_set_access_flags() will ensure that the TLB is
* up to date.
*
* However, when the "young" bit is cleared, we deny access to the page
* by clearing the hardware PTE. Currently Linux does not flush the TLB
* for us in this case, which means the TLB will retain the translation
* until either the TLB entry is evicted under pressure, or a context
* switch which changes the user space mapping occurs.
*/

This is the comment in pgtable-2level.h.
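The two emulation rules in that comment can be sketched as a pure function from the Linux PTE to the hardware permissions. The flag names and bit positions below are hypothetical, not the kernel's L_PTE_* / PTE_* values.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPTE_PRESENT (1u << 0)   /* hypothetical Linux PTE bits */
#define LPTE_YOUNG   (1u << 1)
#define LPTE_WRITE   (1u << 2)
#define LPTE_DIRTY   (1u << 3)

typedef struct {
    bool valid;      /* hardware entry present at all */
    bool writable;   /* hardware write permission */
} hw_perm;

static hw_perm hw_permissions(uint32_t lpte)
{
    hw_perm p = { false, false };
    /* "young" emulation: no hardware mapping until the page is young */
    if (!(lpte & LPTE_PRESENT) || !(lpte & LPTE_YOUNG))
        return p;
    p.valid = true;
    /* "dirty" emulation: write access only if writable AND dirty */
    p.writable = (lpte & LPTE_WRITE) && (lpte & LPTE_DIRTY);
    return p;
}
```

A write to a writable but clean page therefore faults; the fault path marks the Linux PTE dirty, and rewriting the entry then grants hardware write access.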

darling54454 posted on 2015-01-06 20:07

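Reply to #2 arm-linux-gcc
Got it, I understand now.
So one page holds two page tables, which together map 2 MB.
The first-level table has 4096 entries, each mapping 1 MB, but Linux presents it as 2048 entries, each mapping 2 MB.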
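The numbers check out: both views cover the full 32-bit address space, and one 4 KiB page of second-level tables covers exactly one Linux pgd entry.

```latex
\begin{aligned}
4096 \times 1\,\text{MiB} &= 2048 \times 2\,\text{MiB} = 2^{12} \times 2^{20}\,\text{B} = 2^{32}\,\text{B} = 4\,\text{GiB} \\
\underbrace{2}_{\text{h/w tables}} \times \underbrace{256}_{\text{entries}} \times 4\,\text{KiB} &= 2\,\text{MiB}
\end{aligned}
```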

Also, a question for the experts.
I kept reading the code today, but no matter how hard I look I can't find the definition of struct processor.

It is contained in:
struct proc_info_list {
	unsigned int		cpu_val;
	unsigned int		cpu_mask;
	unsigned long		__cpu_mm_mmu_flags;	/* used by head.S */
	unsigned long		__cpu_io_mmu_flags;	/* used by head.S */
	unsigned long		__cpu_flush;		/* used by head.S */
	const char		*arch_name;
	const char		*elf_name;
	unsigned int		elf_hwcap;
	const char		*cpu_name;
	struct processor	*proc;
	struct cpu_tlb_fns	*tlb;
	struct cpu_user_fns	*user;
	struct cpu_cache_fns	*cache;
};

In setup_processor():
#ifdef MULTI_CPU
processor = *list->proc;
#endif
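For context on where list comes from: the proc_info_list entry is picked by matching the CPU ID against each entry's cpu_val/cpu_mask, and setup_processor then copies *list->proc into processor. A user-space sketch of that match-and-copy step follows; the structs are cut down, the names are hypothetical, and the two table rows are illustrative values.

```c
#include <stddef.h>
#include <stdint.h>

struct processor_sketch { const char *name; };   /* stands in for struct processor */

struct proc_info_sketch {
    uint32_t cpu_val;
    uint32_t cpu_mask;
    const struct processor_sketch *proc;
};

static const struct processor_sketch arm920_fns = { "arm920" };
static const struct processor_sketch v7_fns     = { "v7" };

static const struct proc_info_sketch proc_table[] = {
    { 0x41009200u, 0xff00fff0u, &arm920_fns },   /* illustrative values */
    { 0x000f0000u, 0x000f0000u, &v7_fns },       /* illustrative values */
};

static struct processor_sketch processor;        /* the global function table */

static const struct proc_info_sketch *lookup_proc(uint32_t cpuid)
{
    for (size_t i = 0; i < sizeof proc_table / sizeof proc_table[0]; i++)
        if ((cpuid & proc_table[i].cpu_mask) == proc_table[i].cpu_val)
            return &proc_table[i];
    return NULL;
}

/* Mirrors "processor = *list->proc;" from setup_processor. */
static int setup_processor_sketch(uint32_t cpuid)
{
    const struct proc_info_sketch *list = lookup_proc(cpuid);
    if (!list)
        return -1;                               /* unknown CPU */
    processor = *list->proc;
    return 0;
}
```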

Also, when setting a PTE:
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)

#ifndef MULTI_CPU
....
extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
...
#else
...
#define cpu_set_pte_ext processor.set_pte_ext
...

When MULTI_CPU is not defined:
#define cpu_set_pte_ext __glue(CPU_NAME,_set_pte_ext)

In fact, processor.set_pte_ext and __glue(CPU_NAME,_set_pte_ext) both end up calling the same function.
For example, on ARM920 both resolve to (in proc-arm920.S)
cpu_arm920_set_pte_ext.

So I'm wondering: what does the MULTI_CPU macro mean?

arm-linux-gcc posted on 2015-01-06 21:52

Last edited by arm-linux-gcc on 2015-01-06 21:54

Reply to #7 darling54454

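The type struct processor is defined in arch/arm/include/asm/proc-fns.h.
The variable processor itself is defined in arch/arm/kernel/setup.c: struct processor processor __read_mostly;

MULTI_CPU means a single vmlinux supports several ARM architectures at once, for example when you enable both the ARMv6 s3c6410 and the ARMv7 exynos4420 in make menuconfig.
If a vmlinux supports several ARM architectures, cpu_set_pte_ext must use the implementation for whichever architecture is detected at run time, so it is defined as processor.set_pte_ext.
If a vmlinux supports only one ARM architecture, cpu_set_pte_ext can be defined directly as the corresponding function name. That saves one indirection compared with processor.set_pte_ext, which first has to locate processor and then load its set_pte_ext field to find the function, so the GLUE approach is faster.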
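A minimal sketch of the two dispatch styles side by side. The cpu_demo_* functions, the reduced structs and the GLUE/GLUE_ macros are hypothetical; GLUE uses the same two-level token pasting as the kernel's __glue/____glue, where the extra level makes CPU_NAME expand before pasting.

```c
#include <stdint.h>

#define GLUE_(name, fn) name##fn
#define GLUE(name, fn)  GLUE_(name, fn)   /* expands args, then pastes */

typedef uint32_t pte_sketch_t;

static int calls_a, calls_b;

static void cpu_demo_a_set_pte_ext(pte_sketch_t *ptep, pte_sketch_t pte, unsigned int ext)
{ (void)ext; *ptep = pte; calls_a++; }

static void cpu_demo_b_set_pte_ext(pte_sketch_t *ptep, pte_sketch_t pte, unsigned int ext)
{ (void)ext; *ptep = pte; calls_b++; }

/* MULTI_CPU style: indirect call through a table filled in at boot. */
struct processor_fns {
    void (*set_pte_ext)(pte_sketch_t *, pte_sketch_t, unsigned int);
};
static struct processor_fns processor_sk;

/* Single-CPU style: the function name is fixed at compile time. */
#define CPU_NAME cpu_demo_b
#define single_cpu_set_pte_ext GLUE(CPU_NAME, _set_pte_ext)
```

single_cpu_set_pte_ext(...) becomes a direct call to cpu_demo_b_set_pte_ext, while processor_sk.set_pte_ext(...) needs a load of the pointer first.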

arch/arm/include/asm/glue-proc.h contains:
#ifdef CONFIG_CPU_ARM920T
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_arm920
# endif
#endif

#if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v6
# endif
#endif

#ifdef CONFIG_CPU_V7
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7
# endif
#endif

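For example, if several ARM architectures are selected in make menuconfig when building vmlinux, then CONFIG_CPU_V7, CONFIG_CPU_V6 (or CONFIG_CPU_V6K) and CONFIG_CPU_ARM920T are all defined here.
After preprocessing, CPU_NAME is defined as cpu_arm920 (the first match), and MULTI_CPU gets defined in the later blocks (V6, and again in the final CONFIG_CPU_V7 block), so the #ifndef MULTI_CPU section at the end of the file is skipped. cpu_set_pte_ext therefore ends up defined as processor.set_pte_ext.
If only CONFIG_CPU_V7 is selected, CPU_NAME is defined as cpu_v7 and MULTI_CPU is never defined, so cpu_set_pte_ext ends up defined as __glue(CPU_NAME,_set_pte_ext), i.e. cpu_v7_set_pte_ext.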
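The selection logic can be reproduced outside the kernel by turning on two CPU options by hand. The two CONFIG_* lines at the top stand in for Kconfig output, and multi_cpu/cpu_name are hypothetical probes for inspecting the result.

```c
/* Pretend Kconfig selected two CPU types. */
#define CONFIG_CPU_ARM920T 1
#define CONFIG_CPU_V7      1

#ifdef CONFIG_CPU_ARM920T
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_arm920
# endif
#endif

#ifdef CONFIG_CPU_V7
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7
# endif
#endif

#ifdef MULTI_CPU
static const int multi_cpu = 1;   /* would use processor.set_pte_ext */
#else
static const int multi_cpu = 0;   /* would use __glue(CPU_NAME,_set_pte_ext) */
#endif

#define STR_(x) #x
#define STR(x)  STR_(x)
static const char cpu_name[] = STR(CPU_NAME);   /* first match wins */
```

Deleting the CONFIG_CPU_ARM920T line gives CPU_NAME = cpu_v7 and multi_cpu = 0, the single-architecture case.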

Files worth reading together:
arch/arm/mm/proc-macro.S       define_processor_functions
arch/arm/mm/proc-v7.S             define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, suspend=1 and .macro __v7_proc initfunc, mm_mmuflags = 0, io_mmuflags = 0, hwcaps = 0, proc_fns = v7_processor_functions
arch/arm/mm/proc-v7-2level.S    cpu_v7_set_pte_ext
arch/arm/include/asm/glue-proc.h
arch/arm/include/asm/proc-fns.h
arch/arm/include/asm/procinfo.h

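Among the MULTI options there is also CONFIG_MULTI_IRQ_HANDLER, which is unrelated to MULTI_CPU. CONFIG_MULTI_IRQ_HANDLER means the interrupt controller driver can supply its own callback, instead of everything going through arch_irq_handler_default.
CONFIG_MULTI_IRQ_HANDLER is usually enabled.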
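The idea behind CONFIG_MULTI_IRQ_HANDLER is a function pointer that the interrupt controller driver registers and the IRQ entry path calls. This is a sketch with hypothetical names (struct regs_sketch stands in for struct pt_regs); the fallback branch only stands in for the fixed default handler of a non-MULTI build, since a real kernel uses one scheme or the other.

```c
#include <stddef.h>

struct regs_sketch { int irq_pending; };   /* stand-in for struct pt_regs */

static void (*handle_arch_irq_sketch)(struct regs_sketch *);
static int default_calls, driver_calls;

static void irq_handler_default(struct regs_sketch *r) { (void)r; default_calls++; }
static void my_gic_handle_irq(struct regs_sketch *r)   { (void)r; driver_calls++; }

/* Driver registration: only the first registration takes effect. */
static void set_handle_irq_sketch(void (*fn)(struct regs_sketch *))
{
    if (handle_arch_irq_sketch == NULL)
        handle_arch_irq_sketch = fn;
}

/* Low-level IRQ entry. */
static void irq_entry(struct regs_sketch *r)
{
    if (handle_arch_irq_sketch)
        handle_arch_irq_sketch(r);   /* driver-provided callback */
    else
        irq_handler_default(r);      /* fixed default handler */
}
```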

darling54454 posted on 2015-01-07 18:38

Reply to #8 arm-linux-gcc
Your answer fills me with respect. I still have a lot to learn. Thank you very much.

I'd like to ask about a fairly detailed point that has me puzzled.

int cpu = smp_processor_id();
is used to read the current CPU's ID.

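In start_kernel, boot_cpu_init calls smp_processor_id (probably the first time the ID is obtained this way; earlier code reads c0 of CP15 directly), and it returns thread_info.cpu.
That means thread_info.cpu must have been assigned earlier, but I can't find where.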

start_kernel calls smp_setup_processor_id():

int __cpu_logical_map[NR_CPUS];

void __init smp_setup_processor_id(void)
{
	int i;
	u32 cpu = is_smp() ? read_cpuid_mpidr() & 0xff : 0; // mrc p15, 0, r0, c0, c0, 5

	cpu_logical_map(0) = cpu;
	for (i = 1; i < NR_CPUS; ++i)
		cpu_logical_map(i) = i == cpu ? 0 : i;

	printk(KERN_INFO "Booting Linux on physical CPU %d\n", cpu);
}
#define cpu_logical_map(cpu) __cpu_logical_map[cpu]

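I just want to know where thread_info.cpu gets set. (Not understanding it bothers me, even though it doesn't seem to matter for the big picture. Everything I find on Baidu just says it returns the CPU ID.)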
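As for what smp_setup_processor_id itself computes, here is a user-space re-creation of the logical map. It assumes NR_CPUS = 4 and replaces the MPIDR read with a parameter. Like the kernel version of this function, it also assigns cpu_logical_map(0) = cpu, so logical CPU 0 is always the boot CPU and the boot CPU's physical ID swaps places with 0.

```c
#define NR_CPUS 4   /* assumed value for the sketch */

static int cpu_logical_map[NR_CPUS];   /* logical index -> physical ID */

static void setup_logical_map(unsigned int boot_phys_cpu)
{
    cpu_logical_map[0] = (int)boot_phys_cpu;     /* logical 0 = boot CPU */
    for (int i = 1; i < NR_CPUS; ++i)
        cpu_logical_map[i] = (i == (int)boot_phys_cpu) ? 0 : i;
}
```

Booting on physical CPU 2 gives the map {2, 1, 0, 3}; booting on CPU 0 gives the identity map.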

darling54454 posted on 2015-01-07 18:44

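Reply to #8 arm-linux-gcc

I'd also like to ask: how do you experts debug kernel code?
Just reading and analyzing feels unreal and abstract. I could flash it onto a dev board and print things out to observe, but constantly rebuilding the kernel seems like a big waste of time and energy.
Recently I came across QEMU, which can emulate ARM. Could it be used for debugging?

Sigh, I'm just a final-year undergrad and this is really hard to learn (and I still haven't found a job).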


Chinaunix's Archiver
