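Linux Kernel Analysis: The Process Address Space
This article describes the data structures the Linux kernel uses to represent a process address space, namely mm_struct and vm_area_struct, walks through how linear address intervals are allocated to a process, and annotates the relevant source code.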
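Kernel functions obtain dynamic memory in a fairly direct way. When memory is allocated to a User Mode process, the situation is entirely different. A process's requests for dynamic memory are considered non-urgent, so in general the kernel tries to defer allocating memory to User Mode processes for as long as possible. Because user processes cannot be trusted, the kernel must also be ready at any time to catch every addressing error a User Mode process causes. When a User Mode process asks for dynamic memory, it does not get the requested page frames; it only gets the right to use a new interval of linear addresses, and this interval becomes part of the process address space.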
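The deferred allocation described above can be observed from user space. The following is a minimal sketch, assuming a Linux system where getrusage reports minor page faults; the helper names minor_faults and faults_from_touching are made up for this illustration. It maps anonymous memory and counts the faults taken when each page is first written.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Current count of minor page faults taken by this process, or -1. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map npages of anonymous memory, write to each
 * page once, and return how many minor faults the writes caused.
 * Returns -1 on error.
 */
long faults_from_touching(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    long before, after;
    volatile char *p;
    size_t len, i;

    assert(page > 0);
    len = npages * (size_t)page;

    /* mmap only hands out a linear address interval; no page frames yet. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    before = minor_faults();
    for (i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;    /* first write to a page faults it in */
    after = minor_faults();

    munmap((void *)p, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

On a typical Linux system the returned value is expected to be close to npages, since a page frame is assigned only when a page is first touched. The exact number depends on the kernel configuration (for example transparent huge pages), so it should be treated as indicative only.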
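The address space of a process consists of all linear addresses the process is allowed to use. The kernel can modify a process's address space dynamically by adding or removing intervals of linear addresses. The kernel represents intervals of linear addresses by means of resources called memory regions; a memory region is described by a starting linear address, a length, and some access rights. Typical situations in which a process gets new memory regions:
1. When the user types a command at the console, the shell process creates a new process to execute it. As a result, a fresh address space (that is, a set of memory regions) is assigned to the new process.
2. A running process may decide to load an entirely different program. In this case the process descriptor stays the same, but the memory regions used before the program is loaded are all released, and a new set of memory regions is assigned to the process.
3. A running process may perform a memory mapping on a file.
4. A process may keep adding data to its User Mode stack until the memory region that maps the stack is used up. At that point the kernel may decide to expand the size of that region.
5. A process may create an IPC shared memory region to share data with other cooperating processes. In this case the kernel assigns a new memory region to the process to implement the scheme.
6. A process may expand its dynamic area (the heap) through a function such as malloc. As a result, the kernel may decide to expand the memory region assigned to the heap.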
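The idea that an address space is a set of regions, each with a start and an end, can be inspected from user space. The following is a small sketch, assuming a Linux system with /proc mounted; address_is_mapped and map_one_page are hypothetical helpers written for this illustration. It checks whether an address lies inside one of the regions listed in /proc/self/maps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return 1 if addr lies inside a region listed in /proc/self/maps,
 * 0 if it does not, and -1 if the file cannot be opened.
 */
int address_is_mapped(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    unsigned long a = (unsigned long)(uintptr_t)addr;
    unsigned long start, end;
    int found = 0;
    int c;

    if (!f)
        return -1;
    /* Every line begins with "start-end" in hexadecimal. */
    while (fscanf(f, "%lx-%lx", &start, &end) == 2) {
        if (a >= start && a < end) {
            found = 1;
            break;
        }
        /* Skip the rest of the line (permissions, offset, path). */
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }
    fclose(f);
    return found;
}

/* Create a one-page anonymous region (case 3 above, without a file). */
void *map_one_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(page > 0);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

An address returned by map_one_page should be reported as mapped, while address 0 normally is not, because no region covers it.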
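Data structure description
The mm field of the process descriptor task_struct points to the memory descriptor, of type mm_struct, which describes the process address space: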
struct mm_struct {
    struct vm_area_struct * mmap;       /* list of VMAs */
    struct rb_root mm_rb;
    struct vm_area_struct * mmap_cache; /* last find_vma result */
    unsigned long (*get_unmapped_area) (struct file *filp,
                unsigned long addr, unsigned long len,
                unsigned long pgoff, unsigned long flags);
    void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
    unsigned long mmap_base;        /* base of mmap area */
    unsigned long task_size;        /* size of task vm space */
    unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */
    unsigned long free_area_cache;  /* first hole of size cached_hole_size or larger */
    pgd_t * pgd;
    atomic_t mm_users;  /* How many users with user space? */
    atomic_t mm_count;  /* How many references to "struct mm_struct" (users count as 1) */
    int map_count;      /* number of VMAs */
    struct rw_semaphore mmap_sem;
    spinlock_t page_table_lock; /* Protects page tables and some counters */

    struct list_head mmlist;    /* List of maybe swapped mm's. These are globally strung
                                 * together off init_mm.mmlist, and are protected
                                 * by mmlist_lock
                                 */

    /* Special counters, in some configurations protected by the
     * page_table_lock, in other configurations by being atomic.
     */
    mm_counter_t _file_rss;
    mm_counter_t _anon_rss;

    unsigned long hiwater_rss;  /* High-watermark of RSS usage */
    unsigned long hiwater_vm;   /* High-water virtual memory usage */

    unsigned long total_vm, locked_vm, shared_vm, exec_vm;
    unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;

    unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

    struct linux_binfmt *binfmt;

    cpumask_t cpu_vm_mask;  /* bitmask used for lazy TLB switching */

    /* Architecture-specific MM context */
    mm_context_t context;

    /* Swap token stuff */
    /*
     * Last value of global fault stamp as seen by this process.
     * In other words, this value gives an indication of how long
     * it has been since this task got the token.
     * Look at mm/thrash.c
     */
    unsigned int faultstamp;
    unsigned int token_priority;
    unsigned int last_interval;

    unsigned long flags; /* Must use atomic bitops to access the bits */

    struct core_state *core_state; /* coredumping support */
#ifdef CONFIG_AIO
    spinlock_t ioctx_lock;
    struct hlist_head ioctx_list;  /* list of asynchronous I/O contexts */
#endif
#ifdef CONFIG_MM_OWNER
    /*
     * "owner" points to a task that is regarded as the canonical
     * user/owner of this mm. All of the following must be true in
     * order for it to be changed:
     *
     * current == mm->owner
     * current->mm != mm
     * new_owner->mm == mm
     * new_owner->alloc_lock is held
     */
    struct task_struct *owner;
#endif

#ifdef CONFIG_PROC_FS
    /* store ref to file /proc/<pid>/exe symlink points to */
    struct file *exe_file;
    unsigned long num_exe_file_vmas;
#endif
#ifdef CONFIG_MMU_NOTIFIER
    struct mmu_notifier_mm *mmu_notifier_mm;
#endif
};
About the mm_users and mm_count fields
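The mm_users field stores the number of lightweight processes that share the mm_struct data structure. The mm_count field is the main usage counter of the memory descriptor: all users counted in the secondary counter mm_users count as a single unit in mm_count. Every time mm_count is decremented, the kernel checks whether it has become zero; if so, the memory descriptor is deallocated, because nothing uses it any longer.
An example shows the difference between mm_users and mm_count. Consider a memory descriptor shared by two lightweight processes. Normally its mm_users field stores the value 2, while its mm_count field stores the value 1 (the two owner processes count as one). If the memory descriptor is temporarily lent to a kernel thread, the kernel increments mm_count, so the descriptor survives until the kernel thread has finished with it even if both lightweight processes die and mm_users drops to zero. If the kernel wants to make sure that the memory descriptor is not released in the middle of a lengthy operation, it can increment mm_users instead of mm_count. The final result is the same, because the increment of mm_users ensures that mm_count does not become zero even if all lightweight processes that own the memory descriptor die.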
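Kernel threads run only in Kernel Mode, so they never access linear addresses below TASK_SIZE (equal to PAGE_OFFSET, usually 0xc0000000 on 32-bit x86). Unlike regular processes, kernel threads do not use memory regions, so most fields of a memory descriptor are meaningless to them. Accordingly, a kernel thread has no memory descriptor of its own (its mm field is NULL); when it is scheduled, its active_mm field borrows the memory descriptor of the previously running process, and it uses only part of the data in that descriptor.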
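The interplay of the two counters can be modelled in a few lines of user-space C. This is a toy model, not kernel code: struct toy_mm and the toy_ functions are invented for the illustration, use plain integers instead of atomic_t, and only record when the regions would be released and when the descriptor would be freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two counters in mm_struct (hypothetical type). */
struct toy_mm {
    int mm_users;           /* lightweight processes sharing the space */
    int mm_count;           /* references; all users together count as 1 */
    bool regions_released;  /* set when the last user goes away */
    bool descriptor_freed;  /* set when mm_count reaches zero */
};

/* A new address space starts with one user and one reference. */
void toy_mm_init(struct toy_mm *mm)
{
    mm->mm_users = 1;
    mm->mm_count = 1;
    mm->regions_released = false;
    mm->descriptor_freed = false;
}

/* Drop one reference to the descriptor itself. */
void toy_mmdrop(struct toy_mm *mm)
{
    assert(mm->mm_count > 0);
    if (--mm->mm_count == 0)
        mm->descriptor_freed = true;
}

/* One lightweight process stops using the address space. */
void toy_mmput(struct toy_mm *mm)
{
    assert(mm->mm_users > 0);
    if (--mm->mm_users == 0) {
        mm->regions_released = true;    /* mappings are torn down */
        toy_mmdrop(mm);                 /* users jointly held one reference */
    }
}

/* Another lightweight process starts sharing the address space. */
void toy_share(struct toy_mm *mm)
{
    mm->mm_users++;
}

/* A kernel thread borrows the descriptor as its active_mm. */
void toy_borrow(struct toy_mm *mm)
{
    mm->mm_count++;
}
```

With two sharers and one borrowing kernel thread, both calls to toy_mmput release the regions but leave the descriptor alive; only the final toy_mmdrop by the kernel thread frees it.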
Memory regions
Linux implements a memory region by means of an object of type vm_area_struct. Its fields are:
/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task. A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
    struct mm_struct * vm_mm;   /* The address space we belong to. */
    unsigned long vm_start;     /* Our start address within vm_mm. */
    unsigned long vm_end;       /* The first byte after our end address
                                   within vm_mm. */

    /* linked list of VM areas per task, sorted by address */
    struct vm_area_struct *vm_next;

    pgprot_t vm_page_prot;      /* Access permissions of this VMA. */
    unsigned long vm_flags;     /* Flags, see mm.h. */

    struct rb_node vm_rb;

    /*
     * For areas with an address space and backing store,
     * linkage into the address_space->i_mmap prio tree, or
     * linkage to the list of like vmas hanging off its node, or
     * linkage of vma in the address_space->i_mmap_nonlinear list.
     */
    union {
        struct {
            struct list_head list;
            void *parent;   /* aligns with prio_tree_node parent */
            struct vm_area_struct *head;
        } vm_set;

        struct raw_prio_tree_node prio_tree_node;
    } shared;

    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages. A MAP_SHARED vma
     * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
    struct anon_vma *anon_vma;      /* Serialized by page_table_lock */

    /* Function pointers to deal with this struct. */
    const struct vm_operations_struct *vm_ops;

    /* Information about our backing store: */
    unsigned long vm_pgoff;     /* Offset (within vm_file) in PAGE_SIZE
                                   units, *not* PAGE_CACHE_SIZE */
    struct file * vm_file;      /* File we map to (can be NULL). */
    void * vm_private_data;     /* was vm_pte (shared mem) */
    unsigned long vm_truncate_count; /* truncate_count or restart_addr */

#ifndef CONFIG_MMU
    struct vm_region *vm_region;    /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
    struct mempolicy *vm_policy;    /* NUMA policy for the VMA */
#endif
};
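The memory regions owned by a process never overlap, and the kernel tries to merge a newly allocated region with adjacent existing ones. Two adjacent regions can be merged if their access rights match.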
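The merge rule can be sketched as follows. This is a deliberately simplified model: struct toy_region, toy_can_merge and toy_merge are hypothetical names, and the sketch compares only adjacency and flags, whereas the kernel's real merge logic takes further conditions into account (for instance the backing file and offset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified region: only the fields this sketch needs (hypothetical). */
struct toy_region {
    unsigned long start;    /* first address inside the region */
    unsigned long end;      /* first address after the region */
    unsigned long flags;    /* access rights, compared as an opaque value */
};

/* Two regions can merge when they touch and carry the same flags. */
bool toy_can_merge(const struct toy_region *prev,
                   const struct toy_region *next)
{
    assert(prev != NULL && next != NULL);
    return prev->end == next->start && prev->flags == next->flags;
}

/* Extend prev over next when allowed; returns true if a merge happened. */
bool toy_merge(struct toy_region *prev, const struct toy_region *next)
{
    if (!toy_can_merge(prev, next))
        return false;
    prev->end = next->end;
    return true;
}
```

For example, regions [0x1000, 0x3000) and [0x3000, 0x5000) with equal flags collapse into [0x1000, 0x5000), while an adjacent region with different flags stays separate.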
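Operations
Handling memory regions
As an example, consider the commonly used find_vma function, which searches the red-black tree for the first memory region whose vm_end is greater than a given address. Note that the region it returns does not necessarily contain that address. Other functions are not covered here.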
/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
// Searches the address space of a process for a mapped region.
// The two parameters are the top-level mm_struct to be searched and the
// address the caller is interested in.
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
    // Defaults to returning NULL for address not found.
    struct vm_area_struct *vma = NULL;

    // Makes sure the caller does not try to search a bogus mm.
    if (mm) {
        /* Check the cache first. */
        /* (Cache hit rate is typically around 35%.) */
        // mmap_cache holds the result of the last call to find_vma(), so
        // there is a chance that the red-black tree need not be searched.
        vma = mm->mmap_cache;
        // If the cached VMA is valid and contains the address, it can be
        // returned directly. Otherwise, the tree is searched.
        if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
            struct rb_node * rb_node;

            // Starts at the root of the tree.
            rb_node = mm->mm_rb.rb_node;
            vma = NULL;

            // This block is the tree walk.
            while (rb_node) {
                struct vm_area_struct * vma_tmp;

                // The macro returns the VMA in which this tree node
                // is embedded.
                vma_tmp = rb_entry(rb_node,
                        struct vm_area_struct, vm_rb);

                // Decides whether to descend into the left or the
                // right child.
                if (vma_tmp->vm_end > addr) {
                    vma = vma_tmp;
                    // If the current VMA contains the address, the
                    // search is over.
                    if (vma_tmp->vm_start <= addr)
                        break;
                    rb_node = rb_node->rb_left;
                } else
                    rb_node = rb_node->rb_right;
            }
            // If a VMA was found, caches it for the next call to
            // find_vma().
            if (vma)
                mm->mmap_cache = vma;
        }
    }
    // Returns the VMA that contains the address or, failing that, the
    // first VMA that ends above the requested address.
    return vma;
}
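The lookup rule of find_vma (first region with vm_end greater than addr) is easy to misread as "the region containing addr". The sketch below reproduces the same rule over a sorted array instead of a red-black tree; struct toy_vma, toy_find_vma and toy_addr_in_region are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal region with the two fields the lookup needs (hypothetical). */
struct toy_vma {
    unsigned long vm_start; /* first address inside the region */
    unsigned long vm_end;   /* first address after the region */
};

/*
 * Return the first region whose vm_end is greater than addr, or NULL.
 * The array v must be sorted by address and must not overlap.
 */
const struct toy_vma *toy_find_vma(const struct toy_vma *v, size_t n,
                                   unsigned long addr)
{
    const struct toy_vma *found = NULL;
    size_t lo = 0, hi = n;

    assert(v != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (v[mid].vm_end > addr) {
            found = &v[mid];        /* candidate; a lower one may exist */
            if (v[mid].vm_start <= addr)
                break;              /* addr lies inside this region */
            hi = mid;               /* like descending to the left child */
        } else {
            lo = mid + 1;           /* like descending to the right child */
        }
    }
    return found;
}

/* Callers that need containment must check vm_start themselves. */
int toy_addr_in_region(const struct toy_vma *v, size_t n, unsigned long addr)
{
    const struct toy_vma *r = toy_find_vma(v, n, addr);

    return r != NULL && r->vm_start <= addr;
}
```

With regions [0x1000, 0x2000), [0x4000, 0x6000) and [0x8000, 0x9000), looking up 0x3000 returns the second region even though 0x3000 falls into a hole, which is why the extra vm_start check in toy_addr_in_region is needed.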
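Allocating a linear address interval
The do_mmap function creates and initializes a new memory region for the current process. After a successful allocation, however, the new region may be merged with other memory regions the process already has.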
/*
 * Creates and initializes a new linear address interval. After a
 * successful allocation, the new interval may be merged with other
 * existing memory regions.
 * file and offset: if the new region maps a file into memory, the file
 *     object pointer file and the file offset offset are used;
 * addr: the linear address from which the search for a free interval
 *     starts;
 * len: the length of the linear address interval;
 * prot: the access rights of the pages in the region, such as
 *     read, write, and execute;
 * flag: the remaining flags of the region.
 */
static inline unsigned long do_mmap(struct file *file, unsigned long addr,
    unsigned long len, unsigned long prot,
    unsigned long flag, unsigned long offset)
{
    unsigned long ret = -EINVAL;

    /* Preliminary checks on offset: reject wrap-around ... */
    if ((offset + PAGE_ALIGN(len)) < offset)
        goto out;
    /* ... and proceed only if offset is page-aligned. */
    if (!(offset & ~PAGE_MASK))
        ret = do_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
out:
    return ret;
}
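The two offset checks in do_mmap can be tried out in isolation. The following user-space sketch assumes 4 KiB pages; the TOY_ macros and toy_offset_to_pgoff are hypothetical names that mimic the kernel's PAGE_SHIFT, PAGE_MASK and PAGE_ALIGN for the sake of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed page geometry for this sketch: 4 KiB pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))
#define TOY_PAGE_ALIGN(x) (((x) + TOY_PAGE_SIZE - 1) & TOY_PAGE_MASK)

/*
 * Apply the same two checks as do_mmap() to a byte offset.
 * On success, store the offset in pages in *pgoff and return 0;
 * otherwise return -EINVAL and leave *pgoff untouched.
 */
int toy_offset_to_pgoff(unsigned long offset, unsigned long len,
                        unsigned long *pgoff)
{
    assert(pgoff != NULL);

    /* offset plus the page-rounded length must not wrap around */
    if (offset + TOY_PAGE_ALIGN(len) < offset)
        return -EINVAL;
    /* offset must be a multiple of the page size */
    if (offset & ~TOY_PAGE_MASK)
        return -EINVAL;

    *pgoff = offset >> TOY_PAGE_SHIFT;
    return 0;
}
```

Under these assumptions an offset of 8192 bytes becomes page offset 2, an offset of 100 is rejected as unaligned, and an offset near the top of the unsigned long range is rejected because adding the length would overflow.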
Next, let us look at do_mmap_pgoff, the function that does the actual work.