Reply to #3 humjb_1983
I went through the code and think the reasoning is as follows; I'm not sure whether it matches what you discussed earlier:
================================
The mmap system call leads to the following call chain:
sys_mmap // in sys_x86_64.c (arch/x86/kernel)
sys_mmap_pgoff // in util.c (mm)
do_mmap_pgoff // in mmap.c (mm)
mmap_region
First, do_mmap_pgoff() sets up vm_flags.
If the file was opened writable and is mapped with MAP_SHARED, vm_flags contains at least the following bits:
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC | VM_SHARED
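For reference, the kind of userspace code that goes down this path might look like the sketch below. It is only an illustration: the helper name write_via_shared_mmap() is made up for this post, and it assumes a Linux system with a file path you are allowed to create and write.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `path` shared and writable,
 * copy `src` into the mapping, and flush it to the file.
 * Returns 0 on success, -1 on error.
 */
static int write_via_shared_mmap(const char *path, const void *src, size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */

    /* O_RDWR is what lets the kernel keep VM_MAYWRITE | VM_SHARED. */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, src, len); /* stores into the shared file mapping */

    int rc = msync(p, len, MS_SYNC); /* write the dirty pages back */
    munmap(p, len);
    return rc;
}
```

Calling write_via_shared_mmap("/tmp/demo.bin", buf, n) from a small test program should be enough to exercise the mmap path described above on a filesystem such as xfs.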
Next, look at mmap_region(), which contains the following snippet:
vma->vm_flags = vm_flags;
vma->vm_page_prot = vm_get_page_prot(vm_flags);
............
if (vma_wants_writenotify(vma))
vma->vm_page_prot = vm_get_page_prot(vm_flags & ~VM_SHARED);
First, look at vma_wants_writenotify(), which is called there. It begins as follows:
/*
 * Some shared mappings will want the pages marked read-only
 * to track write events. If so, we'll downgrade vm_page_prot
 * to the private version (using protection_map[] without the
 * VM_SHARED bit).
 */
int vma_wants_writenotify(struct vm_area_struct *vma)
{
    unsigned int vm_flags = vma->vm_flags;
    /* If it was private or non-writable, the write bit is already clear */
    if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
        return 0;
    /* The backer wishes to know when pages are first written to? */
    if (vma->vm_ops && vma->vm_ops->page_mkwrite)
        return 1;
So for xfs, which provides a page_mkwrite handler, vma_wants_writenotify() returns 1.
Now look at the definition of vm_get_page_prot():
/* description of effects of mapping type and prot in current implementation.
 * this is due to the limited x86 page protection hardware. The expected
 * behavior is in parens:
 *
 * map_type     prot
 *              PROT_NONE     PROT_READ     PROT_WRITE      PROT_EXEC
 * MAP_SHARED   r: (no) no    r: (yes) yes  r: (no) yes     r: (no) yes
 *              w: (no) no    w: (no) no    w: (yes) yes    w: (no) no
 *              x: (no) no    x: (no) yes   x: (no) yes     x: (yes) yes
 *
 * MAP_PRIVATE  r: (no) no    r: (yes) yes  r: (no) yes     r: (no) yes
 *              w: (no) no    w: (no) no    w: (copy) copy  w: (no) no
 *              x: (no) no    x: (no) yes   x: (no) yes     x: (yes) yes
 *
 */
pgprot_t protection_map[16] = {
    __P000, __P001, __P010, __P011, __P100, __P101, __P110, __P111,
    __S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
};
pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
    return __pgprot(pgprot_val(protection_map[vm_flags &
                (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
            pgprot_val(arch_vm_get_page_prot(vm_flags)));
}
With VM_SHARED masked off as above, vm_get_page_prot() should return __P111 (this assumes VM_EXEC is set as well; for a mapping with only PROT_READ | PROT_WRITE the index would be __P011, i.e. PAGE_COPY, which likewise lacks the write bit).
On x86, __P000 and friends are defined as follows:
/* xwr */
#define __P000 PAGE_NONE
#define __P001 PAGE_READONLY
#define __P010 PAGE_COPY
#define __P011 PAGE_COPY
#define __P100 PAGE_READONLY_EXEC
#define __P101 PAGE_READONLY_EXEC
#define __P110 PAGE_COPY_EXEC
#define __P111 PAGE_COPY_EXEC
#define __S000 PAGE_NONE
#define __S001 PAGE_READONLY
#define __S010 PAGE_SHARED
#define __S011 PAGE_SHARED
#define __S100 PAGE_READONLY_EXEC
#define __S101 PAGE_READONLY_EXEC
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC
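So what is returned here is PAGE_COPY_EXEC, that is:
#define PAGE_COPY_EXEC __pgprot(_PAGE_PRESENT | _PAGE_USER | \
                                _PAGE_ACCESSED)
In other words, the page is present but not writable (_PAGE_RW is not set), so a write raises a write-protection fault, which ends up in do_wp_page().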
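To make the table lookup concrete, here is a small userspace model of it. This is only a sketch, not kernel code: struct toy_prot and the toy_* names are made up for illustration, and the only thing each entry records is whether the real x86 protection would contain the write bit (PAGE_SHARED and PAGE_SHARED_EXEC do, the others do not). The VM_* values are the low four bits of vm_flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Low four bits of vm_flags, as used to index protection_map[]. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Toy stand-in for pgprot_t: entry name plus "would _PAGE_RW be set?". */
struct toy_prot {
    const char *name;
    bool writable;
};

/* Same ordering as protection_map[]: private entries first, then shared. */
static const struct toy_prot toy_protection_map[] = {
    {"__P000", false}, {"__P001", false}, {"__P010", false}, {"__P011", false},
    {"__P100", false}, {"__P101", false}, {"__P110", false}, {"__P111", false},
    {"__S000", false}, {"__S001", false}, {"__S010", true},  {"__S011", true},
    {"__S100", false}, {"__S101", false}, {"__S110", true},  {"__S111", true},
};

static_assert(sizeof(toy_protection_map) / sizeof(toy_protection_map[0]) == 16,
              "one entry per combination of the four flag bits");

/* Mirrors the indexing done by vm_get_page_prot(), minus the arch hook. */
static struct toy_prot toy_get_page_prot(unsigned long vm_flags)
{
    return toy_protection_map[vm_flags &
                              (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];
}
```

With vm_flags = VM_READ | VM_WRITE | VM_SHARED the lookup lands on a writable __S entry, while the same flags with VM_SHARED masked off land on a non-writable __P entry, which is the downgrade that mmap_region() applies when vma_wants_writenotify() returns 1.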
do_wp_page() contains the following snippet:
    } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
                    (VM_WRITE|VM_SHARED))) {
        ............
        tmp = vma->vm_ops->page_mkwrite(vma, &vmf);
        ...........
        reuse = 1;
    }
    .............
    if (reuse) {
reuse:
        flush_cache_page(vma, address, pte_pfn(orig_pte));
        entry = pte_mkyoung(orig_pte);
        entry = maybe_mkwrite(pte_mkdirty(entry), vma);
        if (ptep_set_access_flags(vma, address, page_table, entry,1))
            update_mmu_cache(vma, address, entry);
        ret |= VM_FAULT_WRITE;
        goto unlock;
    }
    /*
     * Ok, we need to copy. Oh, well..
     */
    ....................
    new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
    ....................
unlock:
    pte_unmap_unlock(page_table, ptl);
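Here page_mkwrite notifies the filesystem and the page is marked dirty again, but no new page is allocated.
Note that in the reuse path maybe_mkwrite() sets the write bit in the PTE because the VMA has VM_WRITE, so further memcpy calls to the same page do not enter do_wp_page() again until the PTE is write-protected once more, which as far as I can tell happens when writeback cleans the page (page_mkclean()).
In short, write protection is how writes (such as memcpy) into an mmap mapping are tracked.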
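One way to picture the tracking is the toy model below. It is a simplification under stated assumptions, not kernel code: the struct and function names are invented, it assumes the reuse path's maybe_mkwrite() makes the PTE writable again, and it assumes writeback later write-protects and cleans the PTE (page_mkclean(), which is not in the snippets quoted above).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one PTE in a shared, writable file mapping. */
struct toy_pte {
    bool writable;
    bool dirty;
};

/* Counters for what the kernel would have done. */
struct toy_stats {
    unsigned wp_faults; /* write-protect faults, i.e. do_wp_page() entries */
    unsigned mkwrite;   /* page_mkwrite notifications to the filesystem */
};

/* Models a single store to the page (e.g. one memcpy into it). */
static void toy_write(struct toy_pte *pte, struct toy_stats *st)
{
    if (!pte->writable) {
        /* Write-protect fault: notify the filesystem, then reuse the page. */
        st->wp_faults++;
        st->mkwrite++;
        pte->dirty = true;    /* pte_mkdirty() */
        pte->writable = true; /* maybe_mkwrite(), since the VMA has VM_WRITE */
    }
    pte->dirty = true; /* the store itself leaves the page dirty */
    assert(pte->writable && pte->dirty);
}

/* Models writeback: the page is cleaned and write-protected again. */
static void toy_writeback(struct toy_pte *pte)
{
    pte->writable = false;
    pte->dirty = false;
}
```

In this model several toy_write() calls in a row cost only one fault, and each toy_writeback() re-arms the protection so that the next write is seen by the filesystem again.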