Linux Memory Mapping
Purpose
The following examples demonstrate how to map a driver-allocated
buffer from the kernel into user space. The code has been tested with Linux
kernel 2.2.18 and 2.4.0 on Intel (uniprocessor only) and Alpha
platforms (COMPAQ Personal Workstation 500au (uniprocessor), DS20 and ES40 (SMP)).
Memory Mappings
The Linux kernel works with several different memory mappings.
One of them, called the kernel virtual mapping, provides a direct 1:1 mapping of
physical addresses to virtual addresses. phys_to_virt() converts a physical
address into a kernel virtual address, and virt_to_phys() performs the
translation from virtual to physical.
The translation from IO bus addresses into kernel virtual addresses also uses
the kernel virtual mapping. A contiguous address range in the kernel segment
is therefore also contiguous in physical memory.
Memory Allocation in the Kernel
kmalloc() returns a memory area in the kernel virtual mapping. The area is
therefore physically contiguous and can be translated to a physical address
with virt_to_phys() or to an IO bus address with virt_to_bus().
vmalloc() creates a new memory area, places several physically non-contiguous
pages into it and enters the new area into the memory map of the kernel. Such
addresses cannot be converted into physical or IO bus addresses with the
functions described above.
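To illustrate the difference, here is a minimal sketch (2.4-style includes;
show_translations() is just a hypothetical helper for this text, not part of
the example driver below):

#include <linux/kernel.h>
#include <linux/slab.h>      /* kmalloc()/kfree() */
#include <linux/vmalloc.h>   /* vmalloc()/vfree() */
#include <asm/io.h>          /* virt_to_phys(), virt_to_bus() */

/* hypothetical helper: shows which address translations are legal */
static void show_translations(void)
{
        void *kbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);  /* physically contiguous */
        void *vbuf = vmalloc(4 * PAGE_SIZE);          /* only virtually contiguous */

        if (kbuf) {
                /* valid: kbuf lies in the kernel virtual mapping */
                printk("kbuf virt %p phys %lx bus %lx\n",
                       kbuf, virt_to_phys(kbuf), virt_to_bus(kbuf));
                kfree(kbuf);
        }
        if (vbuf) {
                /* NOT valid: virt_to_phys(vbuf) would give a bogus result;
                   each page of vbuf has to be translated individually */
                vfree(vbuf);
        }
}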
Translation from Virtual to Kernel Virtual Address
Before translating into a physical address or an IO bus address, a general
kernel virtual address, e.g. one returned by vmalloc(), must first be converted
into an address in the direct kernel virtual mapping.
This can be achieved e.g. with the following steps:
  • Check whether the address is already in the kernel virtual mapping.
    On Linux 2.2.x the direct kernel virtual mapping starts right at the
    beginning of the kernel's memory space (i.e. at PAGE_OFFSET). The
    page frame number can be derived with the MAP_NR macro. The variable
    max_mapnr identifies the highest available page frame. Therefore, if
    MAP_NR(address) is smaller than max_mapnr, the address is part of the
    direct kernel virtual mapping:
    if (MAP_NR(address) < max_mapnr) ...
    Linux > 2.2.x supports more physical memory than the virtual address
    space can cover, and supports physical memory layouts with holes
    in them. Therefore we cannot just compare against a maximum page frame
    number. In our example we treat addresses that are already in the direct
    kernel virtual mapping the same way as other pages.

  • Linux has a three-level page table, so we need to make three lookups:
    • page directory
      For the lookup in the page directory we need to know which memory map
      we want to look at and the address. For a general lookup we can use
      pgd_offset(); for a lookup in the kernel memory map we use the macro
      pgd_offset_k().
    • page middle directory
      With the page directory pointer we look up the page middle directory.
      Before doing so, we need to check whether the page directory has a
      valid entry. Assuming pgd is our page directory pointer, we can check
      the entry with:
      if (!pgd_none(*pgd))
      The lookup itself is done with pmd = pmd_offset(pgd, address)
    • page table
      The page middle directory pointer needs to be checked as well. The
      lookup is done with ptep = pte_offset(pmd, address)

  • ptep now contains a pointer to a page table entry. Whether the entry
    contains a page can be checked with pte_present(*ptep)
  • On Linux 2.2.x, pte_page(*ptep) returns the kernel virtual address of
    the page the entry refers to.
    On Linux 2.4.x, pte_page(*ptep) returns a pointer to the corresponding
    page structure. The kernel virtual address of the page is then obtained
    from the page structure with page_address()
Note: the function performs the translation only for the *one* page that
contains the address. Since vmalloc'd areas do not have to be physically
contiguous, the next page may have a completely different offset!
Note: parsing the page tables as described above only works as long as they do
not change during the parsing. For memory areas allocated with vmalloc()
this is the case. If you want to translate an address belonging to a process
which can get swapped out, you need to protect the code with the corresponding
locks. See e.g. sys_mlock() for how this can be done. A sketch of such a
translation helper is shown below.
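Putting the steps above together, a translation helper for vmalloc'd addresses
could look like the following sketch. It is written against the 2.4.x macros
and assumes <linux/mm.h> and <asm/pgtable.h> are included; the example driver
further down uses a helper of this kind under the name virt_to_kseg():

/* translate a (vmalloc'd) kernel virtual address into an address of the
   direct kernel virtual mapping; returns 0 if no page is mapped there */
static volatile void *virt_to_kseg(volatile void *address)
{
        pgd_t *pgd;
        pmd_t *pmd;
        pte_t *ptep;
        unsigned long va  = (unsigned long)address;
        unsigned long ret = 0UL;

        /* page directory lookup in the kernel memory map */
        pgd = pgd_offset_k(va);
        if (!pgd_none(*pgd)) {
                /* page middle directory lookup */
                pmd = pmd_offset(pgd, va);
                if (!pmd_none(*pmd)) {
                        /* page table lookup */
                        ptep = pte_offset(pmd, va);
                        if (pte_present(*ptep))
                                /* 2.4.x: pte_page() gives the page structure,
                                   page_address() its kernel virtual address;
                                   add the offset within the page */
                                ret = (unsigned long)page_address(pte_page(*ptep)) +
                                      (va & (PAGE_SIZE - 1));
                }
        }
        return (volatile void *)ret;
}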
Mapping Kernel Virtual Addresses into User Space
2.2 and 2.4 Kernel
Mapping addresses that are in the kernel virtual mapping into user space is
straightforward:
  • Implement a mmap method for your driver
  • Set the reserved bit on the pages you are mapping into user space
  • Set the VM_LOCKED flag on your memory area. This prevents the swapper
    from swapping the pages out:
    vma->vm_flags |= VM_LOCKED
  • Call remap_page_range() to map the physical address of your buffer to
    the user space address
Example:
        vma->vm_flags |= VM_LOCKED;
        if (remap_page_range(vma->vm_start,
                             virt_to_phys((void *)((unsigned long)kmalloc_area)),
                             size,
                             PAGE_SHARED))
        {
                printk("remap page range failed\n");
                return -ENXIO;
        }
Note: on Linux 2.4.x remap_page_range() needs to be called with the
mm semaphore held. The semaphore is grabbed by the mmap syscall, so within
your mmap method you are safe. If you call remap_page_range() in
other contexts, you need to grab the semaphore first (e.g.
down(&current->mm->mmap_sem)), as sketched below.
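A minimal sketch of such a call outside the mmap method, assuming vma,
kmalloc_area and size are set up as in the example above:

        down(&current->mm->mmap_sem);
        if (remap_page_range(vma->vm_start,
                             virt_to_phys((void *)((unsigned long)kmalloc_area)),
                             size,
                             PAGE_SHARED))
                printk("remap page range failed\n");
        up(&current->mm->mmap_sem);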
2.6 Kernel
On 2.6 things got even simpler. The remap_pfn_range() function sets the
correct flags in the vm_area itself. remap_pfn_range() can be called for
a set of physically contiguous pages. To map the pages you therefore have to:
  • Implement a mmap method for your driver
  • Set the reserved bit on the pages you are mapping into user space
  • Call remap_pfn_range() to map your buffer into user space
Example:
        if (remap_pfn_range(vma,
                            vma->vm_start,
                            virt_to_phys((void *)kmalloc_area) >> PAGE_SHIFT,
                            size,
                            vma->vm_page_prot))
        {
                printk("remap page range failed\n");
                return -ENXIO;
        }
The arguments of the remap_pfn_range() function are:
  • vma: the vm_area_struct passed to the mmap method
  • vma->vm_start: start of the mapping in user address space
  • virt_to_phys((void *)kmalloc_area) >> PAGE_SHIFT: page frame number of the first page
  • size: length of the mapping in bytes
  • vma->vm_page_prot: protection bits, here passed along as received from the application
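Putting the 2.6 pieces together, a minimal mmap method for the physically
contiguous (kmalloc'd) buffer could look like the following sketch; the names
kmalloc_area and LEN follow the example driver below, the error codes are only
illustrative:

static int mmapdrv_mmap(struct file *file, struct vm_area_struct *vma)
{
        unsigned long size = vma->vm_end - vma->vm_start;

        if (size > LEN)
                return -EINVAL;

        /* map the physically contiguous buffer into the calling process */
        if (remap_pfn_range(vma, vma->vm_start,
                            virt_to_phys((void *)kmalloc_area) >> PAGE_SHIFT,
                            size, vma->vm_page_prot))
                return -EAGAIN;

        return 0;
}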

Mapping non-kernel Virtual Addresses into User Space
Mapping addresses that are, e.g., returned by vmalloc() into user space is a
little more tricky, since each page has a different address translation.
A very elegant method to create such mappings is to use the nopage method of
the virtual memory area operations. The methods are attached to the virtual
memory area in the mmap method of the device. Each time the user space process
accesses a page that has not yet been translated, a page fault occurs and our
nopage handler is called. The nopage handler has to increment the usage count
of the page.
On Linux 2.2.x it has to return the kernel virtual address of the page the
application wants to access; on Linux 2.4.x it has to return the pointer to
the page structure.
Example for a nopage handler:
Note: virt_to_kseg() is an implementation of the page table parsing function
described above (compare the sketch shown earlier).
/* page fault handler (for Linux 2.2.x) */
unsigned long mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share)
{
        unsigned long offset;
        unsigned long virt_addr;
         /* determine the offset within the vmalloc'd area  */
        offset = address - vma->vm_start + vma->vm_offset;
        /* calculate the kseg virtual address */
        virt_addr = (unsigned long)virt_to_kseg(&vmalloc_area[offset/sizeof(int)]);
        /* increment the usage count of the page */
        atomic_inc(&mem_map[MAP_NR(virt_addr)].count);
        
        printk("mmap_drv: page fault for offset 0x%lx (kseg x%lx)\n",
               offset, virt_addr);
        /* return the kseg virtual address, *not* the physical address as stated
           in some wrong examples.
        */
        return(virt_addr);
}
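On Linux 2.4.x the same handler has to return the page structure instead of
the address. A sketch of the 2.4.x variant, using the same virt_to_kseg()
helper and vmalloc_area buffer, could look like this:

/* page fault handler (for Linux 2.4.x) */
struct page *mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share)
{
        unsigned long offset;
        unsigned long virt_addr;
        struct page *page;

        /* determine the offset within the vmalloc'd area */
        offset = address - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);

        /* calculate the kseg virtual address, as in the 2.2.x handler */
        virt_addr = (unsigned long)virt_to_kseg(&vmalloc_area[offset/sizeof(int)]);

        /* increment the usage count of the page ... */
        page = virt_to_page(virt_addr);
        get_page(page);

        /* ... and return the page structure, *not* an address */
        return page;
}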
2.6 Kernel
On 2.6 there is no need for a driver specific page fault handler, since
remap_pfn_range() can be called for every page individually. To map a
vmalloc'd area you simply loop over all its pages and call remap_pfn_range()
for each of them:
        while (length > 0) {
                pfn = vmalloc_to_pfn(vmalloc_area_ptr);
                if ((ret = remap_pfn_range(vma, start, pfn, PAGE_SIZE,
                                           PAGE_SHARED)) < 0)
                        return ret;
                start += PAGE_SIZE;
                vmalloc_area_ptr += PAGE_SIZE;
                length -= PAGE_SIZE;
        }
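The variables used in this loop are set up from the vm_area_struct before
entering it, e.g. as in the following sketch (vmalloc_area being the driver's
vmalloc'd buffer, as in the example driver below):

        unsigned long start  = vma->vm_start;
        unsigned long length = vma->vm_end - vma->vm_start;
        char *vmalloc_area_ptr = (char *)vmalloc_area;  /* vmalloc'd buffer of the driver */
        unsigned long pfn;
        int ret;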
Setting the Reserved Bit
Before a page can be exported into user space, its reserved bit must be set.
On Linux 2.2.x this is done with e.g.:
mem_map_reserve(MAP_NR(virt_to_kseg((void *)virt_addr)))
Note: mem_map_reserve() (and its counterpart mem_map_unreserve()) takes the
map number of the page as argument. The map number is calculated from the
kernel virtual address with the MAP_NR() macro.
On Linux 2.4.x mem_map_reserve() takes a pointer to a page structure as
argument. The page structure pointer is derived from the kernel virtual
address with virt_to_page().
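For a buffer that spans several pages the reservation is done in a loop over
all pages. A sketch for the kmalloc'd buffer, with the names taken from the
example driver below:

        unsigned long virt_addr;

        /* reserve all pages of the kmalloc'd buffer before exporting them */
        for (virt_addr = (unsigned long)kmalloc_area;
             virt_addr < (unsigned long)kmalloc_area + LEN;
             virt_addr += PAGE_SIZE) {
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
                /* 2.2.x: argument is the map number */
                mem_map_reserve(MAP_NR(virt_addr));
#else
                /* 2.4.x: argument is the page structure */
                mem_map_reserve(virt_to_page(virt_addr));
#endif
        }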
Putting the Parts together
The example below shows a device driver that allocates two memory areas: one
with vmalloc(), the other with kmalloc(). It implements both mapping methods
described above to export the memory to user space.
Please read the explanations in the example program source code on how to
run the test program.
Linux 2.6 Device Driver
With Linux 2.6 a new build process is used. This, and the amount of changes,
made me split the example off from the 2.2 and 2.4 code. Please download it
from here.
A pure BSD licensed version of the code can be fetched from here.
Linux 2.2 and 2.4 Device Driver
The example has been tested with Linux 2.2.18 and 2.4.0, on the Intel and
Alpha platforms.
(File mmap_drv.c)
#include <linux/version.h>
#include <linux/config.h>
#include <linux/module.h>
#if defined(MODVERSIONS)
#include <linux/modversions.h>
#endif
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/malloc.h>
#include <linux/wrapper.h>      /* mem_map_reserve()/mem_map_unreserve() */
#include <asm/io.h>             /* virt_to_phys() */
#include <asm/page.h>
#include <asm/pgtable.h>        /* page table walk in virt_to_kseg() */
#define LEN (64*1024)
/* device open */
int mmapdrv_open(struct inode *inode, struct file *file);
/* device close */
int mmapdrv_release(struct inode *inode, struct file *file);
/* device mmap */
int mmapdrv_mmap(struct file *file, struct vm_area_struct *vma);
/* open handler for vm area */
void mmap_drv_vopen(struct vm_area_struct *vma);
/* close handler for vm area */
void mmap_drv_vclose(struct vm_area_struct *vma);
/* page fault handler for callback of vmalloc area */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
unsigned long mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share);
#else
struct page *mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share);
#endif
/* file operations of the driver */
static struct file_operations mmapdrv_fops =
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,0)
  owner:   THIS_MODULE,
#endif
  mmap:    mmapdrv_mmap,
  open:    mmapdrv_open,
  release: mmapdrv_release,
};
/* memory handler functions */
static struct vm_operations_struct mmap_drv_vm_ops = {
  open:    mmap_drv_vopen, /* mmap-open */
  close:  mmap_drv_vclose,/* mmap-close */
  nopage: mmap_drv_vmmap, /* no-page fault handler */
};
/* pointer to page aligned area */
static int *vmalloc_area = NULL;
/* pointer to unaligned area */
static int *vmalloc_ptr  = NULL;
/* pointer to page aligned area */
static int *kmalloc_area = NULL;
/* pointer to unaligned area */
static int *kmalloc_ptr = NULL;
/* major number of device */
static int major;
/* device mmap method */
int mmapdrv_mmap(struct file *file, struct vm_area_struct *vma)
{
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
        unsigned long offset = vma->vm_offset;
#else
        unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
#endif
        unsigned long size = vma->vm_end - vma->vm_start;
        
        if (offset & ~PAGE_MASK)
        {
                printk("offset not aligned: %ld\n", offset);
                return -ENXIO;
        }
        
        if (size>LEN)
        {
                printk("size too big\n");
                return(-ENXIO);
        }
        
        /* we only support shared mappings. Copy on write mappings are
           rejected here. A shared mapping that is writeable must have the
           shared flag set.
        */
        if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED))
        {
             printk("writeable mappings must be shared, rejecting\n");
             return(-EINVAL);
        }
        /* we do not want to have this area swapped out, lock it */
        vma->vm_flags |= VM_LOCKED;
        
        /* there are two different mapping options implemented here:
           for the virtually contiguous memory area, we install a page fault
           handler. The page fault handler calculates the right physical page
           on the first access of the application to the page.
           (method 1 is used for vmalloc'd memory, offset 0..LEN)
           The second way works only for a physically contiguous range of pages:
           we create a mapping between the physical pages and the virtual
           addresses of the application with remap_page_range.
           (method 2 is used for kmalloc'd memory, offset LEN..2*LEN)
        */
        if (offset == 0)
        {
                /* method 1: install a page handler */
                vma->vm_ops = &mmap_drv_vm_ops;
                /* call the open routine to increment the usage count */
                mmap_drv_vopen(vma);
        } else if (offset == LEN)
        {
                /* method 2: enter pages into mapping of application */
                if (remap_page_range(vma->vm_start,
                                     virt_to_phys((void*)((unsigned long)kmalloc_area)),
                                     size,
                                     PAGE_SHARED))
                {
                        printk("remap page range failed\n");
                        return -ENXIO;
                }
        } else
        {
                printk("offset out of range\n");
                return -ENXIO;
        }
        return(0);
}
/* open handler for vm area */
void mmap_drv_vopen(struct vm_area_struct *vma)
{
        /* needed to prevent the unloading of the module while
           somebody still has memory mapped */
        MOD_INC_USE_COUNT;
}
/* close handler for vm area */
void mmap_drv_vclose(struct vm_area_struct *vma)
{
        MOD_DEC_USE_COUNT;
}
/* page fault handler */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
unsigned long mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share)
#else
struct page *mmap_drv_vmmap(struct vm_area_struct *vma, unsigned long address, int no_share)
#endif
{
        unsigned long offset;
        unsigned long virt_addr;

        /* determine the offset within the vmalloc'd area */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
        offset = address - vma->vm_start + vma->vm_offset;
#else
        offset = address - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
#endif
        /* calculate the kseg virtual address */
        virt_addr = (unsigned long)virt_to_kseg(&vmalloc_area[offset/sizeof(int)]);
        /* increment the usage count of the page */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
        atomic_inc(&mem_map[MAP_NR(virt_addr)].count);
#else
        atomic_inc(&(virt_to_page(virt_addr)->count));
#endif

        printk("mmap_drv: page fault for offset 0x%lx (kseg x%lx)\n",
               offset, virt_addr);

        /* 2.2.x: return the kseg virtual address; 2.4.x: return the page structure */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2,4,0)
        return(virt_addr);
#else
        return(virt_to_page(virt_addr));
#endif
}
Test Application
(File mmap.c)
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#define LEN (64*1024)
/* this is a test program that opens the mmap_drv.
   It reads out values of the kmalloc() and vmalloc()
   allocated areas and checks for correctness.
   You need a device special file to access the driver.
   The device special file is called 'node' and is expected
   in the current directory.
   To create it
   - load the driver
     'insmod mmap_mod.o'
   - find the major number assigned to the driver
     'grep mmapdrv /proc/devices'
   - and create the special file (assuming major number 254)
     'mknod node c 254 0'
*/
int main(void)
{
  int fd;
  unsigned int *vadr;
  unsigned int *kadr;

  if ((fd = open("node", O_RDWR)) < 0)
  {
    perror("open");
    exit(-1);
  }

  /* map the vmalloc'd area (driver offset 0) and the kmalloc'd
     area (driver offset LEN) into this process and touch them */
  vadr = mmap(0, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  kadr = mmap(0, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, LEN);
  printf("vadr[0] = 0x%x, kadr[0] = 0x%x\n", vadr[0], kadr[0]);

  munmap((void *)vadr, LEN);
  munmap((void *)kadr, LEN);
  close(fd);
  return 0;
}
Makefile
When copy-pasting this makefile, remember that the command lines must start
with tabs!
Edit the first line of the makefile to point to your kernel source tree.
You need to configure the kernel tree (e.g. with make config) beforehand, so
that a .config file exists and the symbolic links are set up correctly.
# set to your kernel tree
KERNEL  = /usr/src/linux-2.4.0
#KERNEL  = /usr/src/linux-2.2.18
# get the Linux architecture. Needed to find proper include file for CFLAGS
ARCH=$(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ -e s/arm.*/arm/ -e s/sa110/arm/)
# set default flags to compile module
CFLAGS = -D__KERNEL__ -DMODULE -I$(KERNEL)/include
CFLAGS+= -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing
all:        mmap_mod.o mmap
# get configuration of kernel
include $(KERNEL)/.config
# modify CFLAGS with architecture specific flags
include $(KERNEL)/arch/${ARCH}/Makefile
# enable the module versions, if configured in kernel source tree
ifdef CONFIG_MODVERSIONS
CFLAGS+= -DMODVERSIONS -include $(KERNEL)/include/linux/modversions.h
endif
# enable SMP, if configured in kernel source tree
ifdef CONFIG_SMP
CFLAGS+= -D__SMP__
endif
# note: we compile the driver object file and then link it into the
# module. With just one object file, as in this example, this is not
# really needed: we could simply load the object file produced by gcc.
# link the mmap driver module
mmap_mod.o:        mmap_drv.o
        ld -r -o mmap_mod.o mmap_drv.o
# compile the mmap driver
mmap_drv.o:        mmap_drv.c
        gcc $(CFLAGS) -c mmap_drv.c
# compile and link the test program
mmap:        mmap.c
        gcc -o mmap mmap.c
clean:
        rm -f *.o mmap
Comments, Corrections
Please send comments, corrections etc. to the
address below.


frey@scs.ch
               
               
               
