Anatomy of the Linux slab allocator
Learn how Linux manages memory




Level: Intermediate
M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex Corp.
15 May 2007
Good operating system performance depends in part on the operating system's ability to efficiently manage resources. In the old days, heap memory managers were the norm, but performance suffered due to fragmentation and the need for memory reclamation. Today, the Linux® kernel uses a method that originated in Solaris but has been used in embedded systems for quite some time, allocating memory as objects based on their size. This article explores the ideas behind the slab allocator and examines its interfaces and their use.
Dynamic memory management
The goal of memory management is to provide a method by which memory can be dynamically shared amongst a variety of users for a variety of purposes. The memory management method should do both of the following:

  • Minimize the amount of time required to manage the memory
  • Maximize the available memory for general usage (minimize management overhead)

Memory management is ultimately a game of tradeoffs. You can develop an algorithm that uses little memory for management but takes more time to manage the available memory. You can also develop an algorithm that manages memory quickly but spends a bit more memory on its own bookkeeping. In the end, the requirements of the particular application drive the balance of the tradeoffs.
Early memory managers used a heap-based allocation strategy. In this method, a large block of memory (called the heap) is used to provide memory for user-defined purposes. When users need a block of memory, they make a request for a given size. The heap manager looks at the available memory (using a particular algorithm) and returns the block. Some of the algorithms used in this search are the first-fit (the first block encountered in the heap that satisfies the request), and the best-fit (the best block in the heap that satisfies the request). When the users are finished with the memory, they return it to the heap.
The fundamental problem with this heap-based allocation strategy is fragmentation. As blocks of memory are allocated, they are returned in different orders and at different times. This tends to leave holes in the heap requiring more time to efficiently manage the free memory. This algorithm tends to be memory efficient (allocating what's necessary) but requires more time to manage the heap.
Another approach, called buddy memory allocation, is a faster memory technique that divides memory into power-of-2 partitions and attempts to allocate memory requests using a best-fit approach. When memory is freed by the user, the buddy block is checked to see if any of its contiguous neighbors have also been freed. If so, the blocks are combined to minimize fragmentation. This algorithm tends to be a bit more time efficient but can waste memory due to the best-fit approach.
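To make the first-fit idea concrete, here is a minimal, hypothetical sketch (not kernel code; all names are invented for illustration) that walks a free list and returns the first block large enough to satisfy a request:

#include <stddef.h>

/* Hypothetical first-fit search over a heap free list (illustration only). */
struct free_block {
    size_t size;                 /* usable bytes in this block */
    struct free_block *next;     /* next free block in the list */
};

static struct free_block *free_list;   /* head of the free list */

/* Return the first block that can hold 'size' bytes, unlinking it from
 * the free list; return NULL if no block is large enough. */
static struct free_block *first_fit(size_t size)
{
    struct free_block **prevp = &free_list;
    struct free_block *blk;

    for (blk = free_list; blk != NULL; blk = blk->next) {
        if (blk->size >= size) {
            *prevp = blk->next;    /* unlink the chosen block */
            return blk;
        }
        prevp = &blk->next;
    }
    return NULL;                   /* no fit: the heap would have to grow */
}

A best-fit variant walks the entire list and remembers the smallest block that still satisfies the request, trading extra search time for less wasted space.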
This article focuses on Linux kernel memory management and, in particular, the mechanisms provided through slab allocation.
The slab cache
The slab allocator used in Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system. Jeff's allocator revolves around object caching. Within a kernel, a considerable amount of memory is allocated for a finite set of objects such as file descriptors and other common structures. Jeff found that the amount of time required to initialize a regular object in the kernel exceeded the amount of time required to allocate and deallocate it. His conclusion was that instead of freeing the memory back to a global pool, he would have the memory remain initialized for its intended purpose. For example, if memory is being allocated for a mutex, the mutex initialization function (mutex_init) need only be performed once, when the memory is first allocated for the mutex. Subsequent allocations of the memory need not perform this initialization, because the object is already in the desired state when it is freed back to its cache.
The Linux slab allocator uses these ideas and others to build a memory allocator that is efficient in both space and time.
Figure 1 illustrates the high-level organization of the slab structures. At the highest level is the cache_chain, which is a linked list of the slab caches. This is useful for best-fit algorithms that look for a cache that most closely fits the size of the desired allocation (iterating the list). Each element of the cache_chain is a kmem_cache structure reference (called a cache). This defines a pool of objects of a given size to manage.
Figure 1. The major structures of the slab allocator


Each cache contains a list of slabs, which are contiguous blocks of memory (typically pages). Three lists of slabs exist within each cache:
slabs_full
Slabs that are fully allocated.
slabs_partial
Slabs that are partially allocated.
slabs_empty
Slabs that are empty (no objects allocated).
Note that the slabs on the slabs_empty list are prime candidates for reaping. This is the process by which the memory used by slabs is provided back to the operating system for other uses.
Each slab in the slab list is a contiguous block of memory (one or more contiguous pages) that is divided into objects. These objects are the fundamental elements that are allocated and deallocated from the particular cache. Note that a slab is the smallest unit by which a cache grows: when the cache needs more memory, it acquires at least one new slab. Typically, numerous objects are allocated per slab.
As objects are allocated and deallocated from a slab, the individual slab can move between the slab lists. For example, when all objects are consumed in a slab, it moves from the slabs_partial list to the slabs_full list. When a slab is full and an object is deallocated, it moves from the slabs_full list to the slabs_partial list. When all objects are deallocated, it moves from the slabs_partial list to the slabs_empty list.
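To make the list organization concrete, the following is a simplified sketch of the bookkeeping structures in ./linux/mm/slab.c around 2.6.21 (most fields, locking, and NUMA details are omitted; in the source, the empty list is named slabs_free):

/* Per-node slab lists (simplified from struct kmem_list3 in mm/slab.c). */
struct kmem_list3 {
    struct list_head slabs_partial;   /* slabs with some objects free */
    struct list_head slabs_full;      /* slabs with no objects free */
    struct list_head slabs_free;      /* empty slabs, candidates for reaping */
    unsigned long free_objects;       /* total free objects on this node */
    /* ... lock, shared array cache, reap timing, and other fields omitted ... */
};

/* Descriptor that ties one contiguous block of pages to its objects
 * (simplified from struct slab in mm/slab.c). */
struct slab {
    struct list_head list;            /* links the slab into one of the lists above */
    unsigned long colouroff;          /* cache-colouring offset into the block */
    void *s_mem;                      /* address of the first object */
    unsigned int inuse;               /* number of objects currently allocated */
    /* ... free-object index bookkeeping omitted ... */
};

As objects are allocated and freed, the slab descriptor is simply moved between these three lists.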
Motivation behind the slab
The slab cache allocator provides a number of benefits over traditional memory management schemes. First, kernels commonly rely on small objects that are allocated numerous times over the lifetime of the system. The slab cache allocator provides this through the caching of similarly sized objects, thus avoiding the fragmentation problems that commonly occur. The slab allocator also supports the initialization of common objects, thus avoiding the need to repeatedly initialize an object for the same purpose. Finally, the slab allocator supports hardware cache alignment and coloring, which allows objects in different slabs to map to different hardware cache lines for better cache utilization and performance.


API functions
Now on to the application program interface (API) for creating new slab caches, adding memory to the caches, destroying the caches, as well as the functions to allocate and deallocate objects from them.
The first step is to create your slab cache reference, which you can declare statically as:
struct kmem_cache *my_cachep;

Linux source for the slab cache
You can find the slab cache source in ./linux/mm/slab.c. The kmem_cache structure is also defined in ./linux/mm/slab.c. The discussion in this article focuses on the current implementation in the 2.6.21 Linux kernel.
This reference is then used by the other slab cache functions for creation, deletion, allocation, and so on. The kmem_cache structure contains the per-central processing unit (CPU) data, a set of tunables (accessible through the proc file system), statistics, and necessary elements to manage the slab cache.
kmem_cache_create
Kernel function kmem_cache_create is used to create a new cache. This is commonly performed at kernel init time or when a kernel module is first loaded. Its prototype is defined as:
struct kmem_cache *
kmem_cache_create( const char *name, size_t size, size_t align,
                   unsigned long flags,
                   void (*ctor)(void*, struct kmem_cache *, unsigned long),
                   void (*dtor)(void*, struct kmem_cache *, unsigned long));
The name argument defines the name of the cache, which is used by the proc file system (in /proc/slabinfo) to identify this cache. The size argument specifies the size of the objects that should be created for this cache, with the align argument defining the required alignment for each object. The flags argument specifies options to enable for the cache. These flags are shown in Table 1.
Table 1. Partial list of options for kmem_cache_create (as specified in flags)
SLAB_RED_ZONE
Insert markers at the head and tail of the object to support checking for buffer overruns.
SLAB_POISON
Fill the slab with a known pattern to allow monitoring of objects in the cache (objects owned by the cache but modified externally).
SLAB_HWCACHE_ALIGN
Specify that the objects in this cache must be aligned to a hardware cache line.
The ctor and dtor arguments define an optional object constructor and destructor. The constructor and destructor are callback functions provided by the user. When a new object is allocated from the cache, it can be initialized through the constructor.
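As a brief, hedged sketch of how a constructor might be supplied (the object type and all names here are hypothetical; the ctor prototype matches the kmem_cache_create declaration shown above):

/* Hypothetical object type to be managed by a slab cache. */
struct my_object {
    spinlock_t lock;
    struct list_head link;
    int state;
};

/* Constructor: invoked when the allocator populates a slab with objects,
 * so this setup is not repeated on every kmem_cache_alloc. */
static void my_object_ctor( void *obj, struct kmem_cache *cachep, unsigned long flags )
{
    struct my_object *o = obj;

    spin_lock_init( &o->lock );
    INIT_LIST_HEAD( &o->link );
    o->state = 0;
}

static struct kmem_cache *my_object_cachep;

static void init_my_object_cache( void )
{
    my_object_cachep = kmem_cache_create( "my_objects",
                                          sizeof(struct my_object), 0,
                                          SLAB_HWCACHE_ALIGN,
                                          my_object_ctor,    /* Constructor */
                                          NULL );            /* No destructor */
}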
After the cache is created, its reference is returned by the kmem_cache_create function. Note that this function allocates no memory to the cache. Instead, when attempts are made to allocate objects from the cache (when it's initially empty), memory is allocated to it through a refill operation. This same operation is used to add memory to the cache when all its objects are consumed.
kmem_cache_destroy
Kernel function kmem_cache_destroy is used to destroy a cache. This call is performed by kernel modules when they are unloaded. The cache must be empty before this function is called.
void kmem_cache_destroy( struct kmem_cache *cachep );
kmem_cache_alloc
To allocate an object from a named cache, the kmem_cache_alloc function is used. The caller provides the cache from which to allocate an object and a set of flags:
void kmem_cache_alloc( struct kmem_cache *cachep, gfp_t flags );
This function returns an object from the cache. Note that if the cache is currently empty, this function may invoke cache_alloc_refill to add memory to the cache. The flag options for kmem_cache_alloc are the same as those for kmalloc. Table 2 provides a partial list of the flag options.
Table 2. Flag options for the kmem_cache_alloc and kmalloc kernel functions
GFP_USER
Allocate memory on behalf of a user-space process (this call may sleep).
GFP_KERNEL
Allocate memory from kernel RAM (this call may sleep).
GFP_ATOMIC
Force the call not to sleep (useful for interrupt handlers).
GFP_HIGHUSER
Allocate from high memory.
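As a small, hedged illustration of how the flag choice matters (the helper functions are hypothetical; my_cachep is the cache declared earlier in this article):

/* Process context: it is acceptable for the allocation to sleep. */
static void *get_object_may_sleep( void )
{
    return kmem_cache_alloc( my_cachep, GFP_KERNEL );
}

/* Interrupt or other atomic context: the call must not sleep, so use
 * GFP_ATOMIC and be prepared for the allocation to fail under pressure. */
static void *get_object_atomic( void )
{
    return kmem_cache_alloc( my_cachep, GFP_ATOMIC );
}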

Slab allocation for NUMA
For Non-Uniform Memory Access (NUMA) architectures, the allocation function for a specified node is kmem_cache_alloc_node.
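Its prototype adds the node identifier to the usual cache reference and flags (shown here as it appears in the 2.6-era <linux/slab.h>; check your kernel version for the exact declaration):

void *kmem_cache_alloc_node( struct kmem_cache *cachep, gfp_t flags, int nodeid );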
kmem_cache_zalloc
The kernel function kmem_cache_zalloc is similar to kmem_cache_alloc, except that it performs a memset of the object to clear it out prior to returning the object to the caller.
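Its prototype mirrors kmem_cache_alloc (again as found in the 2.6-era <linux/slab.h>):

void *kmem_cache_zalloc( struct kmem_cache *cachep, gfp_t flags );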
kmem_cache_free
To free an object back to the slab, the kmem_cache_free function is used. The caller provides the cache reference and the object to be freed.
void kmem_cache_free( struct kmem_cache *cachep, void *objp );
kmalloc and kfree
The most common memory management functions in the kernel are the kmalloc and kfree functions. The prototypes for these functions are defined as:
void *kmalloc( size_t size, gfp_t flags );
void kfree( const void *objp );
Note that in kmalloc the only arguments are the requested size of the object to allocate and a set of flags (see the partial list in Table 2). But kmalloc and kfree use the slab cache just like the previously defined functions. Instead of naming a specific slab cache from which to allocate an object, the kmalloc function iterates through the available caches looking for one that can satisfy its size constraint. When one is found, an object is allocated (using __kmem_cache_alloc). To free an object with kfree, the cache from which the object was allocated is determined with a call to virt_to_cache. This function returns the cache reference that is then used in a call to __cache_free to release the object.
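A minimal usage sketch follows (the function name is hypothetical; kernel process context is assumed):

static int kmalloc_example( void )
{
    /* Allocate a 128-byte buffer from the general-purpose caches. */
    char *buf = kmalloc( 128, GFP_KERNEL );

    if (!buf)
        return -ENOMEM;    /* kmalloc can fail; always check the result */

    memset( buf, 0, 128 ); /* ... use the buffer ... */

    kfree( buf );          /* return the buffer to its slab cache */
    return 0;
}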

Generic object allocation
Internally in the slab source, a function named kmem_find_general_cachep is provided that performs a cache search looking for a slab cache that best fits the necessary object size.
Other miscellaneous functions
The slab cache API provides a number of other useful functions. The kmem_cache_size function returns the size of the objects that are managed by this cache. You can also retrieve the name of a given cache (as defined at cache creation time) through a call to kmem_cache_name. A cache can be shrunk by releasing free slabs in the cache. This can be accomplished with a call to kmem_cache_shrink. Note that this action (called reaping) is performed automatically on a periodic basis by the kernel (through kswapd).
unsigned int kmem_cache_size( struct kmem_cache *cachep );
const char *kmem_cache_name( struct kmem_cache *cachep );
int kmem_cache_shrink( struct kmem_cache *cachep );


Example use of the slab cache
The following code snippets demonstrate the creation of a new slab cache, allocating and deallocating objects from the cache, and then destroying the cache. To begin, a kmem_cache object must be defined and then initialized (see Listing 1). This particular cache contains 32-byte objects and is hardware-cache aligned (as defined by the flags argument SLAB_HWCACHE_ALIGN).
Listing 1. Creating a new slab cache
               
static struct kmem_cache *my_cachep;
static void init_my_cache( void )
{
   my_cachep = kmem_cache_create(
                  "my_cache",            /* Name */
                  32,                    /* Object Size */
                  0,                     /* Alignment */
                  SLAB_HWCACHE_ALIGN,    /* Flags */
   NULL, NULL );          /* Constructor/Destructor */
   return;
}
With your slab cache allocated, you can now allocate an object from it. Listing 2 provides an example of allocating and deallocating an object from the cache. It also demonstrates two of the miscellaneous functions.
Listing 2. Allocating and deallocating objects
               
int slab_test( void )
{
  void *object;
  printk( "Cache name is %s\n", kmem_cache_name( my_cachep ) );
  printk( "Cache object size is %d\n", kmem_cache_size( my_cachep ) );
  object = kmem_cache_alloc( my_cachep, GFP_KERNEL );
  if (object) {
    kmem_cache_free( my_cachep, object );
  }
  return 0;
}
Finally, Listing 3 is an example of destroying a slab cache. The caller must ensure that no attempts to allocate objects from the cache are performed during the destroy operation.
Listing 3. Destroying a slab cache
               
static void remove_my_cache( void )
{
  if (my_cachep) kmem_cache_destroy( my_cachep );
  return;
}


Proc interface to the slab
The proc file system provides a simple way to monitor all slab caches that are active in the system. This file, called /proc/slabinfo, provides detailed information about all slab caches in addition to a few tunables that are accessible from user space. The current version of slabinfo provides a header so that the output is a bit more readable. For each slab cache in the system, the number of objects, the number of active objects, and the object size are provided (in addition to the objects and pages per slab). A set of tunables and slab data are also provided.
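For reference, the header produced by the 2.6-era slabinfo (version 2.x) looks roughly like the following; the exact columns can vary with kernel version and configuration:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>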
To tune a particular slab cache, simply echo the slab cache name and the three tunable parameters as a string to the /proc/slabinfo file. The following example illustrates how to increase the limit and batchcount while leaving the shared factor as is (format is "cache name limit batchcount shared factor"):
# echo "my_cache 128 64 8" > /proc/slabinfo
            


This post comes from the ChinaUnix blog; the original article is available at: http://blog.chinaunix.net/u/29128/showart_401474.html