Question about the page_launder() function

_nosay posted on 2016-02-22 16:31

Last edited by _nosay on 2016-02-22 17:30

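The page_launder() function in the Linux 2.4.0 source:
At first I thought this function scans the list twice because it implements the "second chance algorithm", but that does not seem to be the case. Once a page is on inactive_dirty_list it has effectively been "sentenced to death": it has to be written out to the swap area once before it becomes clean, so there is no need to decide who "dies first" the way the second chance algorithm does. My guess is therefore that the second chance algorithm is used to pick the pages that get moved onto inactive_dirty_list, not to pick the pages on inactive_dirty_list that get written out to swap.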
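
For reference, this is roughly what the textbook second chance (clock) algorithm looks like. It is only a user-space sketch of the general technique, not code from the 2.4.0 kernel; `struct sc_frame` and `second_chance_pick()` are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame descriptor for this sketch (not a kernel type). */
struct sc_frame {
    bool referenced;  /* set whenever the page is accessed */
};

/*
 * Classic second chance (clock) victim selection.
 * *hand is the clock hand; the return value is the index of the victim.
 * Requires n > 0.
 */
static size_t second_chance_pick(struct sc_frame *frames, size_t n, size_t *hand)
{
    for (;;) {
        size_t i = *hand;

        *hand = (*hand + 1) % n;
        if (!frames[i].referenced)
            return i;                  /* not used recently: evict it */
        frames[i].referenced = false;  /* used recently: spare it once */
    }
}
```

The point of the algorithm is that a recently referenced page is skipped once and only evicted if it is still unreferenced when the hand comes around again.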

The comment above this function contains the following sentence:
Since we want to refill those pages as soon as possible, we'll make two loops over the inactive list, one to move the already cleaned pages to the inactive_clean lists and one to (often asynchronously) clean the dirty inactive pages.

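If free pages are still short after the first scan, the function performs a second scan that writes all the pages on the list out to the swap area (that is, launders them into clean pages). For some time afterwards, whenever this function runs again, the first scan alone is usually enough to resolve the shortage, and all it has to do is move the pages laundered by the previous second scan onto page->zone->inactive_clean_list. That keeps the function fast most of the time.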
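
To make this reading of the two passes concrete, here is a minimal user-space sketch. It is not the kernel code: `struct toy_page`, `toy_launder()` and the `shortage` parameter are hypothetical, the `cleaned < shortage` test stands in for free_shortage(), and a write is modeled as finishing instantly by clearing the dirty flag, which real asynchronous I/O does not do.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a page on the inactive_dirty list. */
struct toy_page {
    bool dirty;          /* still needs to be written out */
    bool on_clean_list;  /* already moved to an inactive_clean list */
};

/*
 * Toy two-pass launder. Returns how many pages were moved to the
 * clean list, which is what cleaned_pages counts in this model.
 */
static int toy_launder(struct toy_page *pages, size_t n, int shortage)
{
    int cleaned = 0;

    /* Pass 1: only move pages that are already clean. */
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].on_clean_list && !pages[i].dirty) {
            pages[i].on_clean_list = true;
            cleaned++;
        }
    }

    /* Pass 2: only if still short; start write-out, do not move pages. */
    if (cleaned < shortage) {
        for (size_t i = 0; i < n; i++) {
            if (!pages[i].on_clean_list && pages[i].dirty)
                pages[i].dirty = false;  /* models a completed write */
        }
    }
    return cleaned;
}
```

With three pages of which only one is clean, the first call returns 1 and launders the other two; the second call then moves those two in its first pass and never needs a second pass.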

I have been reading this function for a long time and finally feel that I somewhat understand it. Is my understanding correct?

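Even if it is, I still have questions:
① What is the actual benefit of sacrificing one call to speed up most of the others? There is also a drawback: if inactive_dirty_list is very long during the second scan, won't the kernel appear to "stall"? And shuffling pages back and forth between lists costs performance overall (unless the kernel can guarantee that the second scan mostly runs while the CPU is idle).
② The second scan only writes the pages on the list out to swap; it does not also move them onto page->zone->inactive_clean_list. So cleaned_pages, the number of pages laundered by this call to page_launder(), is still just the result of the first scan. Couldn't that send the subsequent code on a "detour", because the number of laundered pages still does not satisfy the demand?
③ Logically, shouldn't the clean pages on inactive_dirty_list form at most one contiguous region, and always at the very front? If so, once the first scan hits a dirty page, everything after it must be dirty as well, and moving each of them in turn to the back of inactive_dirty_list just goes around in a circle and restores the original shape, doesn't it? Besides, the second scan launders all the pages left on the list rather than stopping partway through, so what is the point of "hiding" at the back, let alone when everyone "hides" and the original order is restored anyway?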
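
Question ③ can be checked with a small user-space model of the first scan. This is only a sketch: `struct rot_page` and `toy_first_pass()` are hypothetical names, the list is a plain array with index 0 as the head, and the interleaving of clean and dirty pages in the usage note is assumed purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy page: an id plus a dirty flag. */
struct rot_page {
    int id;
    bool dirty;
};

/*
 * One first-pass scan over list[0..*len-1], where index 0 is the head
 * and index *len-1 is the tail (the scan always looks at the tail,
 * like inactive_dirty_list.prev). Returns the number of clean pages
 * taken off the list.
 */
static int toy_first_pass(struct rot_page *list, size_t *len)
{
    size_t maxscan = *len;  /* like maxscan = nr_inactive_dirty_pages */
    int cleaned = 0;

    for (size_t scanned = 0; scanned < maxscan && *len > 0; scanned++) {
        struct rot_page tail = list[*len - 1];

        if (!tail.dirty) {
            /* Clean page: it leaves this list for inactive_clean. */
            (*len)--;
            cleaned++;
            continue;
        }
        /* Dirty page: list_del() + list_add() re-inserts it at the head. */
        memmove(&list[1], &list[0], (*len - 1) * sizeof(list[0]));
        list[0] = tail;
    }
    return cleaned;
}
```

Starting from head to tail with pages 1 (dirty), 2 (clean), 3 (dirty), 4 (clean), 5 (dirty), one pass removes pages 2 and 4 and leaves 1, 3, 5 in their original relative order. In this model the rotation does bring the dirty pages back to the same shape, but it is also what lets the scan reach clean pages that sit behind dirty ones.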

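Buddy_Zhang1 posted on 2016-02-22 17:29

I would like to help you look into this, but my kernel source no longer has a page_launder() function. Could you paste the function here?

_nosay posted on 2016-02-22 17:33

Reply to 2# Buddy_Zhang1

I have been posting code to the forum like this lately, and I wonder whether my company will come after me for it! I am taking quite a risk here, so please do think it over carefully for me.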
/**
 * page_launder - clean dirty inactive pages, move to inactive_clean list
 * @gfp_mask: what operations we are allowed to do
 * @sync: should we wait synchronously for the cleaning of pages
 *
 * When this function is called, we are most likely low on free +
 * inactive_clean pages. Since we want to refill those pages as
 * soon as possible, we'll make two loops over the inactive list,
 * one to move the already cleaned pages to the inactive_clean lists
 * and one to (often asynchronously) clean the dirty inactive pages.
 *
 * In situations where kswapd cannot keep up, user processes will
 * end up calling this function. Since the user process needs to
 * have a page before it can continue with its allocation, we'll
 * do synchronous page flushing in that case.
 *
 * This code is heavily inspired by the FreeBSD source code. Thanks
 * go out to Matthew Dillon.
 */
#define MAX_LAUNDER (4 * (1 << page_cluster))
int page_launder(int gfp_mask, int sync)
{
    int launder_loop, maxscan, cleaned_pages, maxlaunder;
    int can_get_io_locks;
    struct list_head * page_lru;
    struct page * page;

    /*
     * We can only grab the IO locks (eg. for flushing dirty
     * buffers to disk) if __GFP_IO is set.
     */
    can_get_io_locks = gfp_mask & __GFP_IO;

    launder_loop = 0;
    maxlaunder = 0;
    cleaned_pages = 0;

dirty_page_rescan:
    spin_lock(&pagemap_lru_lock);
    maxscan = nr_inactive_dirty_pages;
    while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
            maxscan-- > 0) {
        page = list_entry(page_lru, struct page, lru);

        /* Wrong page on list?! (list corruption, should not happen) */
        if (!PageInactiveDirty(page)) {
            printk("VM: page_launder, wrong page on list.\n");
            list_del(page_lru);
            nr_inactive_dirty_pages--;
            page->zone->inactive_dirty_pages--;
            continue;
        }

        /* Page is or was in use? Move it to the active list. */
        if (PageTestandClearReferenced(page) || page->age > 0 ||
                (!page->buffers && page_count(page) > 1) ||
                page_ramdisk(page)) {
            del_page_from_inactive_dirty_list(page);
            add_page_to_active_list(page);
            continue;
        }

        /*
         * The page is locked. IO in progress?
         * Move it to the back of the list.
         */
        if (TryLockPage(page)) {
            list_del(page_lru);
            list_add(page_lru, &inactive_dirty_list);
            continue;
        }

        /*
         * Dirty swap-cache page? Write it out if
         * last copy..
         */
        if (PageDirty(page)) {
            int (*writepage)(struct page *) = page->mapping->a_ops->writepage;
            int result;

            if (!writepage)
                goto page_active;

            /* First time through? Move it to the back of the list */
            if (!launder_loop) {
                list_del(page_lru);
                list_add(page_lru, &inactive_dirty_list);
                UnlockPage(page);
                continue;
            }

            /* OK, do a physical asynchronous write to swap. */
            ClearPageDirty(page);
            page_cache_get(page);
            spin_unlock(&pagemap_lru_lock);

            result = writepage(page);
            page_cache_release(page);

            /* And re-start the thing.. */
            spin_lock(&pagemap_lru_lock);
            if (result != 1)
                continue;
            /* writepage refused to do anything */
            set_page_dirty(page);
            goto page_active;
        }

        /*
         * If the page has buffers, try to free the buffer mappings
         * associated with this page. If we succeed we either free
         * the page (in case it was a buffercache only page) or we
         * move the page to the inactive_clean list.
         *
         * On the first round, we should free all previously cleaned
         * buffer pages
         */
        if (page->buffers) {
            int wait, clearedbuf;
            int freed_page = 0;
            /*
             * Since we might be doing disk IO, we have to
             * drop the spinlock and take an extra reference
             * on the page so it doesn't go away from under us.
             */
            del_page_from_inactive_dirty_list(page);
            page_cache_get(page);
            spin_unlock(&pagemap_lru_lock);

            /* Will we do (asynchronous) IO? */
            if (launder_loop && maxlaunder == 0 && sync)
                wait = 2; /* Synchronous IO */
            else if (launder_loop && maxlaunder-- > 0)
                wait = 1; /* Async IO */
            else
                wait = 0; /* No IO */

            /* Try to free the page buffers. */
            clearedbuf = try_to_free_buffers(page, wait);

            /*
             * Re-take the spinlock. Note that we cannot
             * unlock the page yet since we're still
             * accessing the page_struct here...
             */
            spin_lock(&pagemap_lru_lock);

            /* The buffers were not freed. */
            if (!clearedbuf) {
                add_page_to_inactive_dirty_list(page);

            /* The page was only in the buffer cache. */
            } else if (!page->mapping) {
                atomic_dec(&buffermem_pages);
                freed_page = 1;
                cleaned_pages++;

            /* The page has more users besides the cache and us. */
            } else if (page_count(page) > 2) {
                add_page_to_active_list(page);

            /* OK, we "created" a freeable page. */
            } else /* page->mapping && page_count(page) == 2 */ {
                add_page_to_inactive_clean_list(page);
                cleaned_pages++;
            }

            /*
             * Unlock the page and drop the extra reference.
             * We can only do it here because we are accessing
             * the page struct above.
             */
            UnlockPage(page);
            page_cache_release(page);

            /*
             * If we're freeing buffer cache pages, stop when
             * we've got enough free memory.
             */
            if (freed_page && !free_shortage())
                break;
            continue;
        } else if (page->mapping && !PageDirty(page)) {
            /*
             * If a page had an extra reference in
             * deactivate_page(), we will find it here.
             * Now the page is really freeable, so we
             * move it to the inactive_clean list.
             */
            del_page_from_inactive_dirty_list(page);
            add_page_to_inactive_clean_list(page);
            UnlockPage(page);
            cleaned_pages++;
        } else {
page_active:
            /*
             * OK, we don't know what to do with the page.
             * It's no use keeping it here, so we move it to
             * the active list.
             */
            del_page_from_inactive_dirty_list(page);
            add_page_to_active_list(page);
            UnlockPage(page);
        }
    }
    spin_unlock(&pagemap_lru_lock);

    /*
     * If we don't have enough free pages, we loop back once
     * to queue the dirty pages for writeout. When we were called
     * by a user process (that /needs/ a free page) and we didn't
     * free anything yet, we wait synchronously on the writeout of
     * MAX_SYNC_LAUNDER pages.
     *
     * We also wake up bdflush, since bdflush should, under most
     * loads, flush out the dirty pages before we have to wait on
     * IO.
     */
    if (can_get_io_locks && !launder_loop && free_shortage()) {
        launder_loop = 1;
        /* If we cleaned pages, never do synchronous IO. */
        if (cleaned_pages)
            sync = 0;
        /* We only do a few "out of order" flushes. */
        maxlaunder = MAX_LAUNDER;
        /* Kflushd takes care of the rest. */
        wakeup_bdflush(0);
        goto dirty_page_rescan;
    }

    /* Return the number of pages moved to the inactive_clean list. */
    return cleaned_pages;
}
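
The way `wait` is chosen for buffer pages is easy to misread, so here is that one decision pulled out into a standalone user-space helper. `choose_wait()` is a hypothetical name; the conditions are copied from the function above, with `maxlaunder` passed by pointer so the side effect of `maxlaunder--` stays visible.

```c
/*
 * Mirrors the wait-level choice made before try_to_free_buffers().
 * Returns 2 for synchronous IO, 1 for asynchronous IO, 0 for no IO.
 * As in the original, *maxlaunder is decremented whenever the second
 * condition is evaluated with launder_loop set, even if it is already
 * zero or negative.
 */
static int choose_wait(int launder_loop, int *maxlaunder, int sync)
{
    if (launder_loop && *maxlaunder == 0 && sync)
        return 2;  /* budget used up and caller asked for sync */
    else if (launder_loop && (*maxlaunder)-- > 0)
        return 1;  /* still within the MAX_LAUNDER budget */
    return 0;      /* first pass, or budget used up without sync */
}
```

So no IO is ever requested on the first pass, the second pass issues asynchronous flushes until the budget reaches zero, and only a caller with `sync` still set falls through to synchronous IO after that.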

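_nosay posted on 2016-02-22 18:04

Reply to 2# Buddy_Zhang1

The book says that this function is executed by multiple threads.

nswcfd posted on 2016-02-23 10:52

If it is not proprietary code, you can simply link to a public LXR, although there are probably not many left that still carry the 2.4 series.

_nosay posted on 2016-02-23 14:07

Reply to 5# nswcfd

I am reading 《Linux内核源代码情景分析》, which is based on version 2.4.0.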


Chinaunix's Archiver
