[Memory Management] Two questions about the page cache

#1 | Posted 2014-04-10 23:59
1. Is the page cache stored in user space or in kernel space?
If it is in user space, do different processes see the same copy of the page cache?
If it is in kernel space, how large does the page cache usually get?

2. Why does write() copy data from user space into the kernel? If the page cache lives in kernel space, is the copy there to update the page cache?
Why doesn't the kernel just read the data directly from user space?

I'd appreciate it if the experts here could give their conclusions, along with some reference material.

#2 | Posted 2014-04-11 00:31


I'm a newbie too.
1) The page cache does live in kernel space. The page cache caches the data of files stored on slow hard disks, which means it is organized around files, not around processes. In the filesystem a file is described by an inode, and the file's page cache is reached through inode->i_mapping (the address_space structure is what describes the page cache).
When a process opens a file, the chain is: file descriptor --> struct file --> inode. No matter how many processes open() the same file, they all end up at the same inode; only the fd and the struct file differ.
So different processes see the same page cache for the same file. As for its size, it is variable: the address_space manages the physical pages of the page cache through a radix tree, and that tree can grow and shrink.
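For illustration, a minimal sketch (Linux 3.x-era kernel API; the helper name is my own) of how that chain ends at the page cache:

    /*
     * Every opener of a file reaches the same page cache, because the
     * lookup goes through the shared inode rather than through any
     * process: inode->i_mapping is the address_space, and
     * find_get_page() performs the radix-tree lookup for one
     * page-sized slot of the file.
     */
    #include <linux/fs.h>
    #include <linux/pagemap.h>

    static bool file_page_is_cached(struct inode *inode, pgoff_t index)
    {
            struct page *page = find_get_page(inode->i_mapping, index);

            if (page) {
                    page_cache_release(page);  /* drop the lookup reference */
                    return true;
            }
            return false;
    }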

2) A write() does not necessarily trigger real IO. Usually, when you write(), the data is copied from user space into the page cache in kernel space, the page is marked dirty, and the call returns right away; that is why write() is sometimes faster than read().
The actual writing to disk is handed off to dedicated kernel writeback threads (bdflush in old kernels; pdflush and then the per-BDI flusher threads in later ones). They walk the page cache, find the dirty pages, and perform the IO on them one by one; only at that point is the file's data on disk actually updated.
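You can see this split from user space with a small experiment (the file path and sizes below are arbitrary): the write() loop only fills page cache and dirties pages, while fsync() has to wait for the real IO.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    static double now(void)
    {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
            static char buf[1 << 20];
            memset(buf, 'x', sizeof(buf));

            int fd = open("/tmp/pagecache_demo", O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            double t0 = now();
            for (int i = 0; i < 64; i++)    /* dirty 64 MiB of page cache */
                    if (write(fd, buf, sizeof(buf)) != sizeof(buf))
                            perror("write");
            double t1 = now();
            fsync(fd);                      /* now the disk IO actually happens */
            double t2 = now();

            printf("write loop: %.3fs, fsync: %.3fs\n", t1 - t0, t2 - t1);
            close(fd);
            return 0;
    }

On a typical disk the write loop finishes almost instantly while fsync() takes noticeably longer, which is exactly the dirty-page/writeback split described above.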

Reference:
ULK, Chapter 12 (files / the VFS)
     Chapter 15 (the page cache)
     Chapter 16

#3 | Posted 2014-04-11 08:40
The page cache is definitely in kernel space, and its size is unbounded (newer kernels can optionally limit it): as long as there is free memory, the kernel will use as much of it as possible for the cache. You can see the actual cache usage in the output of the free command.
On write, if the data were not copied into kernel space, there would be no way to write it out to the storage device. (Dedicated zero-copy schemes are the exception, of course~)
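As an aside on that zero-copy exception, here is a sketch using sendfile(2), which (on kernels where the output may be a regular file, 2.6.33 and later) copies file to file entirely inside the kernel, so the cached pages never pass through a user-space buffer:

    #include <sys/sendfile.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            if (argc != 3) {
                    fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                    return 1;
            }
            int in = open(argv[1], O_RDONLY);
            int out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0644);
            if (in < 0 || out < 0) { perror("open"); return 1; }

            off_t off = 0;
            ssize_t n;
            /* move up to 1 MiB per call, all inside the kernel, until EOF */
            while ((n = sendfile(out, in, &off, 1 << 20)) > 0)
                    ;
            if (n < 0)
                    perror("sendfile");

            close(in);
            close(out);
            return n < 0;
    }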

#4 | Posted 2014-04-11 15:37
Reply to #3 humjb_1983
There have always been dissenting voices about limiting the page cache size, which is why the patch, although it has existed for a long time, has never entered mainline. Many vendors have merged it on their own, e.g. SUSE.

#5 | Posted 2014-04-14 08:43
瀚海书香 wrote on 2014-04-11 15:37:
    Reply to #3 humjb_1983
    There have always been dissenting voices about limiting the page cache size, which is why the patch, although it has existed for a long time, has never ...

Heh, I had indeed noticed that SUSE already ships this feature; I just hadn't been following mainline. Thanks for the pointer.
For now we control page cache usage through the memory watermarks, but that approach is quite limited.
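For readers wondering what "controlling page cache usage through the memory watermarks" can look like in practice: one common knob is /proc/sys/vm/min_free_kbytes, which raises the zone watermarks so that kswapd starts background reclaim (mostly of clean page cache) earlier. Whether this is the exact mechanism meant here is not stated; the sketch below is an assumption, with an arbitrary example value.

    #include <stdio.h>

    int main(void)
    {
            /* Raising min_free_kbytes lifts the zone watermarks, so
             * background reclaim kicks in sooner and more memory stays
             * free. Requires root; the exact knob is an assumption. */
            FILE *f = fopen("/proc/sys/vm/min_free_kbytes", "w");
            if (!f) { perror("fopen"); return 1; }
            fprintf(f, "%d\n", 262144);  /* keep ~256 MiB free (example) */
            fclose(f);
            return 0;
    }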

#6 | Posted 2014-04-14 08:47
Reply to #5 humjb_1983
    For now we control page cache usage through the memory watermarks, but that approach is quite limited.

You could also make this the application developers' concern, via posix_fadvise.
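A minimal sketch of that application-side approach: after consuming a file sequentially, the program tells the kernel the file's cached pages are no longer needed, so they become the first candidates for eviction.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
            if (argc != 2) {
                    fprintf(stderr, "usage: %s <file>\n", argv[0]);
                    return 1;
            }
            int fd = open(argv[1], O_RDONLY);
            if (fd < 0) { perror("open"); return 1; }

            char buf[1 << 16];
            while (read(fd, buf, sizeof(buf)) > 0)
                    ;  /* ... process the data ... */

            /* len == 0 means "to the end of the file". Dirty pages are
             * not dropped, so writers should fdatasync() first. Note
             * posix_fadvise returns the error number directly. */
            int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            if (err)
                    fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

            close(fd);
            return 0;
    }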

   

#7 | Posted 2014-04-14 08:59
Thanks, I'll have to keep a closer eye on this going forward~~

#8 | Posted 2014-05-13 16:22
瀚海书香 wrote on 2014-04-11 15:37:
    Reply to #3 humjb_1983
    There have always been dissenting voices about limiting the page cache size, which is why the patch, although it has existed for a long time, has never ...

瀚海兄, where can this patch be found? Also, what exactly were the "dissenting voices" about? Is the patch unstable? Are there any links I could read up on? Thanks!

#9 | Posted 2014-05-14 10:34
Reply to #8 humjb_1983
I believe I originally saw it on LWN, but I just searched and couldn't find it again, and the patch is old enough that it is no longer in my mailing-list archive either. As I recall, the objection was that such a knob runs against Linux's policy of using as much memory as possible: the opponents held that the VM should use every spare page to speed up file access.
The patch's stability is not an issue. Here is the SUSE version of the patch that I have:
From: Markus Guertler <mguertler@novell.com>
Subject: Introduce (optional) pagecache limit
References: FATE309111
Patch-mainline: Never

There are apps that consume lots of memory and touch some of their
pages very infrequently; yet those pages are very important for the
overall performance of the app and should not be paged out in favor
of pagecache. The kernel can't know this and takes the wrong decisions,
even with low swappiness values.

This sysctl allows to set a limit for the non-mapped page cache;
non-mapped meaning that it will not affect shared memory or files
that are mmap()ed -- just anonymous file system cache.
Above this limit, the kernel will always consider removing pages from
the page cache first.

The limit that ends up being enforced is dependent on free memory;
if we have lots of it, the effective limit is much higher -- only when
the free memory gets scarce, we'll become strict about anonymous
page cache. This should make the setting much more attractive to use.

[Reworked by Kurt Garloff and Nick Piggin]

Signed-off-by: Kurt Garloff <garloff@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Michal Hocko <mhocko@suse.cz>

Index: linux-3.0-SLE11-SP2-3.0/Documentation/vm/pagecache-limit
===================================================================
--- /dev/null
+++ linux-3.0-SLE11-SP2-3.0/Documentation/vm/pagecache-limit
@@ -0,0 +1,51 @@
+Functionality:
+-------------
+The patch introduces a new tunable in the proc filesystem:
+
+/proc/sys/vm/pagecache_limit_mb
+
+This tunable sets a limit to the unmapped pages in the pagecache in megabytes.
+If non-zero, it should not be set below 4 (4MB), or the system might behave erratically. In real-life, much larger limits (a few percent of system RAM / a hundred MBs) will be useful.
+
+Examples:
+echo 512 >/proc/sys/vm/pagecache_limit_mb
+
+This sets a baseline limits for the page cache (not the buffer cache!) of 0.5GiB.
+As we only consider pagecache pages that are unmapped, currently mapped pages (files that are mmap'ed such as e.g. binaries and libraries as well as SysV shared memory) are not limited by this.
+NOTE: The real limit depends on the amount of free memory. Every existing free page allows the page cache to grow 8x the amount of free memory above the set baseline. As soon as the free memory is needed, we free up page cache.
+
+
+How it works:
+------------
+The heart of this patch is a new function called shrink_page_cache(). It is called from balance_pgdat (which is the worker for kswapd) if the pagecache is above the limit.
+The function is also called in __alloc_pages_slowpath.
+
+shrink_page_cache() calculates the nr of pages the cache is over its limit. It reduces this number by a factor (so you have to call it several times to get down to the target) then shrinks the pagecache (using the Kernel LRUs).
+
+shrink_page_cache does several passes:
+- Just reclaiming from inactive pagecache memory.
+  This is fast -- but it might not find enough free pages; if that happens,
+  the second pass will happen
+- In the second pass, pages from active list will also be considered.
+- The third pass is just another round of the second pass
+
+In all passes, only unmapped pages will be considered.
+
+
+How it changes memory management:
+--------------------------------
+If the pagecache_limit_mb is set to zero (default), nothing changes.
+
+If set to a positive value, there will be three different operating modes:
+(1) If we still have plenty of free pages, the pagecache limit will NOT be enforced. Memory management decisions are taken as normally.
+(2) However, as soon someone consumes those free pages, we'll start freeing pagecache -- as those are returned to the free page pool, freeing a few pages from pagecache will return us to state (1) -- if however someone consumes these free pages quickly, we'll continue freeing up pages from the pagecache until we reach pagecache_limit_mb.
+(3) Once we are at or below the low watermark, pagecache_limit_mb, the pages in the page cache will be governed by normal paging memory management decisions; if it starts growing above the limit (corrected by the free pages), we'll free some up again.
+
+This feature is useful for machines that have large workloads, carefully sized to eat most of the memory. Depending on the applications page access pattern, the kernel may too easily swap the application memory out in favor of pagecache. This can happen even for low values of swappiness. With this feature, the admin can tell the kernel that only a certain amount of pagecache is really considered useful and that it otherwise should favor the applications memory.
+
+
+Foreground vs. background shrinking:
+-----------------------------------
+
+Usually, the Linux kernel reclaims its memory using the kernel thread kswapd. It reclaims memory in the background. If it can't reclaim memory fast enough, it retries with higher priority and if this still doesn't succeed it uses a direct reclaim path.
+
Index: linux-3.0-SLE11-SP2-3.0/include/linux/pagemap.h
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/include/linux/pagemap.h
+++ linux-3.0-SLE11-SP2-3.0/include/linux/pagemap.h
@@ -12,6 +12,7 @@
#include <asm/uaccess.h>
#include <linux/gfp.h>
#include <linux/bitops.h>
+#include <linux/swap.h>
#include <linux/hardirq.h> /* for in_interrupt() */
#include <linux/hugetlb_inline.h>

Index: linux-3.0-SLE11-SP2-3.0/include/linux/swap.h
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/include/linux/swap.h
+++ linux-3.0-SLE11-SP2-3.0/include/linux/swap.h
@@ -262,6 +262,10 @@ extern unsigned long mem_cgroup_shrink_n
extern int __isolate_lru_page(struct page *page, isolate_mode_t mode, int file);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
+#define FREE_TO_PAGECACHE_RATIO 8
+extern unsigned long pagecache_over_limit(void);
+extern void shrink_page_cache(gfp_t mask, struct page *page);
+extern unsigned int vm_pagecache_limit_mb;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern long vm_total_pages;

Index: linux-3.0-SLE11-SP2-3.0/kernel/sysctl.c
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/kernel/sysctl.c
+++ linux-3.0-SLE11-SP2-3.0/kernel/sysctl.c
@@ -1126,6 +1126,13 @@ static struct ctl_table vm_table[] = {
                .extra1                = &zero,
                .extra2                = &one_hundred,
        },
+        {
+                .procname        = "pagecache_limit_mb",
+                .data                = &vm_pagecache_limit_mb,
+                .maxlen                = sizeof(vm_pagecache_limit_mb),
+                .mode                = 0644,
+                .proc_handler        = &proc_dointvec,
+        },
#ifdef CONFIG_HUGETLB_PAGE
        {
                .procname        = "nr_hugepages",
Index: linux-3.0-SLE11-SP2-3.0/mm/filemap.c
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/mm/filemap.c
+++ linux-3.0-SLE11-SP2-3.0/mm/filemap.c
@@ -507,6 +507,9 @@ int add_to_page_cache(struct page *page,
{
        int error;

+        if (unlikely(vm_pagecache_limit_mb) && pagecache_over_limit() > 0)
+                shrink_page_cache(gfp_mask, page);
+
        __set_page_locked(page);
        error = add_to_page_cache_locked(page, mapping, offset, gfp_mask);
        if (unlikely(error))
Index: linux-3.0-SLE11-SP2-3.0/mm/page_alloc.c
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/mm/page_alloc.c
+++ linux-3.0-SLE11-SP2-3.0/mm/page_alloc.c
@@ -5604,6 +5604,25 @@ out:
        spin_unlock_irqrestore(&zone->lock, flags);
}

+/* Returns a number that's positive if the pagecache is above
+ * the set limit. Note that we allow the pagecache to grow
+ * larger if there's plenty of free pages.
+ */
+unsigned long pagecache_over_limit()
+{
+        /* We only want to limit unmapped page cache pages */
+        unsigned long pgcache_pages = global_page_state(NR_FILE_PAGES)
+                                    - global_page_state(NR_FILE_MAPPED);
+        unsigned long free_pages = global_page_state(NR_FREE_PAGES);
+        unsigned long limit;
+
+        limit = vm_pagecache_limit_mb * ((1024*1024UL)/PAGE_SIZE) +
+                FREE_TO_PAGECACHE_RATIO * free_pages;
+        if (pgcache_pages > limit)
+                return pgcache_pages - limit;
+        return 0;
+}
+
#ifdef CONFIG_MEMORY_HOTREMOVE
/*
  * All pages in the range must be isolated before calling this.
Index: linux-3.0-SLE11-SP2-3.0/mm/shmem.c
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/mm/shmem.c
+++ linux-3.0-SLE11-SP2-3.0/mm/shmem.c
@@ -1036,6 +1036,10 @@ uncharge:
                mem_cgroup_uncharge_cache_page(page);
        if (found < 0)
                error = found;
+        else if (found > 0) {
+                if (unlikely(vm_pagecache_limit_mb) && pagecache_over_limit() > 0)
+                        shrink_page_cache(GFP_KERNEL, page);
+        }
out:
        unlock_page(page);
        page_cache_release(page);
Index: linux-3.0-SLE11-SP2-3.0/mm/vmscan.c
===================================================================
--- linux-3.0-SLE11-SP2-3.0.orig/mm/vmscan.c
+++ linux-3.0-SLE11-SP2-3.0/mm/vmscan.c
@@ -148,8 +148,9 @@ struct scan_control {
/*
  * From 0 .. 100.  Higher means more swappy.
  */
-int vm_swappiness = 60;
-long vm_total_pages;        /* The total number of pages which the VM controls */
+int vm_swappiness __read_mostly = 60;
+unsigned int vm_pagecache_limit_mb __read_mostly = 0;
+long vm_total_pages __read_mostly;        /* The total number of pages which the VM controls */

static LIST_HEAD(shrinker_list);
static DECLARE_RWSEM(shrinker_rwsem);
@@ -2363,6 +2364,8 @@ static bool sleeping_prematurely(pg_data
                return !all_zones_ok;
}

+static void __shrink_page_cache(gfp_t mask);
+
/*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at high_wmark_pages(zone).
@@ -2418,6 +2421,10 @@ loop_again:
        sc.may_writepage = !laptop_mode;
        count_vm_event(PAGEOUTRUN);

+        /* this reclaims from all zones so don't count to sc.nr_reclaimed */
+        if (unlikely(vm_pagecache_limit_mb) && pagecache_over_limit() > 0)
+                __shrink_page_cache(GFP_KERNEL);
+
        for (priority = DEF_PRIORITY; priority >= 0; priority--) {
                unsigned long lru_pages = 0;
                int has_under_min_watermark_zone = 0;
@@ -2587,6 +2594,12 @@ loop_again:
        }
out:

+        /* We do not need to loop_again if we have not achieved our
+         * pagecache target (i.e. && pagecache_over_limit(0) > 0) because
+         * the limit will be checked next time a page is added to the page
+         * cache. This might cause a short stall but we should rather not
+         * keep kswapd awake.
+         */
        /*
         * order-0: All zones must meet high watermark for a balanced node
         * high-order: Balanced zones must make up at least 25% of the node
@@ -2900,6 +2913,160 @@ unsigned long shrink_all_memory(unsigned
}
#endif /* CONFIG_HIBERNATION */

+
+/*
+ * We had to resurect this function for __shrink_page_cache (upstream has
+ * removed it and reworked shrink_all_memory by 7b51755c).
+ *
+ * Tries to reclaim 'nr_pages' pages from LRU lists system-wide, for given
+ * pass and priority.
+ *
+ * For pass > 3 we also try to shrink the LRU lists that contain a few pages
+ */
+static void shrink_all_zones(unsigned long nr_pages, int prio,
+                                      int pass, struct scan_control *sc)
+{
+        struct zone *zone;
+        unsigned long nr_reclaimed = 0;
+
+        for_each_populated_zone(zone) {
+                enum lru_list l;
+
+                if (zone->all_unreclaimable && prio != DEF_PRIORITY)
+                        continue;
+
+                for_each_evictable_lru(l) {
+                        enum zone_stat_item ls = NR_LRU_BASE + l;
+                        unsigned long lru_pages = zone_page_state(zone, ls);
+
+                        /* For pass = 0, we don't shrink the active list */
+                        if (pass == 0 && (l == LRU_ACTIVE_ANON ||
+                                                l == LRU_ACTIVE_FILE))
+                                continue;
+
+                        /* Original code relied on nr_saved_scan which is no
+                         * longer present so we are just considering LRU pages.
+                         * This means that the zone has to have quite large
+                         * LRU list for default priority and minimum nr_pages
+                         * size (8*SWAP_CLUSTER_MAX). In the end we will tend
+                         * to reclaim more from large zones wrt. small.
+                         * This should be OK because shrink_page_cache is called
+                         * when we are getting to short memory condition so
+                         * LRUs tend to be large.
+                         */
+                        if (((lru_pages >> prio) + 1) >= nr_pages || pass > 3) {
+                                unsigned long nr_to_scan;
+
+                                nr_to_scan = min(nr_pages, lru_pages);
+                                /* shrink_list takes lru_lock with IRQ off so we
+                                 * should be careful about really huge nr_to_scan
+                                 */
+                                nr_reclaimed += shrink_list(l, nr_to_scan, zone,
+                                                                sc, prio);
+                                if (nr_reclaimed >= nr_pages) {
+                                        sc->nr_reclaimed += nr_reclaimed;
+                                        return;
+                                }
+                        }
+                }
+        }
+        sc->nr_reclaimed += nr_reclaimed;
+}
+
+/*
+ * Function to shrink the page cache
+ *
+ * This function calculates the number of pages (nr_pages) the page
+ * cache is over its limit and shrinks the page cache accordingly.
+ *
+ * The maximum number of pages, the page cache shrinks in one call of
+ * this function is limited to SWAP_CLUSTER_MAX pages. Therefore it may
+ * require a number of calls to actually reach the vm_pagecache_limit_kb.
+ *
+ * This function is similar to shrink_all_memory, except that it may never
+ * swap out mapped pages and only does two passes.
+ */
+static void __shrink_page_cache(gfp_t mask)
+{
+        unsigned long ret = 0;
+        int pass;
+        struct reclaim_state reclaim_state;
+        struct scan_control sc = {
+                .gfp_mask = mask,
+                .may_swap = 0,
+                .may_unmap = 0,
+                .may_writepage = 0,
+                .swappiness = vm_swappiness,
+        };
+        struct shrink_control shrink = {
+                .gfp_mask = mask,
+        };
+        struct reclaim_state *old_rs = current->reclaim_state;
+        long nr_pages;
+
+        /* How many pages are we over the limit?
+         * But don't enforce limit if there's plenty of free mem */
+        nr_pages = pagecache_over_limit();
+
+        /* Don't need to go there in one step; as the freed
+         * pages are counted FREE_TO_PAGECACHE_RATIO times, this
+         * is still more than minimally needed. */
+        nr_pages /= 2;
+
+        /* Return early if there's no work to do */
+        if (nr_pages <= 0)
+                return;
+        /* But do a few at least */
+        nr_pages = max_t(unsigned long, nr_pages, 8*SWAP_CLUSTER_MAX);
+
+        current->reclaim_state = &reclaim_state;
+
+        /*
+         * Shrink the LRU in 2 passes:
+         * 0 = Reclaim from inactive_list only (fast)
+         * 1 = Reclaim from active list but don't reclaim mapped (not that fast)
+         * 2 = Reclaim from active list but don't reclaim mapped (2nd pass)
+         */
+        for (pass = 0; pass < 2; pass++) {
+                int prio;
+
+                for (prio = DEF_PRIORITY; prio >= 0; prio--) {
+                        unsigned long nr_to_scan = nr_pages - ret;
+
+                        sc.nr_scanned = 0;
+                        /* sc.swap_cluster_max = nr_to_scan; */
+                        shrink_all_zones(nr_to_scan, prio, pass, &sc);
+                        ret += sc.nr_reclaimed;
+                        if (ret >= nr_pages)
+                                goto out;
+
+                        reclaim_state.reclaimed_slab = 0;
+                        shrink_slab(&shrink, sc.nr_scanned,
+                                        global_reclaimable_pages());
+                        ret += reclaim_state.reclaimed_slab;
+
+                        if (ret >= nr_pages)
+                                goto out;
+
+                }
+        }
+
+out:
+        current->reclaim_state = old_rs;
+}
+
+void shrink_page_cache(gfp_t mask, struct page *page)
+{
+        /* FIXME: As we only want to get rid of non-mapped pagecache
+         * pages and we know we have too many of them, we should not
+         * need kswapd. */
+        /*
+        wakeup_kswapd(page_zone(page), 0);
+        */
+
+        __shrink_page_cache(mask);
+}
+
/* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes
    away, we get changed to run anywhere: as the first one comes back,
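To make the FREE_TO_PAGECACHE_RATIO logic concrete, here is a worked example of the effective limit computed by pagecache_over_limit() above, assuming 4 KiB pages (the numbers are illustrative):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define FREE_TO_PAGECACHE_RATIO 8

    int main(void)
    {
            unsigned long limit_mb = 512;                          /* pagecache_limit_mb */
            unsigned long free_pages = (1024UL << 20) / PAGE_SIZE; /* 1 GiB free */

            /* same formula as pagecache_over_limit(): baseline + 8x free pages */
            unsigned long limit = limit_mb * ((1024 * 1024UL) / PAGE_SIZE)
                                + FREE_TO_PAGECACHE_RATIO * free_pages;

            printf("effective limit: %lu pages (%lu MiB)\n",
                   limit, limit * PAGE_SIZE >> 20);
            /* 131072 + 8 * 262144 = 2228224 pages, about 8704 MiB: with
             * plenty of free memory the 512 MB baseline is barely felt,
             * and the limit only bites once free memory shrinks. */
            return 0;
    }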

#10 | Posted 2014-05-14 11:31
Last edited by humjb_1983 on 2014-05-14 14:32

瀚海书香 wrote on 2014-05-14 10:34:
    Reply to #8 humjb_1983
    I believe I originally saw it on LWN, but I just searched and couldn't find it again, and the patch is old enough that it is no longer in my mailing ...

Thanks for the help, 瀚海兄~
From what we have seen in production, the current mechanism (no limit on the page cache) really does cause problems in some scenarios, for example:
1. When the kernel allocates memory with the ATOMIC flag, it cannot reclaim cache. If free memory happens to be low at that moment, the allocation fails even though the system actually holds a lot of cached memory. This is easy to hit precisely because the cache is not limited.
2. When the workload is sensitive to allocation latency, an unlimited cache means reclaim is often deferred until the moment memory is actually needed, and it then happens synchronously, which adds latency to the allocation and directly hurts the workload. The cache reclaim in the SUSE patch appears to be asynchronous, so it should not suffer from this.
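To illustrate point 1, a hedged kernel-side sketch (the helper is hypothetical): an atomic allocation may not sleep, so it cannot wait for page cache writeback or reclaim, and it simply fails once the free pool (plus the emergency reserves) is exhausted, no matter how large "cached" looks in free(1).

    #include <linux/slab.h>

    static void *grab_buffer(size_t len, bool in_atomic_context)
    {
            if (in_atomic_context)
                    /* may not sleep: no reclaim, no writeback; can dip
                     * into emergency reserves but fails if those run
                     * out too, however large the page cache is */
                    return kmalloc(len, GFP_ATOMIC);

            /* may sleep: can trigger direct reclaim and shrink the
             * page cache to satisfy the request */
            return kmalloc(len, GFP_KERNEL);
    }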