_nosay posted on 2016-02-18 10:36

Question about the size of the allocation-policy array

Last edited by _nosay on 2016-02-18 11:33

"Linux Kernel Source Code Scenario Analysis" (《Linux内核源代码情景分析》), page 50:


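When I first read this sentence I had a question: a pglist_data structure, which represents one node, has at most 3 zones, so there is no way to make up 256 combinations. Isn't an array that large a waste?
Later I thought I had "figured it out": these 256 entries don't have to point to the node's own zones; they could also point to zones belonging to other pglist_data structures.
But after reading build_zonelists() I started doubting myself again: all 256 entries point only to pgdat's own 3 zones, and many of them are duplicates (there are really just 3 combinations: "highmem→normal→dma→null", "normal→dma→null" and "dma→null"). So I still don't understand why node_zonelists[] was given 256 entries.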
linux-2.4.0, mm/page_alloc.c, lines 705-752:
/*
* Builds allocation fallback zone lists.
*/
static inline void build_zonelists(pg_data_t *pgdat)
{
        int i, j, k;

        for (i = 0; i < NR_GFPINDEX; i++) {
                zonelist_t *zonelist;
                zone_t *zone;

                zonelist = pgdat->node_zonelists + i;
                memset(zonelist, 0, sizeof(*zonelist));

                zonelist->gfp_mask = i;
                j = 0;
                k = ZONE_NORMAL;
                if (i & __GFP_HIGHMEM)
                        k = ZONE_HIGHMEM;
                if (i & __GFP_DMA)
                        k = ZONE_DMA;

                switch (k) {
                        default:
                                BUG();
                        /*
                         * fallthrough:
                         */
                        case ZONE_HIGHMEM:
                                zone = pgdat->node_zones + ZONE_HIGHMEM;
                                if (zone->size) {
#ifndef CONFIG_HIGHMEM
                                        BUG();
#endif
                                        zonelist->zones[j++] = zone;
                                }
                        case ZONE_NORMAL:
                                zone = pgdat->node_zones + ZONE_NORMAL;
                                if (zone->size)
                                        zonelist->zones[j++] = zone;
                        case ZONE_DMA:
                                zone = pgdat->node_zones + ZONE_DMA;
                                if (zone->size)
                                        zonelist->zones[j++] = zone;
                }
                zonelist->zones[j++] = NULL;
        }
}
A user-space simulation of build_zonelists():

#include <stdio.h>

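int main(void)
{
        int i, k;

        for (i = 0; i < 256; i++)               /* 256 == NR_GFPINDEX */
        {
                k = 1;                          /* ZONE_NORMAL is the default */
                if (i & 0x10)                   /* __GFP_HIGHMEM */
                        k = 2;                  /* ZONE_HIGHMEM */
                if (i & 0x08)                   /* __GFP_DMA, checked last so it wins */
                        k = 0;                  /* ZONE_DMA */

                printf("zonelist[%d]: ", i);

                /* Fall through on purpose, as build_zonelists() does */
                switch (k)
                {
                case 2:
                        printf("highmem ");
                case 1:
                        printf("normal ");
                case 0:
                        printf("dma ");
                default:
                        break;
                }

                printf("\n");
        }

        return 0;
}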

Output:
zonelist[0]: normal dma
zonelist[1]: normal dma
zonelist[2]: normal dma
zonelist[3]: normal dma
zonelist[4]: normal dma
zonelist[5]: normal dma
zonelist[6]: normal dma
zonelist[7]: normal dma
zonelist[8]: dma
zonelist[9]: dma
zonelist[10]: dma
zonelist[11]: dma
zonelist[12]: dma
zonelist[13]: dma
zonelist[14]: dma
zonelist[15]: dma
zonelist[16]: highmem normal dma
zonelist[17]: highmem normal dma
zonelist[18]: highmem normal dma
zonelist[19]: highmem normal dma
zonelist[20]: highmem normal dma
zonelist[21]: highmem normal dma
zonelist[22]: highmem normal dma
zonelist[23]: highmem normal dma
zonelist[24]: dma
zonelist[25]: dma
zonelist[26]: dma
zonelist[27]: dma
zonelist[28]: dma
zonelist[29]: dma
zonelist[30]: dma
zonelist[31]: dma
zonelist[32]: normal dma
zonelist[33]: normal dma
zonelist[34]: normal dma
zonelist[35]: normal dma
zonelist[36]: normal dma
zonelist[37]: normal dma
zonelist[38]: normal dma
zonelist[39]: normal dma
zonelist[40]: dma
zonelist[41]: dma
zonelist[42]: dma
zonelist[43]: dma
zonelist[44]: dma
zonelist[45]: dma
zonelist[46]: dma
zonelist[47]: dma
zonelist[48]: highmem normal dma
zonelist[49]: highmem normal dma
zonelist[50]: highmem normal dma
zonelist[51]: highmem normal dma
zonelist[52]: highmem normal dma
zonelist[53]: highmem normal dma
zonelist[54]: highmem normal dma
zonelist[55]: highmem normal dma
zonelist[56]: dma
zonelist[57]: dma
zonelist[58]: dma
zonelist[59]: dma
zonelist[60]: dma
zonelist[61]: dma
zonelist[62]: dma
zonelist[63]: dma
zonelist[64]: normal dma
zonelist[65]: normal dma
zonelist[66]: normal dma
zonelist[67]: normal dma
zonelist[68]: normal dma
zonelist[69]: normal dma
zonelist[70]: normal dma
zonelist[71]: normal dma
zonelist[72]: dma
zonelist[73]: dma
zonelist[74]: dma
zonelist[75]: dma
zonelist[76]: dma
zonelist[77]: dma
zonelist[78]: dma
zonelist[79]: dma
zonelist[80]: highmem normal dma
zonelist[81]: highmem normal dma
zonelist[82]: highmem normal dma
zonelist[83]: highmem normal dma
zonelist[84]: highmem normal dma
zonelist[85]: highmem normal dma
zonelist[86]: highmem normal dma
zonelist[87]: highmem normal dma
zonelist[88]: dma
zonelist[89]: dma
zonelist[90]: dma
zonelist[91]: dma
zonelist[92]: dma
zonelist[93]: dma
zonelist[94]: dma
zonelist[95]: dma
zonelist[96]: normal dma
zonelist[97]: normal dma
zonelist[98]: normal dma
zonelist[99]: normal dma
zonelist[100]: normal dma
zonelist[101]: normal dma
zonelist[102]: normal dma
zonelist[103]: normal dma
zonelist[104]: dma
zonelist[105]: dma
zonelist[106]: dma
zonelist[107]: dma
zonelist[108]: dma
zonelist[109]: dma
zonelist[110]: dma
zonelist[111]: dma
zonelist[112]: highmem normal dma
zonelist[113]: highmem normal dma
zonelist[114]: highmem normal dma
zonelist[115]: highmem normal dma
zonelist[116]: highmem normal dma
zonelist[117]: highmem normal dma
zonelist[118]: highmem normal dma
zonelist[119]: highmem normal dma
zonelist[120]: dma
zonelist[121]: dma
zonelist[122]: dma
zonelist[123]: dma
zonelist[124]: dma
zonelist[125]: dma
zonelist[126]: dma
zonelist[127]: dma
zonelist[128]: normal dma
zonelist[129]: normal dma
zonelist[130]: normal dma
zonelist[131]: normal dma
zonelist[132]: normal dma
zonelist[133]: normal dma
zonelist[134]: normal dma
zonelist[135]: normal dma
zonelist[136]: dma
zonelist[137]: dma
zonelist[138]: dma
zonelist[139]: dma
zonelist[140]: dma
zonelist[141]: dma
zonelist[142]: dma
zonelist[143]: dma
zonelist[144]: highmem normal dma
zonelist[145]: highmem normal dma
zonelist[146]: highmem normal dma
zonelist[147]: highmem normal dma
zonelist[148]: highmem normal dma
zonelist[149]: highmem normal dma
zonelist[150]: highmem normal dma
zonelist[151]: highmem normal dma
zonelist[152]: dma
zonelist[153]: dma
zonelist[154]: dma
zonelist[155]: dma
zonelist[156]: dma
zonelist[157]: dma
zonelist[158]: dma
zonelist[159]: dma
zonelist[160]: normal dma
zonelist[161]: normal dma
zonelist[162]: normal dma
zonelist[163]: normal dma
zonelist[164]: normal dma
zonelist[165]: normal dma
zonelist[166]: normal dma
zonelist[167]: normal dma
zonelist[168]: dma
zonelist[169]: dma
zonelist[170]: dma
zonelist[171]: dma
zonelist[172]: dma
zonelist[173]: dma
zonelist[174]: dma
zonelist[175]: dma
zonelist[176]: highmem normal dma
zonelist[177]: highmem normal dma
zonelist[178]: highmem normal dma
zonelist[179]: highmem normal dma
zonelist[180]: highmem normal dma
zonelist[181]: highmem normal dma
zonelist[182]: highmem normal dma
zonelist[183]: highmem normal dma
zonelist[184]: dma
zonelist[185]: dma
zonelist[186]: dma
zonelist[187]: dma
zonelist[188]: dma
zonelist[189]: dma
zonelist[190]: dma
zonelist[191]: dma
zonelist[192]: normal dma
zonelist[193]: normal dma
zonelist[194]: normal dma
zonelist[195]: normal dma
zonelist[196]: normal dma
zonelist[197]: normal dma
zonelist[198]: normal dma
zonelist[199]: normal dma
zonelist[200]: dma
zonelist[201]: dma
zonelist[202]: dma
zonelist[203]: dma
zonelist[204]: dma
zonelist[205]: dma
zonelist[206]: dma
zonelist[207]: dma
zonelist[208]: highmem normal dma
zonelist[209]: highmem normal dma
zonelist[210]: highmem normal dma
zonelist[211]: highmem normal dma
zonelist[212]: highmem normal dma
zonelist[213]: highmem normal dma
zonelist[214]: highmem normal dma
zonelist[215]: highmem normal dma
zonelist[216]: dma
zonelist[217]: dma
zonelist[218]: dma
zonelist[219]: dma
zonelist[220]: dma
zonelist[221]: dma
zonelist[222]: dma
zonelist[223]: dma
zonelist[224]: normal dma
zonelist[225]: normal dma
zonelist[226]: normal dma
zonelist[227]: normal dma
zonelist[228]: normal dma
zonelist[229]: normal dma
zonelist[230]: normal dma
zonelist[231]: normal dma
zonelist[232]: dma
zonelist[233]: dma
zonelist[234]: dma
zonelist[235]: dma
zonelist[236]: dma
zonelist[237]: dma
zonelist[238]: dma
zonelist[239]: dma
zonelist[240]: highmem normal dma
zonelist[241]: highmem normal dma
zonelist[242]: highmem normal dma
zonelist[243]: highmem normal dma
zonelist[244]: highmem normal dma
zonelist[245]: highmem normal dma
zonelist[246]: highmem normal dma
zonelist[247]: highmem normal dma
zonelist[248]: dma
zonelist[249]: dma
zonelist[250]: dma
zonelist[251]: dma
zonelist[252]: dma
zonelist[253]: dma
zonelist[254]: dma
zonelist[255]: dma

_nosay posted on 2016-02-18 15:19

Last edited by _nosay on 2016-02-18 15:20

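When pages are allocated, gfp_mask tells the allocator which policy to use. But gfp_mask is not just a policy number (that is, an index into node_zonelists[]); its individual bits carry meaning of their own. Even when the zone policy is the same, different values in the other bits change the value as a whole, so the same policy has to be stored at several different positions. alloc_pages() uses gfp_mask directly as the index:
static inline struct page * alloc_pages(int gfp_mask, unsigned long order)
{
        /*
         * Gets optimized away by the compiler.
         */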
        if (order >= MAX_ORDER)
                return NULL;
        return __alloc_pages(contig_page_data.node_zonelists+(gfp_mask), order);
}

Buddy_Zhang1 posted on 2016-02-19 09:08

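HI:
   OP, I think that to understand this you should first move to a newer kernel version. In newer versions it is defined as:
struct pglist_data {
        struct zone node_zones[MAX_NR_ZONES];
        struct zonelist node_zonelists[MAX_ZONELISTS];
        int nr_zones;
};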

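MAX_NR_ZONES depends on which zones the kernel is configured with, for example Normal and Highmem.
MAX_ZONELISTS is the number of zonelists per node; on a UMA system it is 1.

When build_zonelists() runs, the kernel sets up the allocation policy according to the zones it actually has, and it then allocates from Normal or Highmem according to the GFP_* flags.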

_nosay posted on 2016-02-19 10:14

Buddy_Zhang1 posted on 2016-02-19 09:08
HI:
   OP, I think that to understand this you should first move to a newer kernel version. In newer versions it is defined as:
struct pglist_data {
...

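It really is what the second post in this thread said: the allocation policy gfp_mask does more than specify the order in which zones are used:
① The __GFP_DMA and __GFP_HIGHMEM bits decide which of the orders "highmem→normal→dma→null", "normal→dma→null" or "dma→null" is used;
(Why is there no __GFP_NORMAL? Because Normal is the default.)
② The other bits, such as __GFP_WAIT and __GFP_IO, describe further aspects of the policy.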

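Now suppose the required zone order is the one selected by __GFP_DMA (0x08). Then the low 8 bits of gfp_mask look like xxxx1xxx, so 00001000 must select "dma→null", and so must 00001001, 00001010 and so on. That is exactly the pattern the simulation printed. This doesn't feel right: first, it wastes memory; second, if gfp_mask ever has meaningful bits above the low 8, node_zonelists[] would have to grow along with it.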

That's my understanding; please correct me if I'm wrong.