Posted 2010-01-24 21:50

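Group scheduling
First, an outline:
1 Documentation on group scheduling
2 Understanding group scheduling
3 Data structures related to group scheduling
4 Diagram of the group scheduling structures
5 The functions that do the work
6 Afterword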
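1 According to the documentation, completely fair scheduling should not apply only to individual processes; it should also be fair between groups. For example, with two users wxc1 and wxc2 and groups formed by user ID, the scheduler divides CPU time fairly between the two groups, so wxc1 and wxc2 each get 50% of the CPU, and scheduling inside each group is fair as well.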
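The 50%/50% split above can be checked with a little arithmetic. The following is a minimal sketch, assuming a single CPU, all groups runnable, and equal-weight tasks inside each group; `group_model`, `group_cpu_fraction` and `task_cpu_fraction` are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a group: a weight ("shares") and a task count. */
struct group_model {
    const char *name;
    unsigned long shares;   /* weight of the group among its siblings */
    unsigned int nr_tasks;  /* runnable tasks, all assumed equal weight */
};

/* Fraction of the CPU the whole group gets: its shares / sum of all shares. */
static double group_cpu_fraction(const struct group_model *groups, size_t n,
                                 size_t idx)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += groups[i].shares;
    assert(total > 0);
    return (double)groups[idx].shares / (double)total;
}

/* Fraction of the CPU one task gets: the group's fraction split evenly. */
static double task_cpu_fraction(const struct group_model *groups, size_t n,
                                size_t idx)
{
    assert(groups[idx].nr_tasks > 0);
    return group_cpu_fraction(groups, n, idx) / groups[idx].nr_tasks;
}
```

With equal shares, wxc1 running one task and wxc2 running four, each group still gets half the CPU, so the single wxc1 task gets 50% while each wxc2 task gets 12.5%.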
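A few configuration macros:
CONFIG_GROUP_SCHED: enables group scheduling
CONFIG_RT_GROUP_SCHED: group scheduling for real-time tasks
CONFIG_FAIR_GROUP_SCHED: group scheduling for normal (CFS) tasks
How groups are formed: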
1）Based on user id (CONFIG_USER_SCHED)
2）Based on "cgroup" pseudo filesystem (CONFIG_CGROUP_SCHED)
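I am not yet familiar with the second method. According to netizen eric xiao's blog, operations on the cgroup filesystem call into an interface whose functions carry out the group scheduling operations.
I will study the cgroup part later; I do not understand it yet.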
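2 Understanding group scheduling
1) Group scheduling is hierarchical.
2) A group contains several tasks; scheduling among those tasks is one level.
3) All groups are linked together through parent, child, and sibling relationships. root_task_group sits at the top.
4) The other level is scheduling between groups: a group is treated as one scheduling entity, still represented by struct sched_entity.
5) On each CPU a task_group has its own cfs_rq and its own sched_entity. Note that these differ from the cfs_rq and sched_entity discussed earlier for single-task scheduling: they represent the group, not an individual task.
6) Each CPU's runqueue holds a list that links together the cfs_rq's on that CPU that belong to groups.
7) The parent/child relationship between groups is mirrored by the groups' scheduling entities on each CPU. If wxc1 is the parent of group wxc2, then wxc1's sched_entity on cpu0 is the parent of wxc2's sched_entity on cpu0. The diagram later shows this.
8) Every scheduling entity points to the cfs_rq it is queued on (cfs_rq), and an entity that represents a group also points to the cfs_rq it owns (my_q). For a group entity, cfs_rq is the cfs_rq of its parent group, while my_q is the cfs_rq of the group it represents.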
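The pointer relationships in points 7) and 8) can be sketched in user space. This is a simplified model under stated assumptions: `se_model` and `cfs_rq_model` are stand-ins that keep only the fields discussed here, and `entity_depth` is a hypothetical helper. The test that a task entity owns no queue mirrors how my_q is described above.

```c
#include <assert.h>
#include <stddef.h>

struct cfs_rq_model {
    const char *owner;              /* label for illustration only */
};

/* Simplified stand-in for struct sched_entity (group fields only). */
struct se_model {
    struct se_model *parent;        /* entity of the enclosing group, or NULL */
    struct cfs_rq_model *cfs_rq;    /* queue this entity is queued on */
    struct cfs_rq_model *my_q;      /* queue owned by this entity; NULL for a task */
};

/* A task entity owns no queue; a group entity does. */
static int entity_is_task(const struct se_model *se)
{
    return se->my_q == NULL;
}

/* Number of group entities above this one (0 = queued on the top-level queue). */
static int entity_depth(const struct se_model *se)
{
    int depth = 0;

    for (se = se->parent; se; se = se->parent)
        depth++;
    return depth;
}
```

For a chain "cpu0 queue -> group wxc1 -> group wxc2 -> task", every entity below the top satisfies `se->cfs_rq == se->parent->my_q`, which is the invariant point 8) describes.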
3 Data structures related to group scheduling
1) The group structure:
/* task group related information */
struct task_group {
#ifdef CONFIG_CGROUP_SCHED
struct cgroup_subsys_state css;
#endif
#ifdef CONFIG_FAIR_GROUP_SCHED
/* schedulable entities of this group on each cpu */
struct sched_entity **se; /* one entity per CPU that represents this group; not a single task's entity */
/* runqueue "owned" by this group on each cpu */
struct cfs_rq **cfs_rq; /* one cfs_rq per CPU, owned by this group */
unsigned long shares;
#endif
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
struct rt_rq **rt_rq;
struct rt_bandwidth rt_bandwidth;
#endif
struct rcu_head rcu;
struct list_head list;
struct task_group *parent;  /* these three fields link a group to its relatives */
struct list_head siblings;
struct list_head children;
};
2) struct rq {
...
struct list_head leaf_cfs_rq_list; /* links the cfs_rq's of all groups on this CPU */
...
}
3)
struct cfs_rq {
...
#ifdef CONFIG_FAIR_GROUP_SCHED
//Note here: cfs has rq pointer when group schedule
struct rq *rq; /* cpu runqueue to which this cfs_rq is attached (a back-pointer to the runqueue) */
/*
   * leaf cfs_rqs are those that hold tasks (lowest schedulable entity in
   * a hierarchy). Non-leaf lrqs hold other higher schedulable entities
   * (like users, containers etc.)
   *
   * leaf_cfs_rq_list ties together list of leaf cfs_rq's in a cpu. This
   * list is used during load balance.
   */
struct list_head leaf_cfs_rq_list;          /* list node that chains together all the cfs_rq's on one runqueue */
struct task_group *tg; /* group that "owns" this runqueue, i.e. which group it belongs to */
...
}
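4) struct sched_entity {
...
#ifdef CONFIG_FAIR_GROUP_SCHED
struct sched_entity *parent; /* an entity now has a parent (it had none before); usually this is the entity, on the same CPU, of the parent of the group it belongs to */
/* rq on which this entity is (to be) queued: */
struct cfs_rq       *cfs_rq;  /* the two cfs_rq pointers are confusing: this one points to parent->my_q (or to the runqueue's own cfs when there is no parent) */
/* rq "owned" by this entity/group: */
struct cfs_rq       *my_q; /* the entity's own cfs_rq */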
#endif
};
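4 Diagram of the group scheduling structures
See the attachment. It is hand-drawn because I do not know how to use drawing software.
Upper-right part: the two groups are parent and child; the triangle next to each group is the scheduling entity that represents the group.
5 The functions that do the work
First, two groups:
1) /* Default task group.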
* Every task in system belong to this group at bootup.
*/
struct task_group init_task_group;
2) /*
* Root task group.
*    Every UID task group (including init_task_group aka UID-0) will
*    be a child to this group.
*/
init_task_group is the group every task belongs to when the system boots.
root_task_group is the ancestor of all groups.
sched_init() contains their initialization:
sched_init(){
.....
#ifdef CONFIG_FAIR_GROUP_SCHED
      init_task_group.se = (struct sched_entity **)ptr;
      ptr += nr_cpu_ids * sizeof(void **); /* note: ptr advances by an offset; the space skipped holds the pointers to the group's scheduling entities on every CPU */
      init_task_group.cfs_rq = (struct cfs_rq **)ptr;
      ptr += nr_cpu_ids * sizeof(void **);
#ifdef CONFIG_USER_SCHED
      root_task_group.se = (struct sched_entity **)ptr;
      ptr += nr_cpu_ids * sizeof(void **);
      root_task_group.cfs_rq = (struct cfs_rq **)ptr;
      ptr += nr_cpu_ids * sizeof(void **);
#endif /* CONFIG_USER_SCHED */
...
}
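Because the page allocator is not yet set up at this point, alloc_bootmem() is used. The code above sets up the **se and **cfs_rq members of a task_group, that is, the two pointer arrays.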
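The "advance ptr by an offset" trick is easier to see outside the kernel. Below is a minimal user-space sketch that carves two per-CPU pointer arrays out of one allocation, with `calloc` standing in for the boot-time allocator. `tg_model`, `tg_carve_arrays` and `tg_release_arrays` are hypothetical names, and the sketch assumes all object pointers have the same size, as on the platforms Linux runs on.

```c
#include <assert.h>
#include <stdlib.h>

struct se_model { int dummy; };
struct cfs_rq_model { int dummy; };

struct tg_model {
    struct se_model **se;           /* nr_cpus slots, one per CPU */
    struct cfs_rq_model **cfs_rq;   /* nr_cpus slots, one per CPU */
    void *block;                    /* start of the single backing allocation */
};

/* Carve both pointer arrays out of one zeroed block. Returns 0 on success. */
static int tg_carve_arrays(struct tg_model *tg, size_t nr_cpus)
{
    char *ptr = calloc(2 * nr_cpus, sizeof(void *));

    if (!ptr)
        return -1;
    tg->block = ptr;

    tg->se = (struct se_model **)ptr;
    ptr += nr_cpus * sizeof(void *);    /* skip past the se[] slots */

    tg->cfs_rq = (struct cfs_rq_model **)ptr;
    return 0;
}

/* Release the single block and clear the now-dangling array pointers. */
static void tg_release_arrays(struct tg_model *tg)
{
    free(tg->block);
    tg->block = NULL;
    tg->se = NULL;
    tg->cfs_rq = NULL;
}
```

The two arrays sit back to back, so `tg.cfs_rq` starts exactly `nr_cpus * sizeof(void *)` bytes after `tg.se`, and writing the last `se` slot does not touch the first `cfs_rq` slot.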
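Note this statement:
init_task_group.parent = &root_task_group;
So the parent of init_task_group is root_task_group.
Further down there is a per-CPU registration step. It resembles what happens when a group is created and registered, so a brief look is worthwhile.
I only go through the parts that interest me: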
for_each_possible_cpu(i) {
      struct rq *rq;
      rq = cpu_rq(i);
      spin_lock_init(&rq->lock);
      rq->nr_running = 0;
      init_cfs_rq(&rq->cfs, rq); /* initializes the runqueue's own cfs_rq, not a group's */
      init_rt_rq(&rq->rt, rq);
#ifdef CONFIG_FAIR_GROUP_SCHED
      init_task_group.shares = init_task_group_load;
      INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
#ifdef CONFIG_CGROUP_SCHED
...
      init_tg_cfs_entry(&init_task_group, &rq->cfs, NULL, i, 1, NULL); /* important: binds init_task_group to the runqueue's cfs_rq */
#elif defined CONFIG_USER_SCHED
      root_task_group.shares = NICE_0_LOAD;
      init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, 0, NULL); /* likewise, for root_task_group */
...
init_tg_cfs_entry(&init_task_group,
            &per_cpu(init_cfs_rq, i),
            &per_cpu(init_sched_entity, i), i, 1,
            root_task_group.se);
...
}
init_tg_cfs_entry() does a lot of useful work and deserves praise.
Let's see right away what it actually does:
#ifdef CONFIG_FAIR_GROUP_SCHED
static void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
            struct sched_entity *se, int cpu, int add,
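            struct sched_entity *parent) /* parameters: the group to initialize, the cfs_rq to initialize, the se to initialize, the CPU, whether to add the cfs_rq to the rq's list, and the scheduling entity that represents the parent group */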
{
struct rq *rq = cpu_rq(cpu);
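tg->cfs_rq[cpu] = cfs_rq; /* first fill in the pointer slot */
init_cfs_rq(cfs_rq, rq); /* associate it with the runqueue; this initializer is analyzed below */
cfs_rq->tg = tg; /* these two lines show how handy pointers are: a single field serves as the link that ties two objects together; here it binds the group to the runqueue */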
if (add)
      list_add(&cfs_rq->leaf_cfs_rq_list, &rq->leaf_cfs_rq_list);
tg->se[cpu] = se; /* store se in the pointer array as well */
/* se could be NULL for init_task_group */
if (!se)
      return;
if (!parent) /* interesting: with no parent, se's cfs_rq is the CPU runqueue's own cfs_rq; this holds when initializing the "root group" and the "init group" */
      se->cfs_rq = &rq->cfs;
else
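      se->cfs_rq = parent->my_q;  /* otherwise it points to the cfs_rq owned by the parent */
se->my_q = cfs_rq; /* the cfs_rq of the entity's own group becomes its own queue */
se->load.weight = tg->shares; /* the entity's load weight is the group's shares value */
se->load.inv_weight = 0;
se->parent = parent;  /* record the parent entity */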
}
#endif
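To make the wiring concrete, here is a user-space sketch of the same steps. It is a simplified model, not the kernel function: the `_model` types keep only the fields touched above, the leaf-list step is left out, `NR_CPUS_MODEL` is an assumed tiny CPU count, and `init_tg_entry_model` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 2   /* assumption: tiny fixed CPU count for illustration */

struct tg_model;
struct cfs_rq_model { struct tg_model *tg; };
struct se_model {
    struct se_model *parent;
    struct cfs_rq_model *cfs_rq;    /* queue the entity is queued on */
    struct cfs_rq_model *my_q;      /* queue the entity owns */
    unsigned long weight;
};
struct rq_model { struct cfs_rq_model cfs; };
struct tg_model {
    struct se_model *se[NR_CPUS_MODEL];
    struct cfs_rq_model *cfs_rq[NR_CPUS_MODEL];
    unsigned long shares;
};

static struct rq_model runqueues[NR_CPUS_MODEL];

/* Bind a group's per-CPU queue and entity, then hook the entity to its parent. */
static void init_tg_entry_model(struct tg_model *tg, struct cfs_rq_model *cfs_rq,
                                struct se_model *se, int cpu,
                                struct se_model *parent)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    tg->cfs_rq[cpu] = cfs_rq;
    cfs_rq->tg = tg;
    tg->se[cpu] = se;
    if (!se)                        /* the top-level group has no entity */
        return;
    /* No parent entity: queue directly on the CPU's own cfs queue. */
    se->cfs_rq = parent ? parent->my_q : &runqueues[cpu].cfs;
    se->my_q = cfs_rq;
    se->weight = tg->shares;
    se->parent = parent;
}
```

Calling it for a top-level group (NULL entity), then for a child A, then for a grandchild B with `a.se[0]` as parent, leaves B's entity queued on A's queue while A's entity is queued on the runqueue's own queue.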
How does the function above associate a cfs_rq with the runqueue?
static void init_cfs_rq(struct cfs_rq *cfs_rq, struct rq *rq)
{
cfs_rq->tasks_timeline = RB_ROOT;  /* initialize its red-black tree root */
INIT_LIST_HEAD(&cfs_rq->tasks);
#ifdef CONFIG_FAIR_GROUP_SCHED
cfs_rq->rq = rq;  /* back-pointer: the rq can reach the cfs_rq and the cfs_rq can reach the rq, which keeps the link solid */
#endif
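cfs_rq->min_vruntime = (u64)(-(1LL << 20));
}
[The text between this point and the tail of the group-creation function was lost when the page was extracted. What survives below is the end of that function, sched_create_group() in kernels of this era.]
...
list_add_rcu(&tg->list, &task_groups);
WARN_ON(!parent); /* root should already exist */
tg->parent = parent;  /* record the parent group */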
INIT_LIST_HEAD(&tg->children);
list_add_rcu(&tg->siblings, &parent->children);
spin_unlock_irqrestore(&task_group_lock, flags);
return tg;
err:
free_sched_group(tg);
return ERR_PTR(-ENOMEM);
}
The flow is quite clear: allocate the group -> allocate the pointer arrays -> register with each CPU's runqueue.
Let's see what those two functions do.
static
int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
{
struct cfs_rq *cfs_rq;
struct sched_entity *se, *parent_se;
struct rq *rq;
int i;
tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
if (!tg->cfs_rq)
      goto err;
tg->se = kzalloc(sizeof(se) * nr_cpu_ids, GFP_KERNEL);
if (!tg->se)
      goto err;
/* the two steps above are easy: allocate the per-CPU cfs_rq and se pointer arrays */
tg->shares = NICE_0_LOAD;
for_each_possible_cpu(i) {  /* for each CPU, allocate a cfs_rq and an se structure and hook them up */
      rq = cpu_rq(i);
      cfs_rq = kmalloc_node(sizeof(struct cfs_rq),
            GFP_KERNEL|__GFP_ZERO, cpu_to_node(i));
      if (!cfs_rq)
         goto err;
      se = kmalloc_node(sizeof(struct sched_entity),
            GFP_KERNEL|__GFP_ZERO, cpu_to_node(i));
      if (!se)
         goto err;
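      parent_se = parent ? parent->se[i] : NULL; /* does this group have a parent? if so, the parent's scheduling entity on this CPU becomes the parent of this group's entity */
      init_tg_cfs_entry(tg, cfs_rq, se, i, 0, parent_se); /* analyzed above; note the arguments */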
}
return 1;
err:
return 0;
}
The allocation function above is fairly straightforward.
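The allocate-then-bail-out pattern can be tried in user space. The sketch below is an illustration under assumptions, not the kernel code: `calloc` replaces kzalloc/kmalloc_node, the hypothetical `tg_free_model` plays the role of the caller's cleanup on the error path, and it relies on zero-filled pointer slots reading as NULL (true on mainstream platforms).

```c
#include <assert.h>
#include <stdlib.h>

struct cfs_rq_model { int nr_running; };
struct se_model { unsigned long weight; };
struct tg_model {
    struct cfs_rq_model **cfs_rq;   /* per-CPU array of queue pointers */
    struct se_model **se;           /* per-CPU array of entity pointers */
    size_t nr_cpus;
};

/* Safe on a partially built group: unset slots are NULL, free(NULL) is a no-op. */
static void tg_free_model(struct tg_model *tg)
{
    for (size_t i = 0; i < tg->nr_cpus; i++) {
        if (tg->cfs_rq)
            free(tg->cfs_rq[i]);
        if (tg->se)
            free(tg->se[i]);
    }
    free(tg->cfs_rq);
    free(tg->se);
    tg->cfs_rq = NULL;
    tg->se = NULL;
}

/* Returns 1 on success, 0 on failure; on failure the caller frees the group. */
static int tg_alloc_model(struct tg_model *tg, size_t nr_cpus)
{
    assert(nr_cpus > 0);
    tg->nr_cpus = nr_cpus;
    tg->se = NULL;
    tg->cfs_rq = calloc(nr_cpus, sizeof(*tg->cfs_rq));
    if (!tg->cfs_rq)
        return 0;
    tg->se = calloc(nr_cpus, sizeof(*tg->se));
    if (!tg->se)
        return 0;
    for (size_t i = 0; i < nr_cpus; i++) {
        tg->cfs_rq[i] = calloc(1, sizeof(struct cfs_rq_model));
        if (!tg->cfs_rq[i])
            return 0;
        tg->se[i] = calloc(1, sizeof(struct se_model));
        if (!tg->se[i])
            return 0;
        tg->se[i]->weight = 1024;   /* stand-in for the default shares value */
    }
    return 1;
}
```

Because the pointer arrays start out zeroed, one cleanup routine works no matter how far allocation got, which is why the allocation function itself can simply return 0 on any failure.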
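Registration is even simpler: it just adds the allocated cfs_rq to the runqueue's list.
That completes the analysis of how a group is created.
Finally, let's look at what happens when a task moves between groups: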
/* change task's runqueue when it moves between groups.
* The caller of this function should have put the task in its new group
* by now. This function just updates tsk->se.cfs_rq and tsk->se.parent to
* reflect its new group.
*/
void sched_move_task(struct task_struct *tsk)
{
int on_rq, running;
unsigned long flags;
struct rq *rq;
rq = task_rq_lock(tsk, &flags); /* locks and returns the runqueue (struct rq) of the CPU the task is on; it does not return a cfs_rq */
update_rq_clock(rq);
running = task_current(rq, tsk);
on_rq = tsk->se.on_rq;
if (on_rq)
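      dequeue_task(rq, tsk, 0);  /* still queued, so take it off the queue before moving it */
if (unlikely(running)) /* the rare case where the task is running right now; is it stopped here, or requeued? */
      tsk->sched_class->put_prev_task(rq, tsk);
set_task_rq(tsk, task_cpu(tsk));  /* set the new runqueue; analyzed below */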
#ifdef CONFIG_FAIR_GROUP_SCHED
if (tsk->sched_class->moved_group)
      tsk->sched_class->moved_group(tsk); /* this function is analyzed below */
#endif
if (unlikely(running))
      tsk->sched_class->set_curr_task(rq);  /* try to make it run again; I did not follow the logic of this function either */
if (on_rq)
      enqueue_task(rq, tsk, 0);  /* actually enqueue it; analyzed below */
task_rq_unlock(rq, &flags);
}
#endif
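I find the logic of this flow confusing and cannot tell why it is done this way.
I do not fully understand what these steps accomplish when a task moves from one group to another; I need to read more of the theory behind group scheduling.
set_task_rq() re-sets the runqueue of a task: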
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
static inline void set_task_rq(struct task_struct *p, unsigned int cpu) /* this sets up a task, i.e. a task's scheduling entity; do not confuse it with an entity that represents a group */
{
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
p->se.parent = task_group(p)->se[cpu];
#endif
#ifdef CONFIG_RT_GROUP_SCHED
p->rt.rt_rq  = task_group(p)->rt_rq[cpu];
p->rt.parent = task_group(p)->rt_se[cpu];
#endif
}
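This gives the general picture: under group scheduling, the parent of every task in a group is the scheduling entity that represents the group, and the task's cfs_rq is the cfs_rq the group owns on that CPU.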
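The re-pointing step can be modelled in a few lines. This is a minimal sketch under assumptions: the `_model` types carry only the two fields that set_task_rq touches for CFS, `NR_CPUS_MODEL` is an assumed CPU count, and `move_task_model` is a hypothetical helper that combines the caller-side group change with the re-pointing.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 2   /* assumption: tiny fixed CPU count for illustration */

struct cfs_rq_model { int id; };
struct se_model {
    struct se_model *parent;
    struct cfs_rq_model *cfs_rq;
};
struct tg_model {
    struct se_model *se[NR_CPUS_MODEL];          /* group's entity per CPU */
    struct cfs_rq_model *cfs_rq[NR_CPUS_MODEL];  /* group's queue per CPU */
};
struct task_model {
    struct tg_model *group;   /* the group the task currently belongs to */
    struct se_model se;       /* the task's own scheduling entity */
};

/* Point the task's entity at its group's queue and entity on this CPU. */
static void set_task_rq_model(struct task_model *p, unsigned int cpu)
{
    assert(cpu < NR_CPUS_MODEL);
    p->se.cfs_rq = p->group->cfs_rq[cpu];
    p->se.parent = p->group->se[cpu];
}

/* Change the group first (the caller's job in the kernel), then re-point. */
static void move_task_model(struct task_model *p, struct tg_model *new_group,
                            unsigned int cpu)
{
    p->group = new_group;
    set_task_rq_model(p, cpu);
}
```

After `move_task_model(&t, &B, 0)`, the task's entity is queued on B's cpu0 queue and its parent is B's cpu0 entity; nothing from group A remains in the task's entity.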
The following function updates some state after the task has moved to its new group:
#ifdef CONFIG_FAIR_GROUP_SCHED
static void moved_group_fair(struct task_struct *p)
{
struct cfs_rq *cfs_rq = task_cfs_rq(p);
update_curr(cfs_rq);
place_entity(cfs_rq, &p->se, 1);
}
#endif
Finally, the enqueue function:
static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
for_each_sched_entity(se) { /* this loop is interesting */
      if (se->on_rq)
         break;
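      cfs_rq = cfs_rq_of(se);  /* get its cfs_rq: without group scheduling this is the current runqueue's cfs_rq; with group scheduling it is the cfs_rq that se is to be queued on */
      enqueue_entity(cfs_rq, se, wakeup);  /* enqueue; may also initialize timing state and the like */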
      wakeup = 1;
}
hrtick_update(rq);
}
Look at the for loop:
#define for_each_sched_entity(se) \
      for (; se; se = se->parent)
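It shows up often in the enqueue and dequeue paths. With group scheduling enabled it is defined as above.
For example, suppose task a in group A is to join group B's cfs_rq on cpu0. The task's se->cfs_rq should then point to group B's cfs_rq on cpu0.
First the dequeue and enqueue operations run for the task itself. After that the loop goes on to the scheduling entity of the task's group (group A's entity on dequeue, group B's on enqueue) and keeps walking upward through that group's ancestors, enqueueing each one until it reaches an entity that is already queued.
Why is it done this way? Still thinking about it.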
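One way to read the loop: a task can only be picked if the entity of its group is itself queued on the parent's queue, and so on up to the top, so enqueueing a task may have to enqueue its ancestors too. The walk stops at the first ancestor that is already queued. The sketch below models only that control flow; `se_model`, `cfs_rq_model`, `for_each_se_model` and `enqueue_walk_model` are hypothetical names, and "enqueue" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

struct cfs_rq_model { unsigned int nr_running; };
struct se_model {
    struct se_model *parent;        /* entity of the enclosing group, or NULL */
    struct cfs_rq_model *cfs_rq;    /* queue this entity is queued on */
    int on_rq;                      /* nonzero once the entity is queued */
};

/* Same shape as the group-scheduling variant of the loop macro shown above. */
#define for_each_se_model(se) for (; (se); (se) = (se)->parent)

/* Enqueue se and every ancestor not yet queued; return the levels enqueued. */
static int enqueue_walk_model(struct se_model *se)
{
    int enqueued = 0;

    for_each_se_model(se) {
        if (se->on_rq)              /* this ancestor is already queued: stop */
            break;
        se->cfs_rq->nr_running++;   /* stand-in for enqueue_entity() */
        se->on_rq = 1;
        enqueued++;
    }
    return enqueued;
}
```

Enqueueing the first task of group B touches two levels (the task and B's entity); a second task in the same group touches only one, because B's entity is already on the top-level queue.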
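6 Afterword
These are some of the things that came to mind while reading the group scheduling code. It should get more interesting once I have gone through cgroup scheduling.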
                                 (end)

This article comes from a ChinaUnix blog; to view the original, see: http://blog.chinaunix.net/u3/110888/showart_2159270.html


Completely Fair Scheduler (CFS) series, part 3