Chinaunix

标题: Anatomizing Linux scheduler: CFS 2 [打印本页]

作者: superfluous 时间: 2009-08-26 10:36
标题: Anatomizing Linux scheduler: CFS 2
Linux uses sched_entity to represent a scheduling entity, which is contained by task_struct. So when a sched_entity is used, we can easily locate the corresponding task.
struct task_struct  {
   ...
   struct sched_entity se;
   ...
};
When a pointer to scheduling entity se, the task p is obtained:
p = container_of(se, struct task_struct, se);
Now how cfs_rq.fair_clock and se.wait_runtime to calculate? Firstly let us talk about the design principle.
A task is in run queue or in sleep queue. When it is ready, it is in run queue. This implies that the task is ready to run. It may not execute. When it wants some unavailable resources(e.g. reading data from disk), it will dequeue from run queue and enqueue to sleep queue. The CfS scheduler is going to garantee the above-mentioned fairness in the runqueues. Now let's assume that there are 3 tasks(A, B, C) in the runqueue. Also we assume that the time interval achieved fairness is delta_time.
      A, B, C -- task
      delta_time -- time interval achieved fairness
      Note: A, B, C also represent the corresponding sched_entity of task A, B, C if not specified
Now we can get the following equation:

      delta_time = delta_exec_A + delta_exec_B + delta_exec_C
         where delta_X is the executing time of task A, B, or C
The wait_runtime of A, B, C is:
      delta_wait_A = delta_time - delta_exec_A
      delta_wait_B = delta_time - delta_exec_B
      delta_wait_C = delta_time - delta_exec_C
For efficiecy, we should calculate all needed values in task context other than tranversing the run queues. This means that we should not do as follows:
      delta_wait_A = delta_exec_B + delta_exec_C;
      delta_wait_C = delta_exec_A + delta_exec_B;
The fair_key of sched_entity is as above-mentioned:
      se.fair_key = cfs_rq.fair_clock - se.run_waittime
So the delta_wait_A is larger, se.fair_key is smaller. Note that cfs_rq.fair_clock is a clock to cfs scheduler. Since cfs_rq.fair_clock is clock, it increments with real elapsed time. delta_exec_A, delta_excu_B, delta_exec_C is real time. The cfs_rq.fair_clock is calculated as follows:
   cfs_rq.fair_clock += delta_exec_time * NICE_O_LOAD / cfs_rq.raw_weighted_load
      where cfs_rq.raw_weighted_load = se(A).load_weight + se(B).load_weight + se(C).load_weight;
The time interval can be calculated as follows: now let's consider the scheduling procedure.  A(1) -> B(1) -> C(1) -> A(2) -> ...
As above mentioned, the complete fairness can be achieved if in any time interval A, B, C should execute according to the load_weight(representing the priority). A(1) is executed at time T1 and successive executed at Time T2. The time interval is :
   T2 - T1
T2 can get from system jiffies. It is 'now' time and equals to current jiffies. T1 can be stored in the previous executed time. Then the time interval is:
   now - prev_stored_now
So when A is executed,
   cfs_rq.fair_clock += (Now - prev_stored_now) * NICE_0_LOAD / cfs_rq.raw_weighted_load
when B is executed,
   cfs_rq.fair_clock += (Now - prev_stored_now) * NICE_0_LOAD / cfs_rq.raw_weighted_load
and etc.
to be continued...

本文来自ChinaUnix博客，如果查看原文请点：http://blog.chinaunix.net/u3/103132/showart_2037281.html

欢迎光临 Chinaunix (http://bbs.chinaunix.net/)