Chinaunix

Title: Anatomizing Linux scheduler: CFS 1

Author: superfluous    Time: 2009-08-24 15:12
The CFS algorithm keeps a per-runqueue fair clock, and all tasks on the CFS runqueue are measured against this clock to achieve fairness. The sort key for each task is calculated as follows:
     cfs_rq.fair_clock - se.wait_runtime
This value is called the fair key and is stored in se.fair_key. The red-black (RB) tree uses it to sort all sched_entity structures: the smallest key is stored in the leftmost node of the RB tree, and the largest in the rightmost node. Whenever a scheduling decision occurs, the scheduler selects the leftmost node of the RB tree. To obtain that schedule entity quickly, Linux keeps a pointer called "rb_leftmost" that always points to the leftmost node. The principle of CFS is to achieve, as its name indicates, complete fairness. This means that over any time interval all tasks should run in proportion to their priorities. In a time-sharing system all tasks start with the same priority, but the nice() system call can be used to change a task's priority. So when a task calls nice(), the scheduler must increase or decrease that task's share of CPU execution time accordingly.
For example, let us assume two tasks running in the system, Task A and Task B. Under CFS, each should occupy 50% of the CPU over any time interval. In the ideal case, if the time interval is 1 minute, Task A runs 30 seconds and Task B 30 seconds; if the time interval is 1 second, each runs 500 milliseconds; if the time interval is 1 millisecond, Task A runs 500 microseconds and Task B 500 microseconds, and so on. To account for nice(), Linux defines a load weight for each nice level and uses it to calculate the execution time. When a task's priority is boosted by one nice step, it obtains roughly 10% more CPU.
   const int prio_to_weight[40] = {
      /* -20 */ 88818, ...,
      /* -10 */ 9537, ..., 1280,
      /*   0 */ NICE_0_LOAD, /* 1024 */
      /*   1 */ 819, ...,
      /*  10 */ 110, ...
   };
When a task is forked, its priority is set to 120, i.e. nice(0), whose load weight is 1024 (NICE_0_LOAD). When its priority is boosted to 119, i.e. nice(119 - 120) = nice(-1), its load weight becomes 1280. Assume two tasks A and B, initially with Weight(A) = 1024 and Weight(B) = 1024, so the two tasks each occupy 50% of the CPU. Now task A is boosted to priority 119, giving Weight(A) = 1280. Its CPU share is then:
     Weight(A) / (Weight(A) + Weight(B)) = 1280 / (1280 + 1024) ≈ 55.6%
B's CPU share is:
     Weight(B) / (Weight(A) + Weight(B)) = 1024 / (1280 + 1024) ≈ 44.4%
In reality, these figures can hardly be achieved exactly. The reasons are:
  1. The time granularity of the tick
     When does scheduling occur? There are the following possibilities (real-time tasks not considered):
     a. when a new task is forked
     b. when a sleeping task is woken up
     c. when a tick occurs
     d. when a task yields the CPU
     The achievable precision is limited by the granularity of the tick. This means that when the system tick is 5 ms, the scheduling time interval cannot be finer than 5 milliseconds.
  2. Scheduling bias
     Code other than the scheduled tasks themselves also consumes some CPU time.
(to be continued...)
               
               
               

This article comes from the ChinaUnix blog; for the original, see: http://blog.chinaunix.net/u3/103132/showart_2035623.html



