CPU Load Balancing: Calculating the Run Queue Load [April 2009 issue]
by ChinaUnix member wxc200, posted 2009-05-06
   In every run queue (struct rq), the load field represents the current load of that run queue. Next to it there is an array, cpu_load[], which as I understand it is a set of graded "stand-ins" for the run queue load. During multi-CPU scheduling, the load of the different cpu domains is computed, and depending on the index used, the corresponding cpu_load[] entry is chosen and returned as the current run queue's load.
   In the per-tick handler scheduler_tick(), update_cpu_load() is called to refresh the current run queue's load, so that later CPU balancing can pick out the busiest CPU. Take find_busiest_group() for example; its job, as its comment puts it, is: "find_busiest_group finds and returns the busiest CPU group within the domain." It computes and returns a run queue's load through two functions, source_load() and target_load(). target_load(), for instance, looks like this:

static unsigned long target_load(int cpu, int type)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long total = weighted_cpuload(cpu);	/* the load of the current run queue */

	if (type == 0 || !sched_feat(LB_BIAS))
		return total;

	/* return the larger of the corresponding "stand-in" and the run queue load */
	return max(rq->cpu_load[type-1], total);
}
Both of these functions take a type parameter, and its whole purpose is to select the corresponding cpu_load[] entry.
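For completeness, here is what the companion source_load() and the weighted_cpuload() helper look like (quoted from my reading of 2.6.28-era kernel/sched.c, so details may differ slightly between versions). source_load() is the mirror image of target_load(): it is biased toward the smaller value, so a CPU we might pull tasks from is not over-reported, while target_load() is biased toward the larger value:

static unsigned long source_load(int cpu, int type)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long total = weighted_cpuload(cpu);

	if (type == 0 || !sched_feat(LB_BIAS))
		return total;

	/* return the smaller of the corresponding "stand-in" and the run queue load */
	return min(rq->cpu_load[type-1], total);
}

/* weighted_cpuload() simply reads the run queue's load.weight */
static unsigned long weighted_cpuload(const int cpu)
{
	return cpu_rq(cpu)->load.weight;
}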

During system initialization cpu_load[] defaults to 0. Interested readers can look at sched_init() in kernel/sched.c (linux 2.6.28, around line 8286); the relevant fragment is:

	for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
		rq->cpu_load[j] = 0;

So how does this array change afterwards? What does its trajectory look like?
I find update_cpu_load() rather interesting, so let's walk through it together.

/*
 * Update rq->cpu_load[] statistics. This function is usually called every
 * scheduler tick (TICK_NSEC).
 */
static void update_cpu_load(struct rq *this_rq)	/* the current run queue is passed in */
{
	unsigned long this_load = this_rq->load.weight;	/* current load of this run queue */
	int i, scale;

	this_rq->nr_load_updates++;	/* number of load updates; bumped on every tick -- busy indeed */

	/* Update our load: */
	/* CPU_LOAD_IDX_MAX is 5, so all five levels of cpu_load[] are updated here */
	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load, new_load;

		/* scale is effectively 1 << i now, and >> i divides by scale */
		/* note: scale is 2 to the power i, i.e. 1, 2, 4, 8, 16 */

		old_load = this_rq->cpu_load[i];	/* the value currently stored in cpu_load[i] */
		new_load = this_load;			/* the current run queue load */
		/*
		 * Round up the averaging division if load is increasing. This
		 * prevents us from getting stuck on 9 if the load is 10, for
		 * example.
		 */
		/* i.e. if the load is growing, don't throw the remainder away */
		if (new_load > old_load)
			new_load += scale-1;
		/*
		 * Analysed below: from the current run queue load and the previous
		 * cpu_load[i], work out what cpu_load[i] should become.
		 */
		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
	}
}

Now let's analyse the formula.
Suppose the level is i (0..4), the run queue load is M, and the value of cpu_load[i] before the update is mi. Since (mi*(2^i - 1) + M) / 2^i = mi + (M - mi)/2^i, every update of cpu_load[i] follows this rule:
cpu_load[i] = mi + (M - mi)/2^i
For i = 0..4 this expands to:
i=0: mi + (M - mi)    = M
i=1: mi + (M - mi)/2
i=2: mi + (M - mi)/4
i=3: mi + (M - mi)/8
i=4: mi + (M - mi)/16

So as M changes over time, level 0 reacts the most sharply, and each higher level reacts more and more gently.
   In addition, when the run queue load is larger than the current cpu_load[] value, M gets a compensation. Take an example: suppose M - mi = 17; for i = 4 that gives 17/16 = 1 with a remainder of 1, which would be wasted, so before the division (16 - 1) is added to the new load, and the remainder is no longer thrown away. In one phrase: round up.
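To watch the formula in action outside the kernel, here is a small self-contained user-space sketch (my own illustration, not kernel code; 1024 is simply the weight of one nice-0 task) that replays the same update rule on a pretend cpu_load[] array, first with the run queue load pinned at 1024 and then with it dropped back to 0:

#include <stdio.h>

#define CPU_LOAD_IDX_MAX 5

/* replay the update_cpu_load() formula on a fake cpu_load[] array */
static void update(unsigned long cpu_load[], unsigned long this_load)
{
	unsigned long scale;
	int i;

	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;	/* round up while the load is growing */
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}

int main(void)
{
	unsigned long cpu_load[CPU_LOAD_IDX_MAX] = { 0 };
	int tick;

	for (tick = 0; tick < 20; tick++) {
		/* 10 ticks with one nice-0 task (load 1024), then 10 ticks idle */
		update(cpu_load, tick < 10 ? 1024 : 0);
		printf("tick %2d: %5lu %5lu %5lu %5lu %5lu\n", tick,
		       cpu_load[0], cpu_load[1], cpu_load[2],
		       cpu_load[3], cpu_load[4]);
	}
	return 0;
}

cpu_load[0] snaps to the new load on the very first tick, while each higher index only closes the remaining gap at half the rate of the one below it, which is exactly the behaviour visible in the boot traces below.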
   So how does this actually behave on a live system? Run cat /proc/sched_debug in a terminal and you can see these values for each CPU. My PC has two CPUs, and the fields related to this function look like this (with .nr_running at 2, the .load of 2048 is presumably two nice-0 tasks of weight 1024 each):
cpu#0, 2705.784 MHz
.nr_running : 2
.load : 2048
.nr_load_updates : 6651134
.cpu_load[0] : 1024
.cpu_load[1] : 576
.cpu_load[2] : 364
.cpu_load[3] : 214
.cpu_load[4] : 124

...

cpu#1, 2705.784 MHz
.nr_running : 0
.load : 0
.nr_load_updates : 6846119
.cpu_load[0] : 0
.cpu_load[1] : 0
.cpu_load[2] : 15
.cpu_load[3] : 61
.cpu_load[4] : 110

   Now let me add some debug output to the function and watch how it changes while the system boots, so we can get a feel for its trajectory. With the printks added, the function looks like this:

static void update_cpu_load(struct rq *this_rq)
{
	unsigned long this_load = this_rq->load.weight;
	int i, scale;

	this_rq->nr_load_updates++;
	/* print the run queue load every time this function is entered */
	printk("Before calculate: %lu\n", this_load);
	/* Update our load: */
	for (i = 0, scale = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load, new_load;

		/* scale is effectively 1 << i now, and >> i divides by scale */

		old_load = this_rq->cpu_load[i];
		new_load = this_load;
		/*
		 * Round up the averaging division if load is increasing. This
		 * prevents us from getting stuck on 9 if the load is 10, for
		 * example.
		 */
		if (new_load > old_load)
			new_load += scale-1;
		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
		/* print cpu_load[i] before the update, after the update, and the (compensated) run queue load */
		printk(KERN_INFO "old_load = %lu,this_rq->cpu_load[%d] = %lu, new_load = %lu\n",
		       old_load, i, this_rq->cpu_load[i], new_load);
	}
}

Here is the debug output (to keep this short, only a few fragments are analysed):

[ 0.007812] Before calculate: 0
[ 0.007812] old_load = 0,this_rq->cpu_load[0] = 0, new_load = 0
[ 0.007812] old_load = 0,this_rq->cpu_load[1] = 0, new_load = 0
[ 0.007812] old_load = 0,this_rq->cpu_load[2] = 0, new_load = 0
[ 0.007812] old_load = 0,this_rq->cpu_load[3] = 0, new_load = 0
[ 0.007812] old_load = 0,this_rq->cpu_load[4] = 0, new_load = 0
...
[ 0.015625] Before calculate: 7266
[ 0.015625] old_load = 0,this_rq->cpu_load[0] = 7266, new_load = 7266
[ 0.015625] old_load = 0,this_rq->cpu_load[1] = 3633, new_load = 7267
[ 0.015625] old_load = 0,this_rq->cpu_load[2] = 1817, new_load = 7269
[ 0.015625] old_load = 0,this_rq->cpu_load[3] = 909, new_load = 7273
[ 0.015625] old_load = 0,this_rq->cpu_load[4] = 455, new_load = 7281
...
[ 0.015625] <MAX9724> GPIOs setup done
[ 0.023437] Before calculate: 177522
[ 0.023437] old_load = 7266,this_rq->cpu_load[0] = 177522, new_load = 177522
[ 0.023437] old_load = 3633,this_rq->cpu_load[1] = 90578, new_load = 177523
[ 0.023437] old_load = 1817,this_rq->cpu_load[2] = 45744, new_load = 177525
[ 0.023437] old_load = 909,this_rq->cpu_load[3] = 22986, new_load = 177529
[ 0.023437] old_load = 455,this_rq->cpu_load[4] = 11522, new_load = 177537
...
[ 0.046875] Before calculate: 355044
[ 0.046875] old_load = 0,this_rq->cpu_load[0] = 355044, new_load = 355044
[ 0.046875] old_load = 22644,this_rq->cpu_load[1] = 188844, new_load = 355045
[ 0.046875] old_load = 25731,this_rq->cpu_load[2] = 108060, new_load = 355047
[ 0.046875] old_load = 17598,this_rq->cpu_load[3] = 59779, new_load = 355051
[ 0.046875] old_load = 10125,this_rq->cpu_load[4] = 31683, new_load = 355059
...
[ 0.078125] Before calculate: 0
[ 0.078125] old_load = 0,this_rq->cpu_load[0] = 0, new_load = 0
[ 0.078125] old_load = 23605,this_rq->cpu_load[1] = 11802, new_load = 0
[ 0.078125] old_load = 45587,this_rq->cpu_load[2] = 34190, new_load = 0
[ 0.078125] old_load = 40046,this_rq->cpu_load[3] = 35040, new_load = 0
[ 0.078125] old_load = 26104,this_rq->cpu_load[4] = 24472, new_load = 0
[ 0.078125] I think it only execute several times................
...
[ 0.101562] Before calculate: 0
[ 0.101562] old_load = 0,this_rq->cpu_load[0] = 0, new_load = 0
[ 0.101562] old_load = 2950,this_rq->cpu_load[1] = 1475, new_load = 0
[ 0.101562] old_load = 19231,this_rq->cpu_load[2] = 14423, new_load = 0
[ 0.101562] old_load = 26827,this_rq->cpu_load[3] = 23473, new_load = 0
[ 0.101562] old_load = 21508,this_rq->cpu_load[4] = 20163, new_load = 0

[ 0.132812] Before calculate: 0
[ 0.132812] old_load = 0,this_rq->cpu_load[0] = 0, new_load = 0
[ 0.132812] old_load = 184,this_rq->cpu_load[1] = 92, new_load = 0
[ 0.132812] old_load = 6084,this_rq->cpu_load[2] = 4563, new_load = 0
[ 0.132812] old_load = 15723,this_rq->cpu_load[3] = 13757, new_load = 0
[ 0.132812] old_load = 16612,this_rq->cpu_load[4] = 15573, new_load = 0
[ 0.140625] Before calculate: 0
[ 0.140625] old_load = 0,this_rq->cpu_load[0] = 0, new_load = 0
[ 0.140625] old_load = 92,this_rq->cpu_load[1] = 46, new_load = 0
[ 0.140625] old_load = 4563,this_rq->cpu_load[2] = 3422, new_load = 0
[ 0.140625] old_load = 13757,this_rq->cpu_load[3] = 12037, new_load = 0
[ 0.140625] old_load = 15573,this_rq->cpu_load[4] = 14599, new_load = 0
[ 0.156250] TCP reno registered
........
[ 79.617187] Before calculate: 360213
[ 79.617187] old_load = 360213,this_rq->cpu_load[0] = 360213, new_load = 360213
[ 79.617187] old_load = 360213,this_rq->cpu_load[1] = 360213, new_load = 360213
[ 79.617187] old_load = 360213,this_rq->cpu_load[2] = 360213, new_load = 360213
[ 79.617187] old_load = 360213,this_rq->cpu_load[3] = 360213, new_load = 360213
[ 79.617187] old_load = 360213,this_rq->cpu_load[4] = 360213, new_load = 360213
[ 79.656250] Before calculate: 360213
.......

For readability, I have replaced the unrelated boot messages with "...";
just follow the printk lines I added.

You can clearly see the run queue load changing.
The trajectory of this_rq->load.weight is:
initialised to 0 --> suddenly becomes very large --> back to 0 --> a final steady value (if no new tasks join)

Four special cases are worth thinking through:
  • Right after boot, the run queue load is 0 --> every level is 0
  • Some processes join and the run queue load jumps --> each level ramps up at its own rate, per the formula
  • The run queue load drops back to 0 --> each level decays at its own rate, per the formula
  • The run queue load gradually settles --> every level ends up equal to the run queue load
   Going through the later traces, the values in the cpu_load[] array are always trying to get as close as possible to the run queue's load value; the larger the scale, the more slowly they get there, and cpu_load[0] always equals the run queue's load.
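As a quick sanity check of that observation, one of the trace fragments above can be verified by hand. In the [ 0.140625] fragment the run queue load has dropped to 0, so no round-up is applied and the update is simply cpu_load[i] = old*(2^i - 1)/2^i (integer division):
  i=4: 15573*15/16 = 14599 (the trace prints 14599)
  i=3: 13757*7/8  = 12037 (the trace prints 12037)
  i=2: 4563*3/4   = 3422  (the trace prints 3422)
so each level really is decaying toward the new load at its own rate.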
   In a later article I will look at the overall principle of load balancing between CPUs, together with worked examples of the code paths involved.
References:
[PATCH -cfs] Fix cpu_load calculation error
(http://lists.zerezo.com/linux-kernel/msg12735442.html)

CFS-devel, group scheduler, fixes
(http://lists.zerezo.com/linux-kernel/msg13412736.html)

Re: [PATCH] sched: Improve readability in update_cpu_load() code
(http://lkml.indiana.edu/hypermail/linux/kernel/0805.1/3068.html)