[FreeBSD] freebsd9.2 - ULE thread scheduling - keeping the CPU run queues balanced - the sched_balance function

71v5

Posted on 2014-06-29 23:02

Last edited by 71v5 on 2014-06-30 01:33

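As described in the post "freebsd9.2 - when mi_switch is called - the thread's time slice runs out", when the statclock timer fires, statclock_cnt
calls the scheduler's sched_clock function to do the scheduler-specific work. One of sched_clock's tasks is to decide whether
sched_balance should be called to migrate threads between the cpus' run queues so that the queues stay balanced. The relevant code is excerpted again here to make this clearer: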

2174 /*
2175 * Handle a stathz tick. This is really only relevant for timeshare
2176 * threads.
2177 */
2178 void
2179 sched_clock(struct thread *td)
2180 {
2181 struct tdq *tdq;
2182 struct td_sched *ts;
2183 /* 2185: tdq points to the struct tdq object of the current cpu */
2184 THREAD_LOCK_ASSERT(td, MA_OWNED);
2185 tdq = TDQ_SELF();
2186 #ifdef SMP
2187 /***********************************************************************
2188 * We run the long term load balancer infrequently on the first cpu.
static struct tdq *balance_tdq;
static int balance_ticks;
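2190-2193:
During ULE scheduler initialization, the BSP sets balance_tdq to the address of
its own struct tdq object.
balance_ticks is the number of stathz ticks until sched_balance is next called to
migrate threads between the run queues of the system's cpus. It is set during ULE
scheduler initialization, and the exact value is not important
(balance_interval defaults to 128):
balance_ticks = max(balance_interval / 2, 1);
balance_ticks += random() % balance_interval;
When balance_ticks counts down to 0, sched_balance is called to rebalance the
run queues of the system's cpus.
Because balance_tdq is the BSP's tdq, this runs only on the BSP.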
2189 *************/
2190 if (balance_tdq == tdq) {
2191 if (balance_ticks && --balance_ticks == 0)
2192 sched_balance();
2193 }
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
}

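The countdown described in the annotations above can be reproduced in user space. The following is a minimal sketch and not kernel code: the helper names next_balance_ticks and balance_tick are hypothetical, and the random value is passed in as a parameter instead of calling the kernel's random().

```c
/* User-space sketch of the balance_ticks countdown; helper names are hypothetical. */

/* Pick the next interval: max(interval / 2, 1) plus rnd % interval. */
static int
next_balance_ticks(int interval, long rnd)
{
	int ticks;

	ticks = interval / 2 > 1 ? interval / 2 : 1;
	ticks += (int)(rnd % interval);
	return (ticks);
}

/*
 * Mirrors "if (balance_ticks && --balance_ticks == 0)": returns 1 on the
 * tick where the counter reaches zero and 0 otherwise. A counter that is
 * already 0 stays 0 and never fires.
 */
static int
balance_tick(int *ticks)
{
	return (*ticks != 0 && --*ticks == 0);
}
```

With an interval of 128, next_balance_ticks returns a value between 64 and 191, which matches the "between .5 * balance_interval and 1.5 * balance_interval" wording of the kernel comment.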

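To simplify the analysis, assume the following:
1: The system has two physical cpus (packages), each with 4 physical cores. The logical cpu ids are 0, 1, 2, 3, 4, 5, 6, 7. Below,
these logical cpus are referred to as cpu0, cpu1, cpu2 and so on, and their run queues are written tdq_cpu[0], tdq_cpu[1], tdq_cpu[2],
tdq_cpu[3] and so on.

2: The cpuset_t type is 1 byte wide, with its bits numbered from left to right. For a cpumask variable of type cpuset_t,
for example one encoded as 11111111, a cpu is examined only if its bit is set to 1.
For instance, if bit0 is set to 1, cpu0 is examined.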
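The one-byte mask assumed here can be modeled with a few helpers. This is only an illustrative sketch: cpumask8_t and the mask_* functions are hypothetical names that loosely imitate what CPU_FILL, CPU_CLR, CPU_ISSET and CPU_EMPTY do, and storing cpu n as the value 1 << n is a choice made for the sketch.

```c
#include <stdint.h>

/* Hypothetical 1-byte stand-in for cpuset_t; bit n represents cpu n. */
typedef uint8_t cpumask8_t;

/* Mark every cpu as a candidate. */
static void
mask_fill(cpumask8_t *m)
{
	*m = 0xff;
}

/* Remove one cpu from the candidate set. */
static void
mask_clr(int cpu, cpumask8_t *m)
{
	*m = (cpumask8_t)(*m & ~(1u << cpu));
}

/* Nonzero if the cpu is still a candidate. */
static int
mask_isset(int cpu, const cpumask8_t *m)
{
	return ((*m >> cpu) & 1);
}

/* Nonzero if no cpu is left to examine. */
static int
mask_empty(const cpumask8_t *m)
{
	return (*m == 0);
}
```

After mask_fill followed by mask_clr(6, ...), cpu6 is no longer examined while the other seven cpus still are.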

First, a brief look at the sched_lowest function. Readers can refer to the analysis of sched_highest to see how sched_lowest actually works.
[The sched_lowest function]:

/******************************************************************************************
* Find the cpu with the least load via the least loaded path that has a
* lowpri greater than pri pri. A pri of -1 indicates any priority is
* acceptable.
*
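Parameters:
cg: the cpu_group to search.
mask: the set of cpus to examine.
pri: can be ignored here.
maxload: this one matters. It is usually derived from the load of the cpu that the
preceding sched_highest call returned as having the most loaded run queue. When looking
for the least loaded cpu, only cpus whose run-queue load is less than or equal to maxload are considered.
prefer: the logical cpu id returned by the preceding sched_highest call. When the cpu being
examined equals prefer, a constant is subtracted from that cpu's run-queue load.
Return value:
-1: no cpu qualified, i.e. the run-queue load of every examined cpu was higher than maxload,
for example because many threads were created and added to the run queues in the meantime.
>= 0: the logical cpu id of the cpu with the least loaded run queue (0 is a valid result).
Once the meaning of these parameters is clear, the rest of the analysis is straightforward.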
************************************/
static inline int
sched_lowest(const struct cpu_group *cg, cpuset_t mask, int pri, int maxload,
    int prefer)
{
	struct cpu_search low;

	low.cs_cpu = -1;
	low.cs_prefer = prefer;
	low.cs_mask = mask;
	low.cs_pri = pri;
	low.cs_limit = maxload;
	cpu_search_lowest(cg, &low);
	return low.cs_cpu;
}

/*
 * cpu_search instantiations must pass constants to maintain the inline
 * optimization.
 */
int
cpu_search_lowest(const struct cpu_group *cg, struct cpu_search *low)
{
	return cpu_search(cg, low, NULL, CPU_SEARCH_LOWEST);
}

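To make the roles of mask, maxload and prefer concrete, here is a deliberately flattened sketch of a "find the least loaded cpu" search. It is not cpu_search: it ignores the cpu_group topology, the priority check and any tie-breaking the kernel performs. The function name lowest_cpu, the 8-cpu array, and the LOAD_SCALE and PREFER_BIAS values are assumptions made for illustration.

```c
#include <stdint.h>

#define NCPU		8
#define LOAD_SCALE	256	/* illustrative: lets a bias smaller than one thread be expressed */
#define PREFER_BIAS	64	/* illustrative constant subtracted for the preferred cpu */

/*
 * Return the cpu in mask with the lowest load, considering only cpus
 * whose load is <= maxload. Returns -1 if no cpu qualifies. Note that
 * 0 is a valid cpu id, so callers must compare against -1.
 */
static int
lowest_cpu(const int load[NCPU], uint8_t mask, int maxload, int prefer)
{
	int best, best_load, cpu, l;

	best = -1;
	best_load = 0;
	for (cpu = 0; cpu < NCPU; cpu++) {
		/* Skip cpus outside the mask or above the load limit. */
		if ((mask & (1u << cpu)) == 0 || load[cpu] > maxload)
			continue;
		l = load[cpu] * LOAD_SCALE;
		if (cpu == prefer)
			l -= PREFER_BIAS;	/* make the preferred cpu win ties */
		if (best == -1 || l < best_load) {
			best = cpu;
			best_load = l;
		}
	}
	return (best);
}
```

With loads {3, 1, 4, 1, 5, 0, 6, 2} and all cpus in the mask, a limit of 5 selects cpu5. A limit of -1 excludes every cpu and yields -1.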

[ULE scheduler - the sched_balance function]:

842 static void
843 sched_balance(void)
844 {
845 struct tdq *tdq;
846
847
848
849
850
/****************************************************************************************
* Select a random time between .5 * balance_interval and
* 1.5 * balance_interval.
*
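* The variable rebalance controls whether the cpu run queues are rebalanced at all.
* Its initial value is 1 and it can be changed through the sysctl system call.
* static int rebalance = 1;
851-852: recompute balance_ticks.
853: if the system is not running SMP (smp_started is 0) or rebalance is 0, return
immediately, since there is no need to rebalance the cpu run queues.
855: only the BSP executes sched_balance, and the BSP's logical cpu id is 0, so
tdq here is &tdq_cpu[0].
857: call sched_balance_group with cpu_top as the argument.
For cpu_top, see the post "freebsd9.2 - ULE thread scheduling - creating the data structures that describe the CPU topology".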
**************************/
851 balance_ticks = max(balance_interval / 2, 1);
852 balance_ticks += random() % balance_interval;
853 if (smp_started == 0 || rebalance == 0)
854 return;
855 tdq = TDQ_SELF();
856 TDQ_UNLOCK(tdq);
857 sched_balance_group(cpu_top);
858 TDQ_LOCK(tdq);
859 }


[The sched_balance_group function]:

798 static void
799 sched_balance_group(struct cpu_group *cg)
800 {
/********************************************************************************
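* Local variables:
hmask: used when calling sched_highest to find the cpu with the most loaded run queue.
It is a cpu bitmap, and every cpu whose bit is set to 1 is examined.
lmask: like hmask, but used when looking for the cpu with the least loaded run queue.
high: logical cpu id of the cpu with the most loaded run queue.
low: logical cpu id of the cpu with the least loaded run queue.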
**************************************/
801 cpuset_t hmask, lmask;
802 int high, low, anylow;
803
/***************************************************************************************************
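* 804: before the for loop, set every bit of hmask to 1, meaning that all cpus in the system are examined.
805-839: an infinite for loop that ends as soon as any of the following conditions holds:
Condition 1: sched_highest returns -1. In general, a return value of -1 means that none of
the examined cpus has a transferable thread on its run queue.
Condition 2: every bit of lmask is 0, so there is no cpu left to examine.
Condition 3: anylow is 1 and sched_lowest returns -1.
810: reaching this line means sched_highest succeeded. Assume it returned 6 (high is 6),
i.e. tdq_cpu[6], the run queue of cpu6, has the highest load. The CPU_CLR macro clears bit6 of hmask,
so the next call to sched_highest will not examine cpu6.
811: copy the updated hmask into lmask. sched_lowest examines the run-queue load of the cpus
contained in lmask and picks the least loaded one.
813-815: when lmask is empty, i.e. all of its bits are zero, there is no cpu to examine and
calling sched_lowest would be pointless, so break out of the for loop.
817-818: call sched_lowest to find the cpu with the least loaded run queue. Assume it returns
2 (low is 2), i.e. tdq_cpu[2], the run queue of cpu2, has the lowest load.
820-821: this is condition 3 above.
823-824: sched_lowest returned -1 while anylow is 0, which happens only after sched_balance_pair has
failed at least once for this high. The for loop continues with the next high. If some threads have
terminated unexpectedly or called exit in the meantime, sched_lowest may return a usable value on a later pass.
825-828: at this point high is the most loaded cpu and low is the least loaded cpu, and
sched_balance_pair migrates a thread from run queue tdq_cpu[6] to run queue tdq_cpu[2].
The call has this form:
sched_balance_pair(&tdq_cpu[6], &tdq_cpu[2]);
826-828: if sched_balance_pair returns 1, clear bit2 of hmask. Together with bit6, which was cleared at
line 810, this means later iterations of the for loop examine neither cpu6 nor cpu2.
830-837: if sched_balance_pair returns 0, clear bit2 of lmask, set anylow to 0, and jump to
nextlow to find the next least loaded cpu.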
***********************/
804 CPU_FILL(&hmask);
805 for (;;) {
806 high = sched_highest(cg, hmask, 1);
807 /* Stop if there is no more CPU with transferrable threads. */
808 if (high == -1)
809 break;
810 CPU_CLR(high, &hmask);
811 CPU_COPY(&hmask, &lmask);
812 /* Stop if there is no more CPU left for low. */
813 if (CPU_EMPTY(&lmask))
814 break;
815 anylow = 1;
816 nextlow:
817 low = sched_lowest(cg, lmask, -1,
818 TDQ_CPU(high)->tdq_load - 1, high);
819 /* Stop if we looked well and found no less loaded CPU. */
820 if (anylow && low == -1)
821 break;
822 /* Go to next high if we found no less loaded CPU. */
823 if (low == -1)
824 continue;
825 /* Transfer thread from high to low. */
826 if (sched_balance_pair(TDQ_CPU(high), TDQ_CPU(low))) {
827 /* CPU that got thread can no longer be a donor. */
828 CPU_CLR(low, &hmask);
829 } else {
830 /*
831 * If failed, then there is no threads on high
832 * that can run on this low. Drop low from low
833 * mask and look for different one.
834 */
835 CPU_CLR(low, &lmask);
836 anylow = 0;
837 goto nextlow;
838 }
839 }
840 }

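The control flow of the loop above is easier to follow when it can be run on toy data. The following user-space sketch imitates the structure of sched_balance_group for 8 cpus. It is a simplification under stated assumptions: struct rq, find_highest, find_lowest, balance_pair and the allowed field (which cpus a queue's threads may migrate to) are all hypothetical, the topology is flat, and no locking is modeled.

```c
#include <stdint.h>

#define NCPU 8

/* Hypothetical, simplified stand-in for struct tdq. */
struct rq {
	int load;		/* threads on the queue */
	int transferable;	/* threads that may migrate */
	uint8_t allowed;	/* cpus those threads may run on (assumption) */
};

/* Most loaded cpu in mask that has a transferable thread, or -1. */
static int
find_highest(const struct rq *q, uint8_t mask, int minload)
{
	int best = -1;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (!(mask & (1u << cpu)) || q[cpu].transferable == 0 ||
		    q[cpu].load < minload)
			continue;
		if (best == -1 || q[cpu].load > q[best].load)
			best = cpu;
	}
	return (best);
}

/* Least loaded cpu in mask with load <= maxload, or -1. */
static int
find_lowest(const struct rq *q, uint8_t mask, int maxload)
{
	int best = -1;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (!(mask & (1u << cpu)) || q[cpu].load > maxload)
			continue;
		if (best == -1 || q[cpu].load < q[best].load)
			best = cpu;
	}
	return (best);
}

/* Move one thread from high to low; returns 1 on success, 0 on failure. */
static int
balance_pair(struct rq *q, int high, int low)
{
	if (q[high].transferable == 0 || q[high].load <= q[low].load ||
	    !(q[high].allowed & (1u << low)))
		return (0);
	q[high].load--; q[high].transferable--;
	q[low].load++; q[low].transferable++;
	return (1);
}

/* Same loop shape as sched_balance_group; returns the number of moves. */
static int
balance_group(struct rq *q)
{
	uint8_t hmask = 0xff, lmask;
	int high, low, anylow, moves = 0;

	for (;;) {
		high = find_highest(q, hmask, 1);
		if (high == -1)
			break;			/* nothing left to donate */
		hmask &= (uint8_t)~(1u << high);
		lmask = hmask;
		if (lmask == 0)
			break;			/* nobody left to receive */
		anylow = 1;
nextlow:
		low = find_lowest(q, lmask, q[high].load - 1);
		if (anylow && low == -1)
			break;
		if (low == -1)
			continue;		/* try the next high */
		if (balance_pair(q, high, low)) {
			hmask &= (uint8_t)~(1u << low);	/* receiver cannot donate */
			moves++;
		} else {
			lmask &= (uint8_t)~(1u << low);	/* look for another low */
			anylow = 0;
			goto nextlow;
		}
	}
	return (moves);
}
```

Note that each pass moves at most one thread per high/low pair, so a single call narrows the imbalance rather than equalizing all queues.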

[The sched_balance_pair function] - migrates threads between two run queues:

/********************************************************************
* Transfer load between two imbalanced thread queues.
*
Parameters:
high: the most loaded run queue, here tdq_cpu[6].
low: the least loaded run queue, here tdq_cpu[2].
For the meaning of sched_balance_pair's return value, see the analysis of tdq_move's
return value below.
****************************/
889 static int
890 sched_balance_pair(struct tdq *high, struct tdq *low)
891 {
/*********************************************************************************
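* Local variables:
moved: the number of threads actually migrated, as returned by tdq_move.
cpu: logical cpu id of the cpu that owns run queue low.
901-912: check once more that the migration conditions hold; tdq_move is analyzed below.
The first two conditions of the if statement normally hold. In that case,
when tdq_move returns 1, check whether an inter-processor interrupt (IPI) has to be
sent to the cpu that owns run queue low. If time permits, a later post will cover
how IPIs are sent and handled.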
*******************************************/
892 int moved;
893 int cpu;
894
895 tdq_lock_pair(high, low);
896 moved = 0;
897 /*
898 * Determine what the imbalance is and then adjust that to how many
899 * threads we actually have to give up (transferable).
900 */
901 if (high->tdq_transferable != 0 && high->tdq_load > low->tdq_load &&
902 (moved = tdq_move(high, low)) > 0) {
903 /*
904 * In case the target isn't the current cpu IPI it to force a
905 * reschedule with the new workload.
906 */
907 cpu = TDQ_ID(low);
908 sched_pin();
909 if (cpu != PCPU_GET(cpuid))
910 ipi_cpu(cpu, IPI_PREEMPT);
911 sched_unpin();
912 }
913 tdq_unlock_pair(high, low);
914 return (moved);
915 }
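The decision made at lines 901-912 can be isolated in a small user-space sketch. It is an illustration under assumptions rather than the kernel routine: struct rq and balance_pair_sketch are hypothetical names, the tdq_move call is replaced by moving exactly one thread, locking and pinning are omitted, and instead of sending an IPI the function only reports which cpu would receive one.

```c
/* Hypothetical, simplified stand-in for struct tdq. */
struct rq {
	int id;			/* logical cpu id that owns this queue */
	int load;		/* threads on the queue */
	int transferable;	/* threads that may migrate */
};

/*
 * Returns the number of threads moved (0 or 1). *ipi_target is set to the
 * cpu that would be sent a preemption IPI, or to -1 when none is needed,
 * either because nothing moved or because the target is the current cpu.
 */
static int
balance_pair_sketch(struct rq *high, struct rq *low, int curcpu,
    int *ipi_target)
{
	int moved;

	*ipi_target = -1;
	moved = 0;
	if (high->transferable != 0 && high->load > low->load) {
		/* Stand-in for tdq_move(): migrate exactly one thread. */
		high->load--;
		high->transferable--;
		low->load++;
		low->transferable++;
		moved = 1;
		/* A remote cpu must be told to reschedule. */
		if (low->id != curcpu)
			*ipi_target = low->id;
	}
	return (moved);
}
```

With the example used in this post (high is tdq_cpu[6], low is tdq_cpu[2], and the balancer runs on cpu0), one thread moves and cpu2 is the IPI target.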