[FreeBSD] FreeBSD 9.2 ULE thread scheduling: adding a thread to a CPU run queue (the sched_add function)

Posted by 71v5 on 2014-07-01 01:37

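This post looks at the most common case: a thread created by the fork system call. Every newly created thread must eventually be placed on the run queue of some CPU in the system before it can be scheduled. In the final stage of do_fork, the scheduler's sched_add function is called to add the new thread to the run queue of one of the CPUs.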

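Note:
Some of the wording in my earlier posts was not rigorous. For example, tdq_cpu[cpu] is really the "per processor runqs and statistics" structure, yet I called it the run queue. Strictly speaking, only the tdq_realtime, tdq_timeshare, and tdq_idle members of a struct tdq object are run queues.
To stay consistent with the earlier posts, I keep calling tdq_cpu[cpu] the CPU run queue, and I refer to the run queues identified by those three members as the tdq_realtime queue, the tdq_timeshare queue, and the tdq_idle queue. This does not affect understanding how the ULE scheduler is implemented.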

[ULE thread scheduling: the sched_add function]:

/**************************************************************************************
* Select the target thread queue and add a thread to it. Request
* preemption or IPI a remote processor if required.
Parameters:
td: the thread to be added to a CPU run queue.
flags: flags passed to sched_add. Their meaning depends on the calling context, so they are skipped for now.
*******************************/
2342
2343 void
2344 sched_add(struct thread *td, int flags)
2345 {
2346     struct tdq *tdq;
/***************************************************************************
* The local variable cpu is renamed to smp_cpu here to distinguish it from
* the word "cpu" used in the commentary below.
***********************************/
2347 #ifdef SMP
2348     int smp_cpu;
2349 #endif
2350
2351     KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
2352         "prio:%d", td->td_priority, KTR_ATTR_LINKED,
2353         sched_tdname(curthread));
2354     KTR_POINT1(KTR_SCHED, "thread", sched_tdname(curthread), "wokeup",
2355         KTR_ATTR_LINKED, sched_tdname(td));
2356     SDT_PROBE4(sched, , , enqueue, td, td->td_proc, NULL,
2357         flags & SRQ_PREEMPTED);
2358     THREAD_LOCK_ASSERT(td, MA_OWNED);
/*******************************************************************************
* Recalculate the priority before we select the target cpu or
* run-queue.
2363-2364:
The td_pri_class member of struct thread is the scheduling class of td:
u_char td_pri_class; (t) Scheduling class.
#define PRI_TIMESHARE 3 Time sharing process.
If the scheduling class of td is PRI_TIMESHARE, sched_priority is called
to recalculate the priority of td.
Normally every thread in the system inherits its scheduling class from
thread0 (thread 0), whose class is set to PRI_TIMESHARE during system
initialization. Unless an application explicitly changes a thread's
scheduling class with the rtprio system call, the class is therefore
usually PRI_TIMESHARE. Such a change is itself limited to the following
three classes:
#define PRI_REALTIME 2 Real time process.
#define PRI_TIMESHARE 3 Time sharing process.
#define PRI_IDLE 4 Idle process.
***********************************/
2363     if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE)
2364         sched_priority(td);
/***************************************************************************************
* 2365-2377: SMP systems
*
* Pick the destination cpu and if it isn't ours transfer to the
* target cpu.
2370: sched_pickcpu picks a suitable CPU for thread td; the local variable
smp_cpu holds that CPU's logical cpu id.
2371: sched_setcpu mainly does the following:
it sets the ts_cpu member of the struct td_sched object associated with
thread td to smp_cpu, and returns the run queue of the CPU whose logical
cpu id is smp_cpu, assumed here to be &tdq_cpu[smp_cpu].
2372: tdq_add adds thread td to one of the tdq_realtime, tdq_timeshare, and
tdq_idle queues embedded in the struct tdq object that tdq points to;
see the brief analysis below.
***********************************/
2365 #ifdef SMP
2366
2367
2368
2369
2370     smp_cpu = sched_pickcpu(td, flags);
2371     tdq = sched_setcpu(td, smp_cpu, flags);
2372     tdq_add(tdq, td, flags);
/*******************************************************************
* 2373-2376:
If smp_cpu differs from the logical cpu id of the current CPU, tdq_notify
is called to check whether thread td can preempt the thread currently
running on the CPU whose logical cpu id is smp_cpu, and to send an
inter-processor interrupt (IPI) if needed.
tdq_notify is analyzed briefly below.
*******************************************/
2373     if (smp_cpu != PCPU_GET(cpuid)) {
2374         tdq_notify(tdq, td);
2375         return;
2376     }
2377 #else
/*****************************************************************
* 2378-2385: uniprocessor systems, a special case of the SMP path; the
* functions involved have already been covered.
******************/
2378     tdq = TDQ_SELF();
2379     TDQ_LOCK(tdq);
2380     /*
2381      * Now that the thread is moving to the run-queue, set the lock
2382      * to the scheduler's lock.
2383      */
2384     thread_lock_set(td, TDQ_LOCKPTR(tdq));
2385     tdq_add(tdq, td, flags);
2386 #endif
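/**************************************************************************************
* Execution reaches this point only if the system is not SMP, or if smp_cpu equals
* the logical cpu id of the current CPU.
#define SRQ_YIELDING 0x0001 We are yielding (from mi_switch).
If SRQ_YIELDING is not set, td may need to preempt the currently running thread.
No inter-processor interrupt is required, because both threads are on the same CPU.
In this case:
when its conditions are met, sched_setpreempt simply sets the TDF_NEEDRESCHED
flag on the thread running on the current CPU, so that mi_switch is called on
return from an interrupt or system call.
*****************************************/
2387     if (!(flags & SRQ_YIELDING))
2388         sched_setpreempt(td);
2389 }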
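
To make the control flow at the end of sched_add concrete, here is a minimal user-space sketch, not kernel code. The names toy_action and toy_sched_add_action are made up for illustration; only the SRQ_YIELDING value (0x0001) comes from the excerpt above. It models just the last decision: notify a remote CPU, mark the local CPU for rescheduling, or do nothing.

```c
#include <assert.h>

/* Same value the post quotes for SRQ_YIELDING. */
#define TOY_SRQ_YIELDING 0x0001

/* Hypothetical result type: which follow-up step sched_add would take. */
enum toy_action {
    TOY_NOTIFY_REMOTE,      /* models the tdq_notify() path */
    TOY_SETPREEMPT_LOCAL,   /* models the sched_setpreempt() path */
    TOY_NOTHING             /* yielding on the local CPU: no follow-up */
};

/*
 * Hypothetical helper: given the CPU chosen for the thread, the CPU we
 * are running on, and the flags, report the follow-up step.
 */
static enum toy_action
toy_sched_add_action(int target_cpu, int current_cpu, int flags)
{
    assert(target_cpu >= 0 && current_cpu >= 0);

    /* Thread was queued on another CPU: that CPU may need an IPI. */
    if (target_cpu != current_cpu)
        return (TOY_NOTIFY_REMOTE);

    /* Same CPU and not yielding: consider preempting the current thread. */
    if (!(flags & TOY_SRQ_YIELDING))
        return (TOY_SETPREEMPT_LOCAL);

    return (TOY_NOTHING);
}
```

For example, toy_sched_add_action(1, 0, 0) takes the remote path, while toy_sched_add_action(0, 0, TOY_SRQ_YIELDING) does nothing further.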


[Function tdq_add]:

/*******************************************************************************************
* Add a thread to a thread queue. Select the appropriate runq and add the
* thread to it. This is the internal function called when the tdq is
* predetermined.
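* Parameters:
tdq: points to the run queue of the target CPU.
td: the thread to be added to the run queue.
flags: assorted flags.
tdq_add calls tdq_runq_add to do the actual work of adding thread td to one of the
tdq_realtime, tdq_timeshare, and tdq_idle queues.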
******************************/
2321 void
2322 tdq_add(struct tdq *tdq, struct thread *td, int flags)
2323 {
2324
2325     TDQ_LOCK_ASSERT(tdq, MA_OWNED);
2326     KASSERT((td->td_inhibitors == 0),
2327         ("sched_add: trying to run inhibited thread"));
2328     KASSERT((TD_CAN_RUN(td) || TD_IS_RUNNING(td)),
2329         ("sched_add: bad thread state"));
2330     KASSERT(td->td_flags & TDF_INMEM,
2331         ("sched_add: thread swapped out"));
2332
/***************************************************************************************
* 2333-2334:
Update the tdq_lowpri member of struct tdq. This shows that tdq_lowpri holds the
highest priority (lowest numeric value) among the threads on that CPU's run queue.
2335: call tdq_runq_add; see the brief analysis below.
2336: tdq_load_add simply increments the tdq_load member of the CPU run queue tdq.
About the tdq_load member of struct tdq:
tdq_load: the load of the corresponding CPU, used by sched_highest and sched_lowest.
**********************************************/
2333     if (td->td_priority < tdq->tdq_lowpri)
2334         tdq->tdq_lowpri = td->td_priority;
2335     tdq_runq_add(tdq, td, flags);
2336     tdq_load_add(tdq, td);
2337 }
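
The bookkeeping in lines 2333-2336 can be tried out in user space with a small sketch. This is a toy model under stated assumptions, not the kernel's struct tdq: toy_tdq and toy_tdq_add are hypothetical names, and only the two fields discussed above (load and lowest priority value) are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct tdq members discussed above. */
struct toy_tdq {
    int           load;     /* models tdq_load */
    unsigned char lowpri;   /* models tdq_lowpri: smallest priority value queued */
};

/* Record a newly queued thread of priority pri on queue q. */
static void
toy_tdq_add(struct toy_tdq *q, unsigned char pri)
{
    assert(q != NULL);

    /* A smaller number is a higher priority, so keep the minimum. */
    if (pri < q->lowpri)
        q->lowpri = pri;

    /* One more runnable thread contributes to this CPU's load. */
    q->load++;
}
```

Starting from { 0, 255 }, adding priorities 120, 200, and 90 in that order leaves lowpri at 90 and load at 3.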


[Function tdq_runq_add]:

/*******************************************************************************
* Add a thread to the actual run-queue. Keeps transferable counts up to
* date with what is actually on the run-queue. Selects the correct
* queue position for timeshare threads.
* The parameters have the same meaning as those of tdq_add above.
******************************/
446 static __inline void
447 tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
448 {
449     struct td_sched *ts;
450     u_char pri;
451
452     TDQ_LOCK_ASSERT(tdq, MA_OWNED);
453     THREAD_LOCK_ASSERT(td, MA_OWNED);
454
/*******************************************************************************
* 455: pri is the priority of thread td.
456: ts points to the struct td_sched object associated with thread td.
457:
#define TD_SET_RUNQ(td) (td)->td_state = TDS_RUNQ
Thread td is about to be added to a CPU run queue, so its state is set to TDS_RUNQ.
458-461:
#define THREAD_CAN_MIGRATE(td) ((td)->td_pinned == 0)
The td_pinned member of struct thread is incremented by sched_pin. A nonzero
value means thread td must not be migrated to the run queue of another CPU:
int td_pinned; Temporary cpu pin count.
The tdq_transferable member of struct tdq:
tdq_transferable: a counter of the migratable threads on that CPU's run queue.
When td_pinned is 0, td is migratable, so tdq_transferable is incremented and a
flag is set to record that td was added as transferable:
#define TSF_XFERABLE 0x0002 Thread was added as transferable
****************************************************/
455     pri = td->td_priority;
456     ts = td->td_sched;
457     TD_SET_RUNQ(td);
458     if (THREAD_CAN_MIGRATE(td)) {
459         tdq->tdq_transferable++;
460         ts->ts_flags |= TSF_XFERABLE;
461     }
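/*******************************************************************************
* #define PRI_MIN_BATCH 152
#define PRI_MAX_BATCH 223
The ts_runq member of struct td_sched points to the queue that the associated
thread is added to:
struct runq *ts_runq; Run-queue we're queued on.
462-463:
If the priority pri of thread td is less than 152, td goes to the tdq_realtime queue.
464-486:
Otherwise, if pri is less than or equal to 223, td is added to the tdq_timeshare queue.
The code in this range also computes a new pri under certain conditions. I am not
entirely clear on that part, so help from readers is welcome.
487-489:
If pri is greater than 223, td goes to the tdq_idle queue. The runq_add call on
line 489 performs the insertion for both the realtime and the idle case, because
the timeshare case has already returned on line 486.
runq_add_pri and runq_add are not defined in the ULE scheduler source. Both are
simple linked-list operations; if you are interested, see the C source file
"$FreeBSD: release/9.2.0/sys/kern/kern_switch.c".
****************************/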
462     if (pri < PRI_MIN_BATCH) {
463         ts->ts_runq = &tdq->tdq_realtime;
464     } else if (pri <= PRI_MAX_BATCH) {
465         ts->ts_runq = &tdq->tdq_timeshare;
466         KASSERT(pri <= PRI_MAX_BATCH && pri >= PRI_MIN_BATCH,
467             ("Invalid priority %d on timeshare runq", pri));
468         /*
469          * This queue contains only priorities between MIN and MAX
470          * realtime. Use the whole queue to represent these values.
471          */
472         if ((flags & (SRQ_BORROWING|SRQ_PREEMPTED)) == 0) {
473             pri = RQ_NQS * (pri - PRI_MIN_BATCH) / PRI_BATCH_RANGE;
474             pri = (pri + tdq->tdq_idx) % RQ_NQS;
475             /*
476              * This effectively shortens the queue by one so we
477              * can have a one slot difference between idx and
478              * ridx while we wait for threads to drain.
479              */
480             if (tdq->tdq_ridx != tdq->tdq_idx &&
481                 pri == tdq->tdq_ridx)
482                 pri = (unsigned char)(pri - 1) % RQ_NQS;
483         } else
484             pri = tdq->tdq_ridx;
485         runq_add_pri(ts->ts_runq, td, pri, flags);
486         return;
487     } else
488         ts->ts_runq = &tdq->tdq_idle;
489     runq_add(ts->ts_runq, td, flags);
490 }
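
The pri computation in lines 472-484 appears to treat the timeshare run queue as a circular queue of RQ_NQS slots: the batch priority range is scaled onto the slots, rotated by the current insertion index tdq_idx, and kept off the slot currently being drained (tdq_ridx) when the two indexes differ. The sketch below reproduces only that arithmetic in user space so it can be experimented with. RQ_NQS = 64 and PRI_BATCH_RANGE = 72 are assumptions (the latter taken as PRI_MAX_BATCH - PRI_MIN_BATCH + 1), and toy_timeshare_slot with its parameters is a hypothetical name, not a kernel function.

```c
#include <assert.h>

#define PRI_MIN_BATCH   152   /* from the excerpt above */
#define PRI_MAX_BATCH   223   /* from the excerpt above */
#define PRI_BATCH_RANGE (PRI_MAX_BATCH - PRI_MIN_BATCH + 1) /* assumed: 72 */
#define RQ_NQS          64    /* assumed number of slots per run queue */

/*
 * Hypothetical helper: return the timeshare slot for a batch priority.
 * idx models tdq_idx (insertion index), ridx models tdq_ridx (removal
 * index), and use_ridx models the SRQ_BORROWING|SRQ_PREEMPTED case.
 */
static unsigned char
toy_timeshare_slot(unsigned char pri, unsigned char idx, unsigned char ridx,
    int use_ridx)
{
    assert(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH);

    /* Borrowed or preempted threads go to the slot being drained. */
    if (use_ridx)
        return (ridx);

    /* Scale the batch priorities onto the available slots. */
    pri = RQ_NQS * (pri - PRI_MIN_BATCH) / PRI_BATCH_RANGE;

    /* Rotate by the insertion index so the queue wraps around. */
    pri = (pri + idx) % RQ_NQS;

    /* Stay off the slot being drained when idx and ridx differ. */
    if (ridx != idx && pri == ridx)
        pri = (unsigned char)(pri - 1) % RQ_NQS;

    return (pri);
}
```

With idx = ridx = 0, priority 152 lands in slot 0 and priority 223 in slot 63. With idx = 10 and ridx = 9, priority 223 would land in slot 9, which is being drained, so it is moved back to slot 8.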


[Function tdq_notify]:

/*******************************************************************************
* Notify a remote cpu of new work. Sends an IPI if criteria are met.
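tdq_notify is easier to follow when read together with its calling context.
For the CPU whose logical cpu id is x, the parameters and the relevant member
values are as follows:
Parameters:
tdq: &tdq_cpu[x].
td: the thread that has already been added to one of the tdq_realtime,
tdq_timeshare, and tdq_idle queues of run queue tdq_cpu[x].
Member values:
td->td_sched->ts_cpu is x.
*****************************************/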
1011 static void
1012 tdq_notify(struct tdq *tdq, struct thread *td)
1013 {
1014     struct thread *ctd;
1015     int pri;
1016     int cpu;
1017
/************************************************************
* 1018-1019:
A nonzero tdq_ipipending member of struct tdq means an
inter-processor interrupt is already pending, so the
function returns immediately.
*****************************/
1018     if (tdq->tdq_ipipending)
1019         return;
/**********************************************************************
* 1020: by the assumption above, cpu is x here.
1021: pri is set to the priority of thread td.
1022: ctd is the thread currently running on CPU x.
1023-1024:
sched_shouldpreempt compares the priority of thread td with that of
thread ctd to decide whether td can preempt ctd. It returns 1 if
preemption is allowed and 0 if not; see the brief analysis below.
*************************************/
1020     cpu = td->td_sched->ts_cpu;
1021     pri = td->td_priority;
1022     ctd = pcpu_find(cpu)->pc_curthread;
1023     if (!sched_shouldpreempt(pri, ctd->td_priority, 1))
1024         return;
/***********************************************************************
* #define TD_IS_IDLETHREAD(td) ((td)->td_flags & TDF_IDLETD)
#define TDF_IDLETD 0x00000020 This is a per-CPU idle thread.
1025-1032: special handling when ctd is the per-CPU idle thread.
This is ignored for now.
**********************************/
1025     if (TD_IS_IDLETHREAD(ctd)) {
1026         /*
1027          * If the MD code has an idle wakeup routine try that before
1028          * falling back to IPI.
1029          */
1030         if (!tdq->tdq_cpu_idle || cpu_idle_wakeup(cpu))
1031             return;
1032     }
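/***************************************************************
* If execution reaches this point, thread td will preempt thread ctd.
* 1033: set the tdq_ipipending member of struct tdq to 1.
1034: send an inter-processor interrupt of type IPI_PREEMPT.
Inter-processor interrupts come up yet again. They are very
important on SMP systems, so next time I plan to share how
they are sent and handled on the Intel platform.
********************************/
1033     tdq->tdq_ipipending = 1;
1034     ipi_cpu(cpu, IPI_PREEMPT);
1035 }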
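
The early-return structure of tdq_notify can be condensed into a user-space sketch. Everything here is a toy under stated assumptions: toy_tdq and toy_tdq_notify are hypothetical names, and the outcomes of sched_shouldpreempt, TD_IS_IDLETHREAD, and cpu_idle_wakeup are passed in as plain integers instead of being computed. The function returns 1 when the real code would go on to send IPI_PREEMPT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct tdq members used here. */
struct toy_tdq {
    int ipipending;   /* models tdq_ipipending */
    int cpu_idle;     /* models tdq_cpu_idle */
};

/*
 * Return 1 if an IPI would be sent, 0 if the function returns early.
 * should_preempt, cur_is_idle, and idle_wakeup_ok stand in for the
 * results of sched_shouldpreempt(), TD_IS_IDLETHREAD(ctd), and
 * cpu_idle_wakeup(cpu) respectively.
 */
static int
toy_tdq_notify(struct toy_tdq *q, int should_preempt, int cur_is_idle,
    int idle_wakeup_ok)
{
    assert(q != NULL);

    /* An IPI is already on its way: nothing more to do. */
    if (q->ipipending)
        return (0);

    /* The new thread does not beat the running one. */
    if (!should_preempt)
        return (0);

    /* Idle thread running: same test as line 1030 of the excerpt. */
    if (cur_is_idle && (!q->cpu_idle || idle_wakeup_ok))
        return (0);

    q->ipipending = 1;
    return (1);
}
```

Calling it twice in a row on the same queue shows the effect of the pending flag: the first call reports an IPI, the second returns early.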


[Function sched_shouldpreempt]:

/***************************************************************************
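* sched_shouldpreempt decides from the two priorities whether preemption should
happen. It returns 1 if preemption is allowed and 0 if not.
Parameters:
pri: the priority of the thread that wants to preempt, called td1 here.
cpri: the priority of the thread that would be preempted, called td2 here.
remote: indicates whether td2 runs on a remote CPU rather than the current one
(tdq_notify passes 1).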
*************************************/
408 static inline int
409 sched_shouldpreempt(int pri, int cpri, int remote)
410 {
/***************************************************************************
* If the new priority is not better than the current priority there is
* nothing to do.
415-416:
If the condition on line 415 is true, the priority of td1 is no higher than
that of td2 (a larger number means a lower priority), so no preemption is
needed and the function returns 0.
***********************************/
414
415     if (pri >= cpri)
416         return (0);
/************************************************************************
* Always preempt idle.
420-421:
#define PRI_MIN_IDLE (224)
If the condition on line 420 is true, td2 is an idle-class thread, which is
always preempted. The function returns 1.
*****************************/
419
420     if (cpri >= PRI_MIN_IDLE)
421         return (1);
422
423
/***************************************************************
* If preemption is disabled don't preempt others.
425-431:
#define PRI_MAX_IDLE (PRI_MAX) 255
#define PRI_MIN_KERN (80)
Enabling or disabling kernel thread preemption.
[1]: lines 430-431.
preempt_thresh is initialized to a nonzero value only when the
kernel is built with the PREEMPTION option, and only then is
kernel thread preemption in effect. There are two cases:
With the FULL_PREEMPTION option:
preempt_thresh is set to PRI_MAX_IDLE, so preemption always happens.
Without the FULL_PREEMPTION option:
preempt_thresh is set to PRI_MIN_KERN, so preemption happens only
when pri is numerically less than or equal to PRI_MIN_KERN.
preempt_thresh thus acts as a preemption threshold: td1 preempts
only if its priority value does not exceed the threshold.
[2]: lines 425-426.
Without the PREEMPTION kernel option, preempt_thresh is set to 0
and kernel thread preemption is disabled.
***************************************/
425     if (preempt_thresh == 0)
426         return (0);
427     /*
428      * Preempt if we exceed the threshold.
429      */
430     if (pri <= preempt_thresh)
431         return (1);
/****************************************************************
* If we're interactive or better and there is non-interactive
* or worse running preempt only remote processors.
436-437:
#define PRI_MAX_INTERACT 151
If every condition in the if statement on line 436 holds, td2 runs on a
remote CPU and td1 is more interactive than td2 (td1 is interactive or
better, td2 is not), so td2 must be preempted.
***************************************/
436     if (remote && pri <= PRI_MAX_INTERACT && cpri > PRI_MAX_INTERACT)
437         return (1);
/* If execution reaches this point, no preemption is needed and the function returns 0. */
438     return (0);
439 }
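
Because sched_shouldpreempt depends only on integers, its decision table is easy to replay outside the kernel. The sketch below is a user-space copy of the logic shown above with one deliberate change: preempt_thresh is passed in as a parameter (in the kernel it is a file-level variable) so the PREEMPTION and FULL_PREEMPTION cases can be compared side by side. toy_shouldpreempt is a hypothetical name; the constants are the values quoted in the excerpt.

```c
#include <assert.h>

#define PRI_MAX_INTERACT 151   /* from the excerpt above */
#define PRI_MIN_IDLE     224   /* from the excerpt above */

/*
 * Return 1 if a thread of priority pri should preempt one of priority
 * cpri, 0 otherwise. Smaller numbers mean higher priority.
 */
static int
toy_shouldpreempt(int pri, int cpri, int remote, int preempt_thresh)
{
    assert(pri >= 0 && pri <= 255 && cpri >= 0 && cpri <= 255);

    /* Not strictly better than the running thread: nothing to do. */
    if (pri >= cpri)
        return (0);
    /* Always preempt an idle-class thread. */
    if (cpri >= PRI_MIN_IDLE)
        return (1);
    /* Preemption disabled. */
    if (preempt_thresh == 0)
        return (0);
    /* At or below the threshold value: preempt. */
    if (pri <= preempt_thresh)
        return (1);
    /* Interactive thread versus non-interactive one on a remote CPU. */
    if (remote && pri <= PRI_MAX_INTERACT && cpri > PRI_MAX_INTERACT)
        return (1);
    return (0);
}
```

With a threshold of 80 (the PRI_MIN_KERN case), a priority-140 thread does not preempt a priority-200 thread on the local CPU, but it does when the target is remote, because 140 is within the interactive range and 200 is not.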

