Found an old article; preserving it here for the record.
原文:http://www.itworld.com/Comp/2375/swol-0901-insidesolaris/
The kernel dispatcher and associated subsystems provide for the prioritization and scheduling of kernel threads in one of several bundled scheduling classes. The details of the implementation are covered in a series of past Inside Solaris columns which began in October 1998.
Solaris currently ships with two threads libraries: libthread.so, for the Solaris threads interfaces, and libpthread.so, for the POSIX (Portable Operating System Interface for Unix) threads APIs. User threads are created by a call to either thr_create(3THR) (Solaris threads) or pthread_create(3THR) (POSIX threads). The Solaris threads library was originally introduced in Solaris 2.2; at the time, the POSIX threads specification had not been completed. When the POSIX draft was ready, an implementation of the POSIX threads library was developed and began shipping in Solaris 2.6. Both libraries continue to be bundled with Solaris, but we recommend that any new development use the POSIX interfaces, as new features and functionality are being integrated into the POSIX code but not necessarily into the Solaris threads library.
User threads do not have a notion of scheduling classes, such as the timeshare and realtime classes implemented in the kernel. POSIX threads do provide the notion of several scheduling policies, which can be specified by the programmer as part of a thread's attributes. Attributes, introduced by POSIX, allow the programmer to alter the behavior of a user thread or synchronization object; they exist for both. The POSIX thread and synchronization-object creation calls accept a pointer to an attributes structure, which must be initialized, and any nondefault attributes set, before the create call is made.
As of Solaris 8, the supported attributes for a thread are:
contentionscope: PTHREAD_SCOPE_PROCESS or PTHREAD_SCOPE_SYSTEM. Determines if the thread is bound or unbound (more on this below).
detachstate: Determines whether or not to save the thread's state when it terminates, so that it is joinable. That is, another thread in the same process can issue a pthread_join() on the thread ID and collect the thread's exit status.
stackaddr: User-specified thread stack address. By default, the system will determine the stack address based on existing address-space mappings.
stacksize: User-specified stack size. Default is 1 MB for a 32-bit process, 2 MB for a 64-bit process.
priority: A user-specified priority. Default is zero.
policy: The scheduling policy. Default is SCHED_OTHER, meaning that Solaris will provide fixed priority behavior.
guardsize: Specifies protection against stack overflow by placing a guard page (red zone) around the mapped stack pages.
inheritsched: Default value of PTHREAD_INHERIT_SCHED allows new threads to inherit the scheduling policy of the calling thread.
Each of these attributes has a corresponding pair of POSIX APIs for reading (get) and altering (set) a specific attribute. For example, determining or changing the stack size attribute is accomplished using pthread_attr_setstacksize(3THR) and pthread_attr_getstacksize(3THR). The programmer cannot alter or read an attribute through simple structure assignments in code; the appropriate attribute API must be used. As we move through the discussion, we'll talk more about the attributes (priority and policy) directly related to the subject at hand.
The user thread is abstracted as a data structure in the address space of the process that issued the thread create call. A data structure is allocated and initialized for each user thread. The programmer can specify certain thread attributes, such as the thread stack size, stack address, and priority, when the thread is created. The library code will perform validity checks on the passed arguments before allowing the thread create to complete. Note that specifying a stack address, stack size, and priority is optional, and with null values the system will provide defaults. The default thread priority is zero, and the default stack size is 1 MB for a 32-bit process and 2 MB for a 64-bit process. Using Solaris threads, the stack address and stack size are arguments in the thr_create (3THR) call.
For POSIX threads, an attributes structure can be initialized and nondefault values set for stack address, stack size, scheduling policy, and priority using the corresponding set APIs. A pointer to the attributes structure is then passed as an argument to the pthread_create(3THR) call.
The fields in the thread structure get populated during the thread create, with the stack pointer and size, thread priority, scheduling policy, and various other fields set prior to the thread executing for the first time. We'll go through the relevant fields as we talk about thread scheduling, priorities, and state changes.
Three factors affect the scheduling of user threads:
The thread's contention scope
The thread's priority
The scheduling policy attribute
The contentionscope attribute can be either process (intraprocess) or system (interprocess). System contention scope describes a thread bound to an underlying LWP (lightweight process). Bound threads are created by setting the contentionscope attribute to PTHREAD_SCOPE_SYSTEM for POSIX threads or the THR_BOUND flag for Solaris threads. The default for both is to create an unbound thread (the PTHREAD_SCOPE_PROCESS attribute for POSIX). A bound thread has an LWP created during thread-create processing, and the user thread is bound (linked) to that LWP for the thread's lifetime. For bound threads, library-level priorities and scheduling policies are immaterial: a bound thread always has the execution resource it needs (an LWP) for scheduling by the kernel. Altering the priority of a bound thread is done with the priocntl(1) command (or the corresponding system call, programmatically) and affects the thread's priority as viewed by the kernel dispatcher.
The user thread's priority and scheduling policy factor into the scheduling of threads within a process (intraprocess) contentionscope (the default). We can view the scheduling of these unbound threads in two phases. First, they must be scheduled from within the library. A thread is scheduled when it's linked to an available LWP; this is the first phase. The second phase involves the kernel dispatcher scheduling the LWP and its associated kernel thread (to which the user thread has been linked by the threads library) onto an available processor.
A dispatch queue of all runnable user threads is maintained at the library level. The dispatch queue in releases up to and including Solaris 8 is an array of dispq structures, with each structure member containing a pointer to the first and last threads on the list. Each array element corresponds to a user thread priority, and threads at the same priority are maintained on a linked list and rooted in the array element that corresponds to the priority. This is shown in the figure below.
Fig 1. Threads library dispatcher queue
There are 128 user thread priorities (0 through 127). As we mentioned, the default user thread priority is 0. Higher priorities are better priorities (as is the case with kernel global priorities), and threads with higher priorities will be scheduled before those with lower priorities. The scheduling of user threads by the library routines involves finding the highest-priority runnable thread and linking it to an available LWP from the pool.
The programmer can provide hints to the library as to the level of concurrency desired, using either thr_setconcurrency (3THR) (Solaris threads) or pthread_setconcurrency (3THR) (POSIX threads). Both calls resolve to the same internal library _thr_setconcurrency() function. Both APIs take an integer value as an argument, which translates internally to the number of LWPs desired by the process for user thread execution. The _thr_setconcurrency() code does validity tests on the passed concurrency value and computes the difference between the desired concurrency and the current number of LWPs in the pool. An internal library variable, _nlwps, maintains a count of LWPs during the execution lifetime of the process. _thr_setconcurrency() then creates additional LWPs based on the computed difference. For instance, if there are three LWPs in the pool and a pthread_setconcurrency(5) call is made (desired concurrency level is five), two additional LWPs will be created.
User-thread scheduling is done in the library through internal library interfaces called at various points in time during the execution of the process -- or, more precisely, by user threads executing within the process. Specifically, the user threads scheduler will be entered when:
A thread blocks in a system call or a library call for a synchronization object (e.g., a mutex lock)
A thread terminates
A thread explicitly yields a CPU (thr_yield(3THR))
A thread is preempted by a higher (better) priority thread becoming runnable
A compute-bound thread that does not enter the kernel via a system call or yield the processor will execute until it completes, never surrendering the LWP it's been linked to when first scheduled by the library. This is an important consideration when developing multithreaded applications and understanding the level of concurrency and execution time for user threads.
Yielding the processor is a voluntary action using the thr_yield(3THR) interface. The internal library code will simply yield the processor if there are no runnable threads on the dispatch queue. Otherwise, several library internal functions are called to remove the thread that issued the yield call from the linked list of ONPROC threads. ONPROC is the thread's state when it's on a processor, and all ONPROC threads are maintained on a linked list in the library. The yielding thread is then placed on the internal dispatch (run) queue, and the library swtch() code is called to find the runnable thread with the highest priority and schedule it.
When a user thread issues a system call or calls one of the library interfaces to acquire a synchronization object, the thread may need to block if the desired synchronization object is not available. Remember, it's being held by another thread. In either case, the thread is temporarily bound to the LWP while blocked. For a system call, the thread enters the kernel and the kernel will put the LWP on a sleep queue while it is blocked, waiting for the system call to complete. The kernel handles the wakeup mechanism when the system call is completed so that the LWP can resume execution.
For a library-level blocking on a synchronization object, the thread is placed on a library-level sleep queue, and its state is changed from ONPROC to SLEEP. During the thread swtch code, the highest-priority runnable thread is located on the internal dispatch queue, and the LWP is passed to the newly-selected thread for execution. Since we're blocking on a synchronization primitive in user land, with no visibility in the kernel, there's no reason to have the LWP block in the kernel. The LWP can be, and is, made available to execute another user thread.
Finally, the threads library supports thread preemption. When a thread with a higher priority than the current set of threads in the ONPROC (on-a-processor) state becomes runnable, a preemption forces a lower-priority thread off its LWP so the higher-priority thread can get the execution resource it needs.
That's a wrap. Next month, we'll continue our discussion of the threads library with a closer look at the internal functions and algorithms for user thread scheduling.
Jim Mauro is an area technology manager for Sun Microsystems in the Northeast, focusing on server systems, clusters, and high availability. He has 18 years of industry experience, working in educational services (he developed and delivered courses on Unix internals and administration) and software consulting.