[FreeBSD] A Comparison of Solaris, Linux, and FreeBSD Kernels [Repost]

Post 1 | Posted 2006-03-26 12:40
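A brief comparison of the kernel implementations of Solaris 10, Linux 2.6, and FreeBSD 5.3.
Controlling formatting in forum posts is a hassle (sorry, I am a bit lazy), so for those who have trouble reaching sites abroad, please copy the code in the QUOTE block into a local HTML file; alternatively, find a proxy and read the original.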

<h2>A Comparison of Solaris, Linux, and FreeBSD Kernels</h2>
<h3>by Max Bruning</h3>
<p>October 14, 2005</p>
<p>I spend most of my time teaching classes on Solaris internals, device
drivers, and kernel crash dump analysis and debugging. When explaining to
classes how various subsystems are implemented in Solaris, students often ask,
"How does it work in Linux?" or, "In FreeBSD, it works like this, how about
Solaris?"  This article examines three of the basic subsystems of the kernel
and compares implementation between Solaris 10, Linux 2.6, and FreeBSD 5.3.</p>

<p>The three subsystems examined are scheduling, memory management, and file
system architecture. I chose these subsystems because they are common to any
operating system (not just Unix and Unix-like systems), and they tend to be the
best-understood components of the operating system.</p>

<p>This article does not go into in-depth details on any of the subsystems
described. For that, refer to the source code, various websites, and books on
the subject. For specific books, see:</p>

<ul>

<li><em>Solaris Internals: Core Kernel Architecture</em> by McDougall and Mauro
(visit <a href="http://www.solarisinternals.com/">Solaris Internals</a>)</li>

<li><em>The Design and Implementation of the FreeBSD Operating System</em> by
McKusick and Neville-Neil (visit <a href="http://www.mckusick.com/FreeBSDbook.html">The Design and Implementation
of the FreeBSD Operating System</a>)</li>

<li><em>Linux Kernel Development</em> by Love (visit <a href="http://rlove.org/kernel_book">Linux Kernel Development, 2nd Edition</a>) and
<em>Understanding the Linux Kernel</em> by Bovet and Cesati (visit <a href="http://www.oreilly.com/catalog/linuxkernel2">Understanding the Linux
Kernel, 2nd Edition</a>)</li>

</ul>

<p>If you search the Web for Linux, FreeBSD, and Solaris comparisons, most of
the hits discuss old (in some cases, Solaris 2.5, Linux 2.2, etc.) versions of
the OSes. Many of the "facts" are incorrect for the newest releases, and some
were incorrect for the releases they intended to describe. Of course, most of
them also make value judgments on the merits of the OSes in question, and
there is little information comparing the kernels themselves. The following
sites seem more or less up to date:</p>

<ul>

<li>"<a href="http://www.softpanorama.org/Articles/solaris_vs_linux.shtml">Solaris Vs.
Linux</a>" is pretty one-sided for Solaris 10 over Linux.</li>

<li>"<a href="http://software.newsforge.com/print.pl?sid=04/12/27/1243207">Comparing
MySQL Performance</a>" on Solaris 10, Linux, FreeBSD, and others.</li>

<li>"<a href="http://see.sun.com/Apps/DCS/mcp?q=STI00HTG6Hjxpv">Fast Track to
Solaris 10 Adoption</a>" has some comparisons between Linux and Solaris.</li>

<li>"<a href="http://www.rednova.com/news/technology/135074/solaris_10_heads_for_linux_territory/index.html">Solaris
10 Heads for Linux Territory</a>" is not really a comparison, but reviews
Solaris 10.</li>

</ul>

<p>One of the more interesting aspects of the three OSes is the number of
similarities between them. Once you get past the different naming conventions,
each OS takes fairly similar paths toward implementing the different concepts.
Each OS supports time-shared scheduling of threads, demand paging with a
not-recently-used page replacement algorithm, and a virtual file system layer
to allow the implementation of different file system architectures. Ideas that
originate in one OS often find their way into others. For instance, Linux also
uses the concepts behind Solaris's <a href="http://www.usenix.org/publications/library/proceedings/bos94/bonwick.html">slab
memory allocator</a>. Much of the terminology seen in the FreeBSD source is
also present in Solaris.  With Sun's move to open source Solaris, I expect to
see much more cross-fertilization of features. Currently, the LXR project
provides a source cross-reference browser for FreeBSD, Linux, and other
Unix-related OSes, available at <a href="http://fxr.watson.org/">fxr.watson.org</a>. It would be great to
see OpenSolaris source added to that site.</p>

<h3>Scheduling and Schedulers</h3>

<p>The basic unit of scheduling in Solaris is the <code>kthread_t</code>; in
FreeBSD, the <code>thread</code>; and in Linux, the <code>task_struct</code>.
Solaris represents each process as a <code>proc_t</code>, and each thread
within the process has a <code>kthread_t</code>. Linux represents processes
(and threads) by <code>task_struct</code> structures. A single-threaded process
in Linux has a single <code>task_struct</code>. A single-threaded process in
Solaris has a <code>proc_t</code>, a single <code>kthread_t</code>, and a

<code>klwp_t</code>. The <code>klwp_t</code> provides a save area for threads
switching between user and kernel modes. A single-threaded process in FreeBSD
has a <code>proc</code> struct, a <code>thread</code> struct, and a
<code>ksegrp</code> struct. The <code>ksegrp</code> is a "kernel scheduling
entity group." Effectively, all three OSes schedule threads, where a thread is
a <code>kthread_t</code> in Solaris, a <code>thread</code> structure in
FreeBSD, and a <code>task_struct</code> in Linux.</p>

<p>Scheduling decisions are based on priority. In Linux and FreeBSD, the lower
the priority value, the better.  This is an inversion; a value closer to 0
represents a higher priority. In Solaris, the higher the value, the higher the
priority.  <a href="#table1">Table 1</a> shows the priority values of the
different OSes.</p>

<a id="table1"></a>
<table border="1">
<caption><em>Table 1. Scheduling Priorities in Solaris, Linux, and
FreeBSD</em></caption>
<tbody>
<tr>
<td rowspan="5"><strong>Solaris</strong></td>
<th>Priorities</th>
<th>Scheduling Class</th>
</tr>
<tr>
<td>0-59</td>
<td>Time Shared, Interactive, Fixed, Fair Share Scheduler</td>
</tr>
<tr>
<td>60-99</td>
<td>System Class</td>
</tr>
<tr>
<td>100-159</td>

<td>Real-Time (note that real-time priorities are higher than those of system
threads)</td>
</tr>
<tr>
<td>160-169</td>
<td>Low-level interrupts</td>
</tr>
<tr>
<td rowspan="3"><strong>Linux</strong></td>
<th>Priorities</th>
<th>Scheduling Class</th>
</tr>

<tr>
<td>0-99</td>
<td>System Threads, Real time (<code>SCHED_FIFO</code>,
<code>SCHED_RR</code>)</td>
</tr>
<tr>
<td>100-139</td>
<td>User priorities (<code>SCHED_NORMAL</code>)</td>

</tr>
<tr>
<td rowspan="6"><strong>FreeBSD</strong></td>
<th>Priorities</th>
<th>Scheduling Class</th>
</tr>
<tr>
<td>0-63</td>
<td>Interrupt</td>
</tr>
<tr>
<td>64-127</td>

<td>Top-half Kernel</td>
</tr>
<tr>
<td>128-159</td>
<td>Real-time user (system threads have better
priority)</td>
</tr>
<tr>
<td>160-223</td>
<td>Time-share user</td>
</tr>
<tr>
<td>224-255</td>

<td>Idle user</td>
</tr>
</tbody>
</table>
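<p>The direction of each scale is easy to get wrong when moving between systems. As a minimal sketch (the ranges are copied from Table 1; <code>PRIORITY_CLASSES</code>, <code>scheduling_class</code>, and <code>runs_first</code> are hypothetical helper names, not kernel interfaces), the table can be written as a lookup plus a comparison that accounts for the inversion:</p>

```python
# Ranges are taken from Table 1; the helper names are illustrative only.
PRIORITY_CLASSES = {
    "solaris": [
        (0, 59, "time shared / interactive / fixed / fair share"),
        (60, 99, "system"),
        (100, 159, "real-time"),
        (160, 169, "low-level interrupts"),
    ],
    "linux": [
        (0, 99, "system threads / real time"),
        (100, 139, "user (SCHED_NORMAL)"),
    ],
    "freebsd": [
        (0, 63, "interrupt"),
        (64, 127, "top-half kernel"),
        (128, 159, "real-time user"),
        (160, 223, "time-share user"),
        (224, 255, "idle user"),
    ],
}

# Solaris: a higher value is a higher priority. Linux and FreeBSD: lower is better.
HIGHER_VALUE_WINS = {"solaris": True, "linux": False, "freebsd": False}


def scheduling_class(os_name, priority):
    """Return the Table 1 class that a priority value falls into."""
    for low, high, name in PRIORITY_CLASSES[os_name]:
        if low <= priority <= high:
            return name
    raise ValueError(f"priority {priority} is out of range for {os_name}")


def runs_first(os_name, a, b):
    """Return True if a thread at priority a is preferred over one at priority b."""
    return a > b if HIGHER_VALUE_WINS[os_name] else a < b
```

<p>For example, <code>runs_first("solaris", 100, 59)</code> and <code>runs_first("linux", 0, 100)</code> are both true, even though the numeric comparison points in opposite directions.</p>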

<p>All three OSes favor interactive threads/processes. Interactive threads run
at better priority than compute-bound threads, but tend to run for shorter time
slices. Solaris, FreeBSD, and Linux all use a per-CPU "runqueue." FreeBSD and
Linux use an "active" queue and an "expired" queue. Threads are scheduled in
priority order from the active queue. A thread moves from the active queue to the
expired queue when it uses up its time slice (and possibly at other times to
avoid starvation). When the active queue is empty, the kernel swaps the active
and expired queues. FreeBSD has a third queue for "idle" threads.  Threads run
on this queue only when the other two queues are empty. Solaris uses a
"dispatch queue" per CPU. If a thread uses up its time slice, the kernel gives
it a new priority and returns it to the dispatch queue. The "runqueues" for all
three OSes have separate linked lists of runnable threads for different
priorities.  (FreeBSD uses one list per four priorities, whereas Solaris
and Linux use a separate list for each priority.)</p>
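<p>The active/expired scheme can be sketched as a toy model. This is not kernel code: the <code>RunQueue</code> class and its method names are hypothetical, it keeps one list per priority, and it follows the Linux/FreeBSD convention that a lower index is a better priority.</p>

```python
from collections import deque


class RunQueue:
    """Toy per-CPU run queue with an active and an expired array of lists."""

    def __init__(self, num_priorities):
        self.active = [deque() for _ in range(num_priorities)]
        self.expired = [deque() for _ in range(num_priorities)]

    def enqueue(self, thread, priority):
        """Make a thread runnable in the current round."""
        self.active[priority].append(thread)

    def expire(self, thread, priority):
        """Park a thread that used up its time slice until the next round."""
        self.expired[priority].append(thread)

    def pick_next(self):
        """Return (thread, priority) for the best runnable thread, or None."""
        if not any(self.active):
            # Every active list is empty: swap the two arrays.
            self.active, self.expired = self.expired, self.active
        for priority, queue in enumerate(self.active):
            if queue:
                return queue.popleft(), priority
        return None
```

<p>Swapping the two arrays is a constant-time operation, which is why a thread that exhausts its slice can simply be parked on the expired side without rescanning anything.</p>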

<p>Linux and FreeBSD use an arithmetic calculation based on run time versus
sleep time of a thread (as a measure of "interactive-ness") to arrive at a
priority for the thread. Solaris performs a table lookup. None of the three
OSes support "gang scheduling." Rather than schedule <i>n</i> threads, each OS
schedules, in effect, the next thread to run. All three OSes have mechanisms to
take advantage of caching (warm affinity) and load balancing.  For
hyperthreaded CPUs, FreeBSD has a mechanism ("processor groups") to help keep
threads on the same CPU node (though possibly a different hyperthread). Solaris
has a similar mechanism ("processor sets"), but it is under control of the user
and application, and is not restricted to hyperthreads.</p>
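<p>The difference between computing a priority and looking one up can be illustrated with a small sketch. Everything here is an assumption made for illustration: the formula, the bonus range, the table contents, and the names <code>arithmetic_priority</code>, <code>DISPATCH_TABLE</code>, and <code>table_priority</code> are invented and do not reproduce the values used by any of the three kernels.</p>

```python
def arithmetic_priority(base, run_ticks, sleep_ticks, max_bonus=5):
    """Toy run-versus-sleep calculation (lower result is better).

    A thread that mostly sleeps is treated as interactive and gets a
    negative adjustment; a thread that mostly runs gets a positive one.
    """
    total = run_ticks + sleep_ticks
    if total == 0:
        return base
    sleep_ratio = sleep_ticks / total  # 1.0 = only slept, 0.0 = only ran
    bonus = round((0.5 - sleep_ratio) * 2 * max_bonus)
    return base + bonus


# Toy dispatch table indexed by current priority (higher is better here).
# Each row says which priority to move to when a given event happens.
DISPATCH_TABLE = [
    {"on_quantum_expiry": max(p - 1, 0), "on_sleep_return": min(p + 2, 5)}
    for p in range(6)
]


def table_priority(current, event):
    """Look the new priority up instead of computing it."""
    return DISPATCH_TABLE[current][event]
```

<p>With the table approach, changing scheduling behavior means editing table rows rather than changing the calculation.</p>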

<p>One of the big differences between Solaris and the other two OSes is the
capability to support multiple "scheduling classes" on the system at the same
time. All three OSes support POSIX <code>SCHED_FIFO</code>,
<code>SCHED_RR</code>, and <code>SCHED_OTHER</code> (or
<code>SCHED_NORMAL</code>). <code>SCHED_FIFO</code> and <code>SCHED_RR</code>
typically result in "realtime" threads. (Note that Solaris and Linux support
kernel preemption in support of realtime threads.) Solaris has support for a
"fixed priority" class, a "system class" for system threads (such as page-out
threads), an "interactive" class used for threads running in a windowing
environment under control of the X server, and the Fair Share Scheduler in
support of resource management. See <code>priocntl(1)</code> for information
about using the classes, as well as an overview of the features of each class.
See <code>FSS(7)</code> for an overview specific to the Fair Share Scheduler.
The scheduler on FreeBSD is chosen at compile time, and on Linux the scheduler
depends on the version of Linux.</p>

<p>The ability to add new scheduling classes to the system comes with a price.
Every place in the kernel where a scheduling decision can be made (except
for the actual act of choosing the thread to run) involves an indirect
function call into scheduling-class-specific code. For instance, when a thread
is going to sleep, it calls scheduling-class-dependent code that does whatever
is necessary for sleeping in the class. On Linux and FreeBSD, the scheduling
code simply does the needed action. There is no need for an indirect call.  The
extra layer means there is slightly more overhead for scheduling on Solaris
(but more features).</p>
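<p>The cost of that indirection can be pictured with a short sketch. The class names, the priority adjustments, and the thread dictionaries are hypothetical; the point is only that generic code calls through whatever class object is attached to the thread, while a single-policy scheduler does the work inline.</p>

```python
class TimeShareClass:
    """Toy time-sharing class: a sleeping thread earns a better priority."""

    def sleep(self, thread):
        thread["priority"] = min(thread["priority"] + 5, 59)


class RealTimeClass:
    """Toy real-time class: priority is fixed, so sleeping changes nothing."""

    def sleep(self, thread):
        pass


def thread_sleep(thread):
    """Generic path: one indirect call into the thread's scheduling class."""
    thread["class"].sleep(thread)
    thread["state"] = "sleeping"


def thread_sleep_direct(thread):
    """Single-policy path: the adjustment is done inline, with no indirection."""
    thread["priority"] = min(thread["priority"] + 5, 59)
    thread["state"] = "sleeping"
```

<p>Adding a new class to the first design means writing another object with a <code>sleep</code> method; the second design would need its generic code edited instead.</p>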

<h3>Memory Management and Paging</h3>

<p>In Solaris, every process has an "address space" made up of logical section
divisions called "segments." The segments of a process address space are
viewable via <code>pmap(1)</code>. Solaris divides the memory management code
and data structures into platform-independent and platform-specific parts. The
platform-specific portion of memory management is in the HAT, or hardware
address translation, layer. FreeBSD describes its process address space by a
vmspace, divided into logical sections called regions. Hardware-dependent
portions are in the "pmap" (physical map) module, and "vm_map" routines handle
hardware-independent portions and data structures. Linux uses a memory
descriptor to divide the process address space into logical sections called
"memory areas." Linux also has a
<code>pmap</code> command to examine process address space.</p>

<p>Linux divides machine-dependent layers from machine-independent layers at a much
higher level in the software. On Solaris and FreeBSD, much of the code dealing
with, for instance, page fault handling is machine-independent. On Linux, the
code to handle page faults is pretty much machine-dependent from the beginning
of the fault handling. A consequence of this is that Linux can handle much of
the paging code more quickly because there is less data abstraction (layering)
in the code. However, the cost is that a change in the underlying hardware or
model requires more changes to the code. Solaris and FreeBSD isolate such
changes to the HAT and pmap layers respectively.</p>

<p>Segments, regions, and memory areas are delimited by:</p>

<ul>

<li>Virtual address of the start of the area.</li>

<li>Their location within an object/file that the segment/region/memory area
maps.</li>

<li>Permissions.</li>

<li>Size of the mapping.</li>

</ul>

<p>For instance, the text of a program is in a segment/region/memory area. The
mechanisms in the three OSes to manage address spaces are very similar, but the
names of data structures are completely different. Again, more of the Linux
code is machine-dependent than is true of the other two OSes.</p>
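<p>Whatever the structure is called, the four delimiting attributes listed above are enough to resolve an address. The following is a rough sketch under that assumption; <code>Mapping</code>, <code>find_mapping</code>, and the <code>backing</code> field are hypothetical names, and real kernels use trees or similar structures rather than a linear scan.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One segment / region / memory area of a process address space."""

    start: int    # virtual address of the start of the area
    size: int     # size of the mapping in bytes
    offset: int   # location within the object/file being mapped
    perms: str    # permissions, for example "r-x"
    backing: str  # name of the mapped object (illustrative field)

    def contains(self, addr):
        return self.start <= addr < self.start + self.size

    def object_offset(self, addr):
        """Translate a virtual address into an offset within the mapped object."""
        return self.offset + (addr - self.start)


def find_mapping(mappings, addr):
    """Return the mapping that covers addr, or None if the address is unmapped."""
    for mapping in mappings:
        if mapping.contains(addr):
            return mapping
    return None
```

<p>A fault on an address for which <code>find_mapping</code> returns <code>None</code> corresponds to an access outside every segment, region, or memory area.</p>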

<h4>Paging</h4>

<p>All three operating systems use a variation of a least-recently-used (LRU)
algorithm for page stealing/replacement. All three have a daemon process/thread
to do page replacement. On FreeBSD, the <code>vm_pageout</code> daemon wakes up
periodically and when free memory becomes low. When available memory goes below
some thresholds, <code>vm_pageout</code> runs a routine
(<code>vm_pageout_scan</code>) to scan memory to try to free some pages. The
<code>vm_pageout_scan</code> routine may need to write modified pages
asynchronously to disk before freeing them. There is one of these daemons
regardless of the number of CPUs.  Solaris has a <code>pageout</code> daemon
that likewise runs periodically and in response to low-free-memory situations.
Paging thresholds in Solaris are automatically calibrated at system startup so
that the daemon does not overuse the CPU or flood the disk with page-out
requests.  The FreeBSD daemon uses values that, for the most part, are
hard-coded or tunable in order to determine paging thresholds. Linux also uses
an LRU algorithm that is dynamically tuned while it runs. On Linux, there can
be multiple <code>kswapd</code> daemons, one per NUMA memory node. All three OSes
use a global working set policy (as opposed to a per-process working set).</p>

<p>FreeBSD has several page lists for keeping track of recently used pages.
These track "active," "inactive," "cached," and "free" pages. Pages move
between these linked lists depending on their uses. Frequently accessed pages
will tend to stay on the active list. Data pages of a process that exits can be
immediately placed on the free list. FreeBSD may swap entire processes out if
<code>vm_pageout_scan</code> cannot keep up with load (for example, if the
system is low on memory). If the memory shortage is severe enough,
<code>vm_pageout_scan</code> will kill the largest process on the system.</p>

<p>Linux also uses different linked lists of pages to facilitate an LRU-style
algorithm. Linux divides physical memory into (possibly multiple sets of) three
"zones": one for DMA pages, one for normal pages, and one for dynamically
mapped (high) memory. These zones seem to be very much an implementation detail
caused by x86 architectural constraints. Pages move between "hot," "cold," and
"free" lists. Movement between the lists is very similar to the mechanism on
FreeBSD. Frequently accessed pages will be on the "hot" list. Free pages will
be on the "cold" or "free" list.</p>

<p>Solaris uses a free list, hashed list, and vnode page list to maintain its
variation of an LRU replacement algorithm. Instead of scanning the vnode or
hash page lists (more or less the equivalent of the "active"/"hot" lists in the
FreeBSD/Linux implementations), Solaris scans all pages using a "two-handed
clock" algorithm as described in <em>Solaris Internals</em> and elsewhere. The
two hands stay a fixed distance apart. The front hand ages the page by clearing
reference bit(s) for the page. If no process has referenced the page since the
front hand visited the page, the back hand will free the page (first
asynchronously writing the page to disk if it is modified).</p>
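<p>The two-handed clock lends itself to a small simulation. This is a simplified sketch, not the Solaris implementation: the function name and parameters are hypothetical, pages are just indexes into a list of reference bits, and writing modified pages to disk is left out.</p>

```python
def two_handed_clock(referenced, spread, steps, touches=None):
    """Simulate `steps` ticks of a two-handed clock over a circular page array.

    referenced -- initial reference bit for each page
    spread     -- fixed distance, in pages, between the front and back hands
    touches    -- optional {step: [page, ...]} of pages referenced at that tick
    Returns the page numbers that were freed, in the order they were freed.
    """
    ref = list(referenced)
    n = len(ref)
    if not 1 <= spread < n:
        raise ValueError("spread must be between 1 and the page count minus 1")
    touches = touches or {}
    freed = []
    for step in range(steps):
        for page in touches.get(step, ()):
            ref[page] = True          # a process references the page
        ref[step % n] = False         # front hand ages the page it visits
        back = step - spread          # back hand trails the front hand
        if back >= 0:
            victim = back % n
            # Still unreferenced since the front hand passed: free the page.
            if not ref[victim] and victim not in freed:
                freed.append(victim)
    return freed
```

<p>In this model, a page touched between the two visits survives that pass of the back hand; the distance between the hands therefore sets how long a page has to prove it is still in use.</p>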

<p>All three operating systems take NUMA locality into account during paging.
The I/O buffer cache and the virtual memory page cache are merged into one
system page cache on all three OSes. The system page cache is used for
reads/writes of files as well as <code>mmap</code>ped files and text and data
of applications.</p>


To be continued; see post 4.

[ Last edited by yuzlei on 2006-3-27 16:37 ]

Post 2 | Posted 2006-03-27 10:01
OP, could you post the full text here?

Post 3 | Posted 2006-03-27 15:29
Please do post it here; our Internet access is restricted.

Post 4 | Posted 2006-03-27 16:28
Continued from post 1.
<h3>File Systems</h3>

<p>All three operating systems use a data abstraction layer to hide file system
implementation details from applications. In all three OSes, you use
<code>open</code>, <code>close</code>, <code>read</code>, <code>write</code>,
<code>stat</code>, etc. system calls to access files, regardless of the
underlying implementation and organization of file data. Solaris and FreeBSD
call this mechanism VFS ("virtual file system") and the principal data structure
is the <code>vnode</code>, or "virtual node." Every file being accessed in
Solaris or FreeBSD has a <code>vnode</code> assigned to it. In addition to
generic file information, the <code>vnode</code> contains pointers to
file-system-specific information. Linux also uses a similar mechanism, also
called VFS (for "virtual file switch"). In Linux, the file-system-independent
data structure is an <code>inode</code>. This structure is similar to the
<code>vnode</code> on Solaris/FreeBSD.  (Note that there is an
<code>inode</code> structure in Solaris/FreeBSD, but this is
file-system-dependent data for UFS file systems.) Linux has two different
structures, one for file operations and the other for inode operations. Solaris
and FreeBSD combine these as "vnode operations."</p>
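<p>The idea of a generic node that points at file-system-specific operations and data can be sketched in a few lines. The two toy file systems, the <code>Vnode</code> class, and <code>vfs_read</code> are hypothetical stand-ins; real vnode and inode operation vectors contain many more entries.</p>

```python
class Vnode:
    """Generic file object: an operations table plus file-system-specific data."""

    def __init__(self, ops, fs_data):
        self.ops = ops          # table of file-system-specific functions
        self.fs_data = fs_data  # data private to the file system


def memfs_read(vnode, offset, length):
    """Toy in-memory file system: the private data is the file contents."""
    return vnode.fs_data[offset:offset + length]


def zerofs_read(vnode, offset, length):
    """Toy file system whose files read back as zero bytes."""
    return b"\0" * length


MEMFS_OPS = {"read": memfs_read}
ZEROFS_OPS = {"read": zerofs_read}


def vfs_read(vnode, offset, length):
    """What the generic layer does: dispatch through the node's operations."""
    return vnode.ops["read"](vnode, offset, length)
```

<p>The caller of <code>vfs_read</code> never needs to know which file system backs the node, which is the property that lets one <code>read</code> system call serve every file system type.</p>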

<p>VFS allows the implementation of many file system types on the
system. This means that there is no reason that one of these
operating systems could not access the file systems of the other
OSes. Of course, this requires the relevant file system routines
and data structures to be ported to the VFS of the OS in question.
All three OSes allow the stacking of file systems.  
<a href="#fs_table">Table 2</a> lists file system types
implemented in each OS, but it does not show <em>all</em> file system types.

<a id="fs_table"></a></p>
<table border="1">

<caption><em>Table 2. Partial List of File System Types</em></caption>
<tbody>
<tr>
<td rowspan="9"><strong>Solaris</strong></td>
<td>ufs</td>
<td>Default local file system (based on BSD Fast Filesystem)</td>
</tr>
<tr>
<td>nfs</td>
<td>Remote Files</td>
</tr>

<tr>
<td>proc</td>
<td><em>/proc</em> files; see <code>proc(4)</code></td>
</tr>
<tr>
<td>namefs</td>
<td>Name file system; allows opening of doors/streams as files</td>
</tr>
<tr>
<td>ctfs</td>

<td>Contract file system used with Service Management Facility</td>
</tr>
<tr>
<td>tmpfs</td>
<td>Uses anonymous space (memory/swap) for temporary files</td>
</tr>
<tr>
<td>swapfs</td>
<td>Keeps track of anonymous space (data, heap, stack, etc.)</td>
</tr>
<tr>
<td>objfs</td>

<td>Keeps track of kernel modules, see <code>objfs(7FS)</code></td>
</tr>
<tr>
<td>devfs</td>
<td>Keeps track of <em>/devices</em> files; see <code>devfs(7FS)</code></td>
</tr>
<tr>
<td rowspan="8"><strong>FreeBSD</strong></td>

<td>ufs</td>
<td>Default local file system (ufs2, based on BSD Fast Filesystem)</td>
</tr>
<tr>
<td>devfs</td>
<td>Keeps track of <em>/dev</em> files</td>
</tr>
<tr>
<td>ext2</td>

<td>Linux ext2 file system (GNU-based)</td>
</tr>
<tr>
<td>nfs</td>
<td>Remote files</td>
</tr>
<tr>
<td>ntfs</td>
<td>Windows NT file system</td>
</tr>
<tr>
<td>smbfs</td>

<td>Samba file system</td>
</tr>
<tr>
<td>portalfs</td>
<td>Mount a process onto a directory</td>
</tr>
<tr>
<td>kernfs</td>
<td>Files containing various system information</td>
</tr>
<tr>
<td rowspan="7"><strong>Linux</strong></td>

<td>ext3</td>
<td>Journaling file system derived from ext2</td>
</tr>
<tr>
<td>ext2</td>
<td>Block-based file system (direct and indirect block pointers)</td>
</tr>
<tr>
<td>afs</td>
<td>AFS client support for remote file sharing</td>
</tr>

<tr>
<td>nfs</td>
<td>Remote files</td>
</tr>
<tr>
<td>coda</td>
<td>Another networked file system</td>
</tr>
<tr>
<td>procfs</td>
<td>Processes, processors, buses, platform specifics</td>

</tr>
<tr>
<td>reiserfs</td>
<td>Journaling file system</td>
</tr>
</tbody>
</table>

<h3>Conclusions</h3>

<p>Solaris, FreeBSD, and Linux are obviously benefiting from each other. With
Solaris going open source, I expect this to continue at a faster rate. My
impression is that change is most rapid in Linux.  The benefit is that new
technology is incorporated into the system quickly. Unfortunately,
the documentation (and possibly some robustness) sometimes lags behind. Linux
has many developers, and sometimes it shows. FreeBSD has been around (in some
sense) the longest of the three systems. Solaris has its basis in a
combination of BSD Unix and AT&amp;T Bell Labs Unix.  Solaris uses more data
abstraction layering, and generally could support additional features quite
easily because of this.  However, most of the layering in the kernel is
undocumented.  Probably, source code access will change this.</p>

<p>A brief example to highlight differences is page fault handling.  In
Solaris, when a page fault occurs, the code starts in a platform-specific trap
handler, then calls a generic <code>as_fault()</code> routine. This routine
determines the segment where the fault occurred and calls a "segment driver" to
handle the fault. The segment driver calls into file system code. The file
system code calls into the device driver to bring in the page. When the page-in
is complete, the segment driver calls the HAT layer to update page table
entries (or their equivalent). On Linux, when a page fault occurs, the kernel
calls the code to handle the fault. You are immediately into platform-specific
code. This means the fault handling code can be quicker in Linux, but the Linux
code may not be as easily extensible or ported.</p>

<p>Kernel visibility and debugging tools are critical to get a correct
understanding of system behavior. Yes, you can read the source code, but I
maintain that you can easily misread the code. Having tools available to test
your hypothesis about how the code works is invaluable. In this respect, I see
Solaris with <code>kmdb</code>, <code>mdb</code>, and DTrace as a clear winner. I have been "reverse
engineering" Solaris for years. I find that I can usually answer a question by
using the tools faster than I can answer the same question by reading source
code. With Linux, I don't have as much choice for this. FreeBSD allows use of
<code>gdb</code> on kernel crash dumps. <code>gdb</code> can set breakpoints,
single step, and examine and modify data and code. On Linux, this is also
possible once you download and install the tools.</p>

<p><em>Max Bruning currently teaches and consults on Solaris internals, device
drivers, kernel (as well as application) crash analysis and debugging,
networking internals, and specialized topics.  Contact him at
max@bruningsystems.com or http://mbruning.blogspot.com/.</em></p>

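Post 5 | Posted 2006-03-27 17:11
Our company keeps tight control over Internet access.
We can only browse approved sites, such as www.netbsd.org.
Any other site requires an approved request.
Please don't get the wrong idea; I have no wish to become famous. Heh.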

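Post 6 | Posted 2006-03-27 17:50
Originally posted by ljoolj on 2006-3-27 17:11:
Our company keeps tight control over Internet access.
We can only browse approved sites, such as www.netbsd.org.
Any other site requires an approved request.
Please don't get the wrong idea; I have no wish to become famous. Heh.

I didn't expect that; I assumed you were on the education network.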

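Post 7 | Posted 2006-03-27 19:28
Originally posted by yuzlei on 2006-3-26 12:40:
A brief comparison of the kernel implementations of Solaris 10, Linux 2.6, and FreeBSD 5.3.
Controlling formatting in forum posts is a hassle (sorry, I am a bit lazy), so for those who have trouble reaching sites abroad, please copy the code in the QUOTE block into a local HTML file; alternatively, find a proxy and read the original.
...


Thanks for sharing! This has been added to the pinned resource roundup thread.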