Thread starter: haoji

The Art of Unix Programming

Post 441, posted 2008-05-18 05:23
troff(1) has many other requests, but you are unlikely to see most of them directly. Very few documents are
written in bare troff. It supports a macro facility, and there are half a dozen macros in more or less general
use. Of these, the overwhelmingly most common is the ‘man’ ...

Post 442, posted 2008-05-18 05:24
Nevertheless, at time of writing man pages remain the single most important form of Unix documentation.

TeX

TeX (pronounced /teH/ with a rough h as though you are gargling) is a very capable typesetting program
which (like the Emacs editor) originated outside the Unix culture but is now thoroughly naturalized in it. It
was created by noted computer scientist Donald Knuth when he became impatient with the quality of
typography (and especially mathematical typesetting) that was available to him in the late 1970s.

TeX, like troff(1), is a markup-centered system. TeX's request language is rather more powerful than troff's;
among other things, it is better at handling images, page-positioning content precisely, and
internationalization. TeX is particularly good at mathematical typesetting, and unsurpassed at basic
typesetting tasks like kerning, line filling, and hyphenating. TeX has become the standard submission format
for most mathematical journals, and is actually now maintained as open source by a working group of the
American Mathematical Society. It is also commonly used for scientific papers.

As with troff(1), human beings usually do not write large volumes of raw TeX macros by hand; they use
macro packages and various auxiliary programs instead. One particular macro package, LaTeX, is almost
universal, and most people who say they're composing in TeX actually mean they're writing
LaTeX. Like troff's macro packages, a lot of its requests are semi-structural.
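
To make "semi-structural" concrete, here is a minimal sketch in shell. The file name sample.tex and its contents are illustrative assumptions, not taken from the book. It writes a tiny LaTeX source that mixes structural requests (\section, \emph) with a purely presentational one (\vspace), counts the structural lines, and typesets the file only if latex and dvips happen to be installed.

```shell
#!/bin/sh
# Write a tiny LaTeX source file (name and contents are illustrative).
workdir=$(mktemp -d)
cat > "$workdir/sample.tex" <<'EOF'
\documentclass{article}
\begin{document}
\section{Typesetting}          % structural: marks a heading
Some \emph{emphasized} text.   % structural: emphasis, not "italic"
\vspace{1cm}                   % presentational: an explicit gap
\end{document}
EOF

# Count the lines that use the structural requests \section or \emph.
structural=$(grep -c -E '\\(section|emph)\{' "$workdir/sample.tex")
echo "structural request lines: $structural"

# Typeset to DVI, then PostScript, only if the tools are installed.
if command -v latex >/dev/null 2>&1 && command -v dvips >/dev/null 2>&1; then
    (cd "$workdir" \
        && latex -interaction=nonstopmode sample.tex >/dev/null \
        && dvips -o sample.ps sample.dvi >/dev/null 2>&1) \
        || echo "typesetting failed" >&2
fi
```

The point of the mix is that a converter to another format can carry \section and \emph across by meaning, while \vspace only makes sense on a page.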

One important use of TeX that is normally hidden from the user is that other document-processing tools
often generate LaTeX to be turned into PostScript, rather than attempting the much more difficult job of
generating PostScript themselves. The xmlto(1) front end that we discussed as a shell-programming case
study in Chapter 12 (Languages) uses this tactic; so does the XML-DocBook toolchain we'll examine later in
this chapter.

TeX has a wider application range than troff(1) and is in most ways a better design. It has the same
fundamental problems as troff(1) in an increasingly Web-centric world; its markup has strong ties to the
presentation level, and automatically generating good web pages from TeX sources is difficult and
fault-prone.

TeX is never used for Unix system documentation and only very rarely used for application documentation;
for those purposes, troff(1) is sufficient. But some software packages that originated in academia outside the
Unix community have imported the use of TeX as a documentation master format; the Python language is
one example. As we noted above, it is also heavily used for mathematical and scientific papers, and will
probably dominate that niche for some years yet.

Texinfo

Texinfo is a documentation markup invented by the Free Software Foundation and used mainly for GNU
project documentation ...

Post 443, posted 2008-05-18 05:24
Texinfo was the first markup system specifically designed to support both typeset output on paper and
hypertext output for browsing. The hypertext format was not, however, HTML; it was a more primitive
variety called ‘info’ ...

Post 444, posted 2008-05-18 05:25
The present chaos and a possible way out

Chapter 16. Documentation

Unix documentation is, at present, a mess.

Between man, ms, mm, TeX, Texinfo, POD, HTML, and DocBook, the documentation
master files on modern Unix systems are scattered across eight different markup formats.
There is no uniform way to view all the rendered versions, they aren't web-accessible, and
they aren't cross-indexed.

Many people in the Unix community are aware that this is a problem. At time of writing
most of the effort towards solving it has come from open-source developers, who are more
actively interested in competing for acceptance by non-technical end-users than developers
for proprietary Unixes have been. Since 2000, practice has been moving towards use of
XML-DocBook as a documentation interchange format (conversion from the older SGML-
DocBook is trivial).

The goal, which is within sight but will take a lot of effort to achieve, is to equip every Unix
system with software that will act as a system-wide document registry. When system
administrators install packages, one step will be to enter the package's XML-DocBook
documentation into the registry. It will then be rendered into a common HTML document
tree and cross-linked to the documentation already present.
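
The install-time step described above might look roughly like the following dry run. Everything here is a hypothetical sketch: the registry root /tmp/doc-registry, the function name register_doc, and the one-directory-per-package layout are assumptions, and cross-linking is not attempted. The only real tool involved is xmlto(1), whose -o option selects the output directory.

```shell
#!/bin/sh
# Hypothetical registry root; a real registry would choose its own layout.
doc_tree=/tmp/doc-registry
# 1 = only print the command, 0 = actually run xmlto.
dry_run=1

# register_doc PACKAGE DOCBOOK_FILE
# Renders one package's XML-DocBook master into its own subdirectory of
# the common HTML tree (or, in dry-run mode, prints the command).
register_doc() {
    pkg=$1
    src=$2
    outdir="$doc_tree/$pkg"
    if [ "$dry_run" = 1 ]; then
        echo "xmlto -o $outdir xhtml $src"
    else
        mkdir -p "$outdir" && xmlto -o "$outdir" xhtml "$src"
    fi
}

plan=$(register_doc foo foo.xml)
echo "$plan"
```

Keeping each package in its own subdirectory is just one plausible choice; it makes removal on package uninstall a single directory delete.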

Early versions of the document-registry software are already working. The problem of
forward-converting documentation from the seven older formats into XML-DocBook is a large and
messy one, but the conversion tools are falling into place. Other political and technical
problems remain to be attacked, but are probably soluble.

While there is not, as of early 2003, a community-wide consensus that the older formats have
to be phased out, that seems the likeliest outcome.

Accordingly, we'll next take a very detailed look at DocBook and its toolchain. This
description should be read as an introduction to XML under Unix, as a pragmatic guide to
practice, and as a major case study. It's a good example of how, in the context of the Unix
community, cooperation between different project groups develops around shared standards.

Post 446, posted 2008-05-18 05:26
DocBook


A great many major open-source projects are converging on DocBook as a standard format
for their documentation. The advocates of XML-based structural markup seem to have won
the theoretical argument, and an effective XML-DocBook toolchain is available in open
source.

Nevertheless, a lot of confusion still surrounds DocBook and the programs that support it. Its
devotees speak an argot that is dense and forbidding even by computer-science standards,
slinging around acronyms that have no obvious relationship to the things you need to do to
write markup and make HTML or PostScript from it. XML standards and technical papers are
notoriously obscure.

Document Type Definitions

(Note: to keep the explanation simple, most of this section is going to tell some lies, mainly
by omitting a lot of history. Truthfulness will be fully restored in a following section.)

DocBook is a structural-level markup language. Specifically, it is a dialect of XML. A
DocBook document is a piece of XML that uses XML tags for structural markup.

In order for a document formatter to apply a stylesheet to your document and make it look
good, it needs to know things about the overall structure of your document. For example, it
needs to know that a book manuscript normally consists of front matter, a sequence of
chapters, and back matter in order to physically format chapter headers properly. In order for
it to know this sort of thing, you need to give it a Document Type Definition or DTD. The
DTD tells your formatter what sorts of elements can be in the document structure, and in what
orders they can appear.

What we mean by calling DocBook an ‘application’ ...

Post 447, posted 2008-05-18 05:27
DocBook formatter). This program checks your document against the DocBook DTD to make
sure you aren't breaking any of the DTD's structural rules (otherwise the back end of the
formatter, the part that applies your style sheet, might become quite confused).

The validating parser will either error out, giving you messages about places where the
document structure is broken, or translate the document into a stream of XML elements and
text which the parser back end combines with the information in your stylesheet to produce
formatted output.
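
As a rough illustration of the validation step, the sketch below writes a minimal DocBook article and checks it against its DTD. The file name foo.xml and the document contents are illustrative assumptions, and so is the choice of DocBook XML V4.1.2 as the DTD version. xmllint(1) is used as the validating parser here; with --nonet it relies on the DTD being available through a local XML catalog, so the check may report failure on a machine that lacks one.

```shell
#!/bin/sh
# Write a minimal DocBook article (file name and contents are illustrative).
workdir=$(mktemp -d)
doc="$workdir/foo.xml"
cat > "$doc" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<article>
  <title>Foo</title>
  <sect1>
    <title>First section</title>
    <para>Some <emphasis>structural</emphasis> markup.</para>
  </sect1>
</article>
EOF

# The DOCTYPE line names the root element and the DTD to validate against.
root=$(sed -n 's/^<!DOCTYPE \([a-z]*\) .*/\1/p' "$doc")
echo "root element: $root"

# Validate only if xmllint is installed; --nonet forbids network fetches,
# so the DTD must be resolvable through a local XML catalog.
if command -v xmllint >/dev/null 2>&1; then
    if xmllint --nonet --valid --noout "$doc" 2>/dev/null; then
        echo "document is valid"
    else
        echo "validation failed or DTD not available locally"
    fi
fi
```

Moving the title element after the para, for instance, is the kind of ordering error the DTD check is meant to catch.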

Figure 16.1 is a diagram of the whole process:

Figure 16.1. Processing structural documents



The part of the diagram inside the dotted box is your formatting software, or toolchain.
Besides the obvious and visible input to the formatter (the document source) you'll need to
keep the two hidden inputs of the formatter (DTD and stylesheet) in mind to understand what
follows.

Other DTDs

A brief digression into other DTDs may help make clear what parts of the previous section
were specific to DocBook and what parts are general to all structural-markup languages.

TEI (Text Encoding Initiative) is a large, elaborate DTD used primarily in academia for
computer transcription of literary texts. TEI's Unix-based toolchains use many of the same
tools that are involved with DocBook, but with different stylesheets and (of course) a
different DTD.

XHTML, the latest version of HTML, is also an XML application described by a DTD, which
explains the family resemblance between XHTML and DocBook tags. The XHTML
toolchain consists of web browsers and a number of ad-hoc HTML-to-print utilities.

Many other XML DTDs are maintained to help people exchange structured information in
fields as diverse as bioinformatics and banking. You can look at a list of repositories to get
some idea of the variety available.

Post 448, posted 2008-05-18 05:27
The DocBook toolchain

Normally, what you'll do to make XHTML from your DocBook sources is use the xmlto(1)
front end. Your commands will look like this:

    bash$ xmlto xhtml foo.xml
    bash$ ls *.html
    ar01s02.html ar01s03.html ar01s04.html index.html

In this example, you converted an XML-DocBook document named foo.xml with three
top-level sections into an index page and two parts. Making one big page is just as easy:

    bash$ xmlto xhtml-nochunks foo.xml
    bash$ ls *.html
    foo.html

Finally, here is how you make PostScript for printing:

    bash$ xmlto ps foo.xml    # To make PostScript
    bash$ ls *.ps
    foo.ps

To turn your documents into HTML or PostScript, you need an engine that can apply the
combination of DocBook DTD and a suitable stylesheet to your document. Figure 16.2
shows how the open-source tools for doing this fit together:

Figure 16.2. Present-day XML-DocBook toolchain



Parsing your document and applying the stylesheet transformation will be handled by one of
three programs. The most likely one is xsltproc, the parser that ships with Red Hat Linux. The
other possibilities are two Java programs, Saxon and Xalan.
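
For readers who want to see what a front end like xmlto hands to the processor, here is a hedged sketch of calling xsltproc directly. The stylesheet directory is an assumption (it differs between distributions), and html_command is a hypothetical helper name; html/docbook.xsl is the single-page HTML driver in the DocBook XSL stylesheet distribution. The command is printed first and only run if the processor, the stylesheet, and foo.xml are all present.

```shell
#!/bin/sh
# Assumed install location of the DocBook XSL stylesheets; adjust locally.
xsl_dir=/usr/share/xml/docbook/stylesheet/docbook-xsl

# html_command SOURCE OUTPUT
# Prints the xsltproc command that applies the single-page HTML stylesheet.
html_command() {
    echo "xsltproc -o $2 $xsl_dir/html/docbook.xsl $1"
}

cmd=$(html_command foo.xml foo.html)
echo "$cmd"

# Run it only when the processor, the stylesheet, and the source exist.
if command -v xsltproc >/dev/null 2>&1 \
        && [ -f "$xsl_dir/html/docbook.xsl" ] && [ -f foo.xml ]; then
    $cmd
fi
```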

It is relatively easy to generate high-quality XHTML from DocBook; the fact that
XHTML is simply another XML DTD helps a lot. Translation to HTML is done by applying
a rather simple stylesheet, and that's the end of the story. RTF is also simple to generate in
this way, and from XHTML or RTF it's easy to generate a flat ASCII text approximation in a

Post 449, posted 2008-05-18 05:28
pinch.

The awkward case is print. Generating high-quality printed output (which means, in practice,
Adobe's PDF, the Portable Document Format) is difficult. Doing it right requires algorithmically
duplicating the delicate judgments of a human typesetter moving from content to presentation
level.

So, first, a stylesheet translates DocBook's structural markup into another dialect of XML ...

Post 450, posted 2008-05-18 05:28
environment). As of early 2003 xsl-fo-proc is in an unfinished alpha state, not as far along as
FOP.
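
The two-stage print path (a stylesheet pass that produces an intermediate XML dialect, then a formatter such as FOP) can be sketched as a dry run that only prints the commands. The stylesheet directory, the helper name print_plan, and the intermediate file name foo.fo are assumptions for illustration; fo/docbook.xsl is the formatting-objects driver in the DocBook XSL stylesheet distribution, and -fo and -pdf are FOP's input and output options.

```shell
#!/bin/sh
# Assumed install location of the DocBook XSL stylesheets; adjust locally.
xsl_dir=/usr/share/xml/docbook/stylesheet/docbook-xsl

# print_plan BASENAME
# Prints the two commands for the print path: the stylesheet pass that
# produces formatting objects, then the FOP pass that produces PDF.
print_plan() {
    echo "xsltproc -o $1.fo $xsl_dir/fo/docbook.xsl $1.xml"
    echo "fop -fo $1.fo -pdf $1.pdf"
}

plan=$(print_plan foo)
echo "$plan"

# Number of stages in the plan.
steps=$(echo "$plan" | wc -l | tr -d ' ')
```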

Migration tools

The second biggest problem with DocBook is the effort needed to convert old-style
presentation markup to DocBook markup. Human beings can usually parse the presentation
of a document into logical structure automatically, because (for example) they can tell from
context when an italic font means ‘emphasis’ ...