XML::LibXML - An XML::Parser Alternative
November 14, 2001
Kip Hampton
Introduction
The vast majority of Perl's XML modules are built on top of XML::Parser, Larry Wall and Clark Cooper's Perl interface to James Clark's expat parser. But the expat/XML::Parser combination is not the only full-featured XML parser available in the Perl world. This month we'll look at XML::LibXML, Matt Sergeant and Christian Glahn's Perl interface to Daniel Veillard's libxml2.
Why Would You Want Yet Another XML Parser?
Expat and XML::Parser have proven themselves to be quite capable, but they are not without limitations. Expat was among the first XML parsers available and, as a result, its interfaces reflect the expectations of users at the time it was written. Expat and XML::Parser do not implement the Document Object Model, SAX, or XPath language interfaces (things that most modern XML users take for granted) because either the given interface did not exist or was still being heavily evaluated and not considered "standard" at the time it was written.
The somewhat unfortunate result of this is that most of the available Perl XML modules are built upon one of XML::Parser's non- or not-quite-standard interfaces with the presumption that the input will be some sort of textual representation of an XML document (file, filehandle, string, socket stream) that must be parsed before proceeding. While this works for many simple cases, most advanced XML applications need to do more than one thing with a given document and that means that for each stage in the process, the document must be serialized to a string and then re-parsed by the next module.
By contrast libxml2 was written after the DOM, XPath, and SAX interfaces became common, and so it implements all three. In-memory trees can be built by parsing documents stored in files, strings, and so on, or generated from a series of SAX events. Those trees can then be operated on using the W3C DOM and XPath interfaces or used to generate SAX events that are handed off to external event handlers. This added flexibility, which reflects current XML processing expectations, makes XML::LibXML a strong contender for XML::Parser's throne.
Using XML::LibXML
This month's column may be seen as an addendum to the Perl/XML Quickstart Guide, which was published earlier this year when XML::LibXML was still in its infancy. We'll use the same tests from the Quickstart to put XML::LibXML through its paces.
For a detailed overview of the test cases, see the first installment of the Quickstart. To summarize, the two tests show how to extract and print data from an XML document. They also show how to build and print an XML document programmatically from data stored in a Perl hash, using the facilities offered by a given XML module.
Reading
For accessing the data stored in XML documents, XML::LibXML provides a standard W3C DOM interface. Documents are treated as a tree of nodes and the data those nodes contain are accessed by calling methods on the node objects themselves.
use strict;
use XML::LibXML;
my $file = 'files/camelids.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
my @species = $root->getElementsByTagName('species');
foreach my $camelid (@species) {
    my $latin_name = $camelid->getAttribute('name');
    my @name_node = $camelid->getElementsByTagName('common-name');
    my $common_name = $name_node[0]->getFirstChild->getData;
    my @c_node = $camelid->getElementsByTagName('conservation');
    my $status = $c_node[0]->getAttribute('status');
    print "$common_name ($latin_name) $status \n";
}
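The example above reads files/camelids.xml, which is not reproduced here. The following self-contained sketch runs the same DOM walk against an inline document. Its shape is inferred from the names the code looks up: a species element with a name attribute, a common-name child, and a conservation child carrying a status attribute. The sample data itself is hypothetical. The sketch also uses textContent, which returns an element's text without reaching for its first child node by hand.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical excerpt shaped like files/camelids.xml (inferred, not the original file)
my $xml = <<'XML';
<camelids>
  <species name="Camelus dromedarius">
    <common-name>Dromedary</common-name>
    <conservation status="no special status"/>
  </species>
  <species name="Lama glama">
    <common-name>Llama</common-name>
    <conservation status="no special status"/>
  </species>
</camelids>
XML

my $doc  = XML::LibXML->new->parse_string($xml);
my $root = $doc->documentElement;

my @lines;
foreach my $camelid ($root->getElementsByTagName('species')) {
    my $latin_name = $camelid->getAttribute('name');
    # in list context getElementsByTagName returns the matching elements
    my ($name_node) = $camelid->getElementsByTagName('common-name');
    my ($c_node)    = $camelid->getElementsByTagName('conservation');
    # textContent concatenates all text below the element
    push @lines, sprintf '%s (%s) %s',
        $name_node->textContent, $latin_name, $c_node->getAttribute('status');
}
print "$_\n" for @lines;
```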
One of the more exciting features of XML::LibXML is that, in addition to the DOM interface, it allows you to select nodes using the XPath language. The following illustrates how to achieve the same effect as the previous example using XPath to select the desired nodes:
use strict;
use XML::LibXML;
my $file = 'files/camelids.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
foreach my $camelid ($root->findnodes('species')) {
    my $latin_name = $camelid->findvalue('@name');
    my $common_name = $camelid->findvalue('common-name');
    my $status = $camelid->findvalue('conservation/@status');
    print "$common_name ($latin_name) $status \n";
}
What makes this exciting is that you can mix and match methods from the DOM and XPath interfaces while operating on the same tree of nodes, choosing whichever best suits the needs of your application.
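Here is a minimal sketch of that mixing on one tree, using a made-up two-species document. XPath selects a node, DOM methods add a new child to it, and a second XPath query sees the change immediately, because both interfaces work on the same in-memory tree. The conservation status value is purely illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->new->parse_string(<<'XML');
<camelids>
  <species name="Lama glama"><common-name>Llama</common-name></species>
  <species name="Vicugna pacos"><common-name>Alpaca</common-name></species>
</camelids>
XML

# XPath: select the species whose common name is "Alpaca"
my ($alpaca) = $doc->findnodes('//species[common-name = "Alpaca"]');

# DOM: build a new element and attach it to the selected node
my $note = $doc->createElement('conservation');
$note->setAttribute(status => 'domesticated');
$alpaca->appendChild($note);

# XPath again: the DOM change is visible right away
my $status = $doc->findvalue('//species[@name="Vicugna pacos"]/conservation/@status');
print "$status\n";
```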
Writing
To create an XML document programmatically with XML::LibXML you simply use the provided DOM interface:
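# (excerpt: $doc, $body, %camelid_links and the loop over each $item
# that creates a $link element are set up earlier in the script)
    # wrap the item's description in a text node and attach it to the link
    my $text = XML::LibXML::Text->new($camelid_links{$item}->{description});
    $link->appendChild($text);
    $body->appendChild($link);
}
# serialize the finished tree
print $doc->toString;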
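The excerpt above omits the document setup. Here is a self-contained sketch of the whole build under stated assumptions. The %camelid_links data is a hypothetical stand-in shaped as item => { url, description }; the url key, the example.com addresses, and the html/body/a element names are illustrative and are not taken from the original script. Keys are sorted only so the output order is predictable.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical data: item key => { url => ..., description => ... }
my %camelid_links = (
    llama  => { url => 'http://example.com/llama',  description => 'The Llama' },
    alpaca => { url => 'http://example.com/alpaca', description => 'The Alpaca' },
);

# Create the document and its root element
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $html = $doc->createElement('html');
$doc->setDocumentElement($html);
my $body = $doc->createElement('body');
$html->appendChild($body);

# One <a href="..."> element per item, containing the description text
foreach my $item (sort keys %camelid_links) {
    my $link = $doc->createElement('a');
    $link->setAttribute(href => $camelid_links{$item}->{url});
    my $text = XML::LibXML::Text->new($camelid_links{$item}->{description});
    $link->appendChild($text);
    $body->appendChild($link);
}

print $doc->toString;
```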
An important difference between XML::LibXML and XML::DOM is that libxml2's object model conforms to the W3C DOM Level 2 interface, which is better able to cope with documents containing XML Namespaces. So, where XML::DOM is limited to:
@nodeset = $doc->getElementsByTagName($element_name);
and
$node = $doc->createElement($element_name);
XML::LibXML also provides:
@nodeset = $doc->getElementsByTagNameNS($namespace_uri, $element_name);
and
$node = $doc->createElementNS($namespace_uri, $element_name);
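A short sketch of the two namespace-aware calls working together. The namespace URI and the "c" prefix are hypothetical. Elements are created with createElementNS and then found again with getElementsByTagNameNS, using the namespace URI plus the local name rather than the prefixed tag name.

```perl
use strict;
use warnings;
use XML::LibXML;

my $ns  = 'http://example.com/camelids';   # hypothetical namespace URI
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

# createElementNS binds each element (and its prefix) to the namespace URI
my $root = $doc->createElementNS($ns, 'c:camelids');
$doc->setDocumentElement($root);
$root->appendChild($doc->createElementNS($ns, 'c:species')) for 1 .. 2;

# Look the elements up by namespace URI and local name
my @species = $doc->getElementsByTagNameNS($ns, 'species');
print scalar(@species), " species elements\n";
print $doc->toString;
```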
The Joy of SAX
We've seen the DOM and XPath goodness that XML::LibXML provides, but the story does not end there. The libxml2 library also offers a SAX interface that can be used to create DOM trees from SAX events or generate SAX events from DOM trees.
The following creates a DOM tree programmatically from a SAX driver built on XML::SAX::Base. In this example, the initial SAX events are generated from a custom driver implemented in the CamelDriver class that calls the handler events in the XML::LibXML::SAX::Builder class to build the DOM tree.
use XML::LibXML;
use XML::LibXML::SAX::Builder;
my $builder = XML::LibXML::SAX::Builder->new();
my $driver = CamelDriver->new(Handler => $builder);
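The snippet above stops before the driver runs, and CamelDriver itself is not shown. The sketch below shows the round trip under stated assumptions. Its CamelDriver is a hypothetical stand-in: it subclasses XML::SAX::Base, which forwards each SAX call to the object passed as Handler, and emits events for a hard-coded list of names. Its parse() method and element names are illustrative, not the article's code. XML::LibXML::SAX::Builder returns the finished DOM from end_document, and the sketch relies on XML::SAX::Base handing that value back to the caller.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::SAX::Builder;

# Hypothetical driver: every SAX method called on $self is forwarded
# by XML::SAX::Base to the Handler (here, the DOM builder).
package CamelDriver;
use base 'XML::SAX::Base';

sub parse {
    my ($self, @names) = @_;
    $self->start_document({});
    $self->start_element({ Name => 'camelids', Attributes => {} });
    foreach my $name (@names) {
        $self->start_element({ Name => 'species', Attributes => {} });
        $self->characters({ Data => $name });
        $self->end_element({ Name => 'species' });
    }
    $self->end_element({ Name => 'camelids' });
    # the Builder's end_document returns the finished DOM
    return $self->end_document({});
}

package main;

my $builder = XML::LibXML::SAX::Builder->new();
my $driver  = CamelDriver->new(Handler => $builder);
my $doc     = $driver->parse('Llama', 'Vicuna');
print $doc->toString;
```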
You can also generate SAX events from an existing DOM tree using XML::LibXML::SAX::Generator. In the following snippet, the DOM tree created by parsing the file camelids.xml is handed to XML::LibXML::SAX::Generator's generate() method which in turn calls the event handlers in XML::Handler::XMLWriter to print the document to STDOUT.
use strict;
use XML::LibXML;
use XML::LibXML::SAX::Generator;
use XML::Handler::XMLWriter;
my $file = 'files/camelids.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $handler = XML::Handler::XMLWriter->new();
my $driver = XML::LibXML::SAX::Generator->new(Handler => $handler);
# generate SAX events that are captured
# by a SAX Handler or Filter.
$driver->generate($doc);
Resources
Perl XML Quickstart: The Standard XML Interfaces
Writing SAX Drivers for Non-XML Data
Transforming Data With SAX Filters
This ability to accept and emit SAX events is especially useful in light of the recent discussion in this column of generating SAX events from non-XML data and writing SAX filter chains.
Here is one possible pipeline. A SAX driver written in Perl emits events based on the results of a database query, and XML::LibXML::SAX::Builder turns those events into a DOM object. That DOM is transformed in C space for display, using XSLT and the mind-numbingly fast libxslt library (which expects libxml2 DOM objects). SAX events generated from the transformed tree then pass through custom SAX filters that add the finishing touches. At no point does the document have to be serialized to a string and re-parsed. Wow.
Conclusions
As we have seen, XML::LibXML offers a fast, modern approach to XML processing that may be superior to the first-generation XML::Parser in many cases. Don't misunderstand me: XML::Parser and its dependents are still quite useful and well supported, and they are not likely to go away any time soon. But XML::Parser is not the only game in town. Given the added flexibility that XML::LibXML provides, I strongly encourage you to take a closer look at it before you begin your next Perl/XML project.
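Learn how to use XML (Extensible Markup Language) in your UNIX® applications. This article is aimed at UNIX developers who are not familiar with XML, and it examines the XML library developed by the Gnome project. After a brief general explanation of XML, you will see sample code that UNIX application developers might use to parse and manage XML-format configuration files with the LibXML2 library.
After a simple definition of XML, the article introduces a sample configuration file written in XML and then shows, through sample code, how to parse it. A system administrator can edit the configuration file by hand, but applications usually also need to modify it directly. The article therefore shows by example how to add new configuration options to the XML document programmatically and how to change the values of existing entries. Finally, it presents the code that writes the modified configuration file to disk.
About XML
Before digging into the LibXML2 library, let's review the basics of XML. XML is a text-based format for creating structured data that can be accessed from many languages and platforms. It consists of a series of HTML-like tags arranged in a tree structure.
For example, consider the simple document shown in Listing 1. It is a simplified version of the sample configuration file examined in the configuration file section, trimmed to show the general concepts of XML more clearly.
The first line of Listing 1 is the XML declaration. It tells the application that processes the XML (the parser) which version of XML it is about to handle. Most files are written to version 1.0, though a small number of version 1.1 files exist. The declaration also states the encoding in use. Most files use UTF-8, but XML is designed to integrate data in many languages, including those that do not use the English alphabet.
In the manipulation phase, the program updates the data by adding, modifying, and deleting elements in the XML document, usually in response to user actions.
Finally, in the export phase, the modified document is written back to disk.
Loading and parsing the data
For an application, the first step in reading an XML file is to load the data and parse it into a Document object. From there, the DOM tree can be traversed to retrieve specific nodes. Let's look at how the code in Listing 4 does this.
Listing 4. Code to load and parse example.xml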
use strict;
use warnings;
# ':libxml' exports the node-type constants used below
# (XML_ELEMENT_NODE, XML_TEXT_NODE, ...)
use XML::LibXML qw(:libxml);

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file("example.xml");
my $filesystem = $doc->getDocumentElement();
my @nodes = $filesystem->childNodes;
foreach my $node (@nodes) {
    # skip the whitespace text nodes between elements
    next unless $node->nodeType == XML_ELEMENT_NODE;

    # just get the first match: used as a method invocant, the lookup
    # returns an XML::LibXML::NodeList, and item(0) is its first node
    my $dirname = $node->getElementsByTagName("dirname")->item(0);
    if (defined $dirname) {
        print "dirname: " . $dirname->textContent . "\n";
        # push this into an array
    }

    # get all "files" children
    my @files = $node->getChildrenByTagName("files");
    foreach my $file (@files) {
        foreach my $values ($file->childNodes) {
            # ignore text (and any other non-element) nodes
            next unless $values->nodeType == XML_ELEMENT_NODE;
            if ($values->nodeName() eq "age") {
                # check for attribute, otherwise, use default of 'hours'
                if ($values->hasAttributes()) {
                    print $values->nodeName() . ": " . $values->textContent;
                    print " " . $values->attributes->item(0)->value();
                    print "\n";
                } else {
                    print $values->nodeName() . ": " . $values->textContent;
                    print " hours\n";
                }
                # calculate extended value from units and put in a
                # hash linked with this dirname, etc.
            } else {
                print $values->nodeName() . ": " . $values->textContent;
                print "\n";
                # put this value into a hash linked with $dirname.
                # We may have multiple entries for each $dirname, so
                # perhaps use an array within a hash
            }
        }
    }
}
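First, Listing 4 creates a parser and loads the XML from the file into an XML::LibXML::Document variable. This object holds the entire XML tree. Its associated methods can search for nodes, export, validate, and create new nodes, and later sections of this article cover some of them. Near the start of the code you can see the getDocumentElement() method, which returns the root node of the document. From this root node, the entire XML tree can be traversed.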
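As explained earlier, the sample XML in Listing 2 is laid out for readability, with line breaks and indentation that make the document easier to read. The XML parser reads all of these characters and stores them as nodes of type TEXT. The example in Listing 5 does not add any such TEXT nodes, so its output contains no line breaks or indentation. If you want that whitespace, create TEXT nodes with the XML::LibXML::Text class or with the document object's createTextNode() method. Either returns a node that can be added to the tree in the same way as in the examples above.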
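The sketch below shows the whitespace behaviour just described, using made-up element names a and b. It first parses an indented snippet and records the node types of the root's children. It then builds the same shape in memory, adding the indentation explicitly with the document's createTextNode() method.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);

# Parsing: line breaks and indentation become TEXT nodes in the tree
my $parsed = XML::LibXML->new->parse_string("<a>\n  <b/>\n</a>");
my @kinds = map { ($_->nodeType == XML_TEXT_NODE) ? 'text' : 'element' }
            $parsed->documentElement->childNodes;
print "@kinds\n";

# Building: whitespace only appears if we add TEXT nodes ourselves
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('a');
$doc->setDocumentElement($root);
$root->appendChild($doc->createTextNode("\n  "));
$root->appendChild($doc->createElement('b'));
$root->appendChild($doc->createTextNode("\n"));

print $root->toString, "\n";
```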
In some programming languages, saving an XML document to a file can be tedious, but fortunately LibXML makes the task very simple:
$doc->toFile("example.xml");
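Of all the operations performed on the data, this is the simplest. Once the in-memory XML document has been modified, a single method call writes it back to the corresponding configuration file. Related Perl methods such as toString() and toFH() write the XML to a string variable or to an open Perl filehandle, respectively. The filehandle variant gives you more flexibility in how you structure your application.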
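A small sketch of the three output methods side by side. A tiny made-up config document stands in for example.xml. One option is changed in memory, and the result is then serialized with toString(), toFile(), and toFH(). The temporary directory replaces the real configuration path so the sketch does not overwrite anything.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::LibXML;

# Hypothetical configuration document standing in for example.xml
my $doc = XML::LibXML->new->parse_string(
    '<config><option name="debug">off</option></config>');

# Change the option's value in memory
my ($option) = $doc->findnodes('/config/option[@name="debug"]');
$option->removeChildNodes();
$option->appendText('on');

# 1. toString(): serialize into a Perl string
my $xml = $doc->toString;

# 2. toFile(): write straight to a path
my $dir = tempdir(CLEANUP => 1);
$doc->toFile("$dir/config.xml");

# 3. toFH(): write to a filehandle the application already opened
open my $fh, '>', "$dir/config-fh.xml" or die "open: $!";
$doc->toFH($fh);
close $fh or die "close: $!";

print $xml;
```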
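Conclusion
By providing the LibXML2 library along with its Perl module support, the Gnome project has done something genuinely valuable. This article covered the three key steps in managing and using XML configuration files. The parsing phase is probably the most complex, because walking the XML tree requires some degree of recursive design. Manipulating the in-memory XML document, though somewhat tedious, is simple and straightforward. Exporting the modified configuration with the LibXML2 library is also very easy.
XML requires something of a paradigm shift from the standard UNIX way of thinking, but it offers a powerful approach to data management. Compared with simple database formats, its tree structure provides a more flexible view of the data. When you develop new applications or revise old ones, you can standardize on XML configuration files. As this article has shown, the free, standard library provided by the Gnome project makes that standardization easy.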
The Perl programming language has a wealth of support for XML. Kip Hampton's column is the essential companion for Perl developers working with XML.
OSCON 2002 Perl and XML Review
By Kip Hampton
In this month's Perl and XML column, Kip Hampton reviews the state of the Perl-XML world as displayed at O'Reilly's Open Source Convention. [Aug. 21, 2002]
OSCON 2002 is O'Reilly's Open Source Convention 2002.
XSH, An XML Editing Shell
By Kip Hampton
In this month's Perl and XML column, Kip Hampton introduces XSH, an XML editing shell, which Kip suggests should become a part of your XML tool kit. [Jul. 10, 2002]
Here "shell" has its usual meaning of an interactive command-line interface. After starting XSH you work on XML at a prompt, much as running Perl's cpan module installer brings up a cpan> prompt that waits for further commands.
PDF Presentations Using AxPoint
By Kip Hampton
In this month's Perl and XML column, Kip Hampton describes AxPoint, a way to create presentations in PDF using Perl and XML. [Jun. 19, 2002]
Multi-Interface Web Services Made Easy
By Kip Hampton
This month's Perl and XML column offers a range of methods for easily building web applications with SOAP, REST, and XML-RPC interfaces. [May. 8, 2002]
Perl and XML on the Command Line
By Kip Hampton
In this month's Perl and XML column, Kip Hampton explores how the desperate Perl hacker can use its XML tools on the command line. [Apr. 17, 2002]
Introducing XML::SAX::Machines, Part Two
By Kip Hampton
This month, Kip Hampton's introduction to Perl's XML::SAX::Machines tool continues, adding flexibility to Apache-based apps and demonstrating the construction of a SAX controller. [Mar. 20, 2002]
Introducing XML::SAX::Machines, Part One
By Kip Hampton
XML::SAX::Machines offers an elegant way of building and managing complex chains of SAX event handlers and generators. Kip Hampton introduces this helpful module. [Feb. 13, 2002]
Web Content Validation with XML::Schematron
By Kip Hampton
Kip Hampton explains how to use his XML::Schematron module to validate XML Web content with Perl. [Jan. 23, 2002]
XML and Modern CGI Applications
By Kip Hampton
Kip Hampton explores a modern CGI module, CGI::XMLApplication, which uses XML and XSLT to separate logic and presentation cleanly. [Dec. 12, 2001]
XML::LibXML - An XML::Parser Alternative
By Kip Hampton
Kip Hampton discusses XML::LibXML, a capable, updated alternative to Perl's venerable and venerated XML::Parser. [Nov. 14, 2001]
Transforming XML With SAX Filters
By Kip Hampton
Kip Hampton concludes his series of advanced SAX topics by showing how to use SAX filters to transform XML. [Oct. 10, 2001]
Writing SAX Drivers for Non-XML Data
By Kip Hampton
Kip Hampton shows us how to write drivers to produce SAX events and, thus, XML documents from non-XML data sources. [Sep. 19, 2001]
Creating VoiceXML Applications With Perl
By Kip Hampton
Kip Hampton shows you how to use VoiceXML and Perl to connect the telephone to the Web. [Aug. 9, 2001]
Creating Scalable Vector Graphics with Perl
By Kip Hampton
Kip Hampton demonstrates how to use Perl, XML, and SVG to generate useful and attractive graphics dynamically. [Jul. 11, 2001]
Perl XML Quickstart: Convenience Modules
By Kip Hampton
The third and final part of our guide to Perl XML modules covers some handy modules geared to specific tasks. [Jun. 13, 2001]
Perl XML Quickstart: The Standard XML Interfaces
By Kip Hampton
In the second part of our guide to XML and Perl, we cover the Perl implementations of the standard XML APIs DOM, SAX, and XPath. [May. 16, 2001]
Perl XML Quickstart: The Perl XML Interfaces
By Kip Hampton
This first installment of our guide to Perl and XML covers Perl-specific interfaces for reading and writing XML. [Apr. 18, 2001]
Using XML::Twig
By Kip Hampton
XML::Twig provides a fast, memory-efficient way to handle large XML documents, which is useful when the needs of your application make using the SAX interface overly complex. [Mar. 21, 2001]
High-Performance XML Parsing With SAX
By Kip Hampton
Manipulating XML documents in Perl using DOM or XPath can hit a performance barrier with large documents -- the answer is to use SAX. [Feb. 14, 2001]
Creating Web Utilities Using XML::XPath
By Kip Hampton
Using XML on your web site means more than just valid XHTML: our monthly Perl and XML column explores some possibilities for the automation of an all-XML web site. [Jan. 10, 2001]
Using XML and Relational Databases with Perl
By Kip Hampton
This article explores how Perl can be used to transfer data between XML and relational databases, and how XML can bridge two disparate databases. [Dec. 13, 2000]
Simple XML Validation with Perl
By Kip Hampton
A combination of Perl and XPath can provide a quick, lightweight solution for validating documents. Find out how in the first installment of our new monthly Perl and XML column. [Nov. 8, 2000]
Posted by hztj2005, 2017-07-03 17:52
The Problem: Although XML Schemas and RELAX promise fine-grained validation for XML documents, neither is presently available in the Perl world. You need a way to validate the structure of your documents now. Today. Preferably before lunch.
The Solution: Combine the simplicity of Test.pm from the standard Perl distribution with the flexibility of XPath.
Overcoming Test Anxiety
Before we show how Perl can make XML validation simple, we need to take a small detour through the standard Test module.
For those not familiar with it, the Test module was designed to give the harried hacker an easy way to ensure that his or her code passes a series of basic functional tests before unleashing it on the world. For modules, it also checks that those same tests pass on the system where the code is being installed.
It is not surprising, then, that using Test.pm is a very straightforward proposition. Each test is defined as a call to the function ok(), which takes up to three arguments: the value under test, the expected value, and an optional message to display upon failure. If the first argument is a code reference, it is called and its return value is used. If the values of the first two arguments match, the test succeeds. As an example, consider the following two tests:
# this test passes because the first two arguments return the same values.
ok(sub { 2+2 }, 4, '2 and 2 is 4');
# this test fails for the obvious mathematical reason and prints a descriptive error.
ok(sub { 2+2 }, 5, '2 and 2 is 4');
Following the XPath
Now what does Test have to do with validating an XML document? The answer lies in its combination with the XML::XPath module. The XPath language provides a simple, powerful syntax for navigating the logical structure of an XML document. XML::XPath allows us to take advantage of that power from within Perl.
XPath's syntax is quite accessible. For example, the XPath expression /foo/bar finds all of the "bar" elements that are children of a "foo" root element (the leading "/" denotes the root node, so "foo" must be the document element). The expression /foo/bar/* goes one step further. It returns the element children of those "bar" elements, not the "bar" elements themselves.
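Here is a small sketch of the difference, using XML::XPath as in the rest of this article and a made-up document. /foo/bar returns the two bar elements, while /foo/bar/* returns the three elements nested inside them.

```perl
use strict;
use warnings;
use XML::XPath;

# Made-up document: two <bar> children of the root element <foo>
my $xp = XML::XPath->new(xml => <<'XML');
<foo>
  <bar name="one"><baz/><qux/></bar>
  <bar><baz/></bar>
</foo>
XML

# /foo/bar selects the <bar> elements themselves
my @bars = $xp->findnodes('/foo/bar');

# /foo/bar/* selects the element children of those <bar> elements
my @kids = $xp->findnodes('/foo/bar/*');

printf "%d bar elements; their children: %s\n",
    scalar(@bars), join(',', map { $_->getName } @kids);
```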
XPath also provides a number of functions and shortcuts that further simplify examining a document's structure. For instance, count(/foo/bar[@name]) will return the number of "bar" elements that have the attribute "name". As we will soon see, combining Test.pm's compact syntax with the simple power of XPath expressions will allow us to tackle the task of validating an XML document simply and efficiently.
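A similar sketch for count() with a predicate, again on a made-up document. For these expressions, XML::XPath's find() returns an object (a number or a boolean), and its value() method yields the plain Perl value. The last query shows the boolean()/count() comparison pattern that this article uses later to check every item.

```perl
use strict;
use warnings;
use XML::XPath;

# Made-up document: three <bar> elements, two of which have a name attribute
my $xp = XML::XPath->new(xml => '<foo><bar name="a"/><bar/><bar name="b"/></foo>');

# count() yields an XML::XPath::Number; value() gives the plain number
my $named = $xp->find('count(/foo/bar[@name])')->value;
my $all   = $xp->find('count(/foo/bar)')->value;

# boolean() of a comparison yields an XML::XPath::Boolean (value 1 or 0)
my $every_bar_named =
    $xp->find('boolean(count(/foo/bar) = count(/foo/bar[@name]))')->value;

print "$named of $all bar elements have a name attribute\n";
print "every bar named? ", ($every_bar_named ? 'yes' : 'no'), "\n";
```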
Rolling Our Own XML Validator
Let's try out what we've covered so far by creating our own simple XML validation tool. To do this, we will need a sample XML file, a test script, and a simple Perl "wrapper" script that allows our tool to validate more than a single type of document. We begin with the Perl script, which we will call xml_test.pl. (You can also download the script.)
This script allows us to be more flexible in our testing, by providing a way to specify both the XML file and the test file from the command line. Let's move on to creating a sample XML instance that we intend to validate. (Download the sample file here.)
<?xml version="1.0" standalone="yes"?>
<order>
  <customer>
    <name>Coyote, Ltd.</name>
    <shipping_info>
      <address>1313 Desert Road</address>
      <city>Nowheresville</city>
      <state>AZ</state>
      <zip>90210</zip>
    </shipping_info>
  </customer>
  <item>
    <product id="1111">Acme Rocket Jet Pack</product>
    <quantity type="each">1</quantity>
  </item>
  <item>
    <product id="2222">Roadrunner Chow</product>
    <quantity type="bag">10</quantity>
  </item>
</order>
Now let's consider what tests would be appropriate to validate this type of document. At the very least, we need to verify that the document contains an order, and that the order contains a customer, a shipping address, and a list of items. Beyond that, we should also verify that each item contains a product and a quantity. So, we'll need five tests to verify the basic structure.
Let's create a small test script named order.t (download order.t here) and begin with the basics.
use Test;
BEGIN { plan tests => 5 }
use XML::XPath;
my $xp = XML::XPath->new(filename => 'customer_order.xml');
my (@nodes, $test); # pre-declare a few vars
First we'll define a test that checks whether or not the document root is indeed an "order" element. We will do this by attempting to select the nodeset for an "order" element at the document root into an array, testing that the resulting array contains only one element, and then verifying that our test is true.
@nodes = $xp->findnodes('/order');
ok(@nodes == 1, 1, "the root element must be an 'order'");
Next we need to confirm that our order document contains a "customer" element, and that the "customer" element contains a "shipping_info" element. Rather than running separate tests for each, we can combine these tests into a single expression and, if either element is missing or misplaced, our test will fail.
@nodes = $xp->findnodes('/order/customer/shipping_info');
ok(@nodes == 1, 1, "an order must contain a 'customer' element with a 'shipping_info' child");
As the Perl mantra goes, "There's More Than One Way To Do It", and the same is true with XML::XPath. Rather than selecting the nodes into an array and evaluating that array in a scalar context to get the number of matches, we can use the XPath count() function to achieve the same effect. Note that we will be using XML::XPath's find() function instead of findnodes() since the type of test we are performing returns a literal value instead of a set of document nodes.
$test = $xp->find('count(/order/item)');
ok($test > 0, 1, "an order must contain at least one 'item' element");
Finally, we need to be sure that every "item" element contains both a "product" element and a "quantity" element. Here we'll get a little fancier and use XPath's boolean() function which returns true (1) only if the entire expression we pass to it evaluates to true. We need only check that the number of "item" elements is equal to the number of "item" elements that have the child elements we are testing for.
$test = $xp->find('boolean(count(/order/item) = count(/order/item[product]))');
ok($test == 1, 1, "an 'item' element must contain a 'product' element.");
$test = $xp->find('boolean(count(/order/item) = count(/order/item[quantity]))');
ok($test == 1, 1, "an 'item' element must contain a 'quantity' element.");
We now have tests to cover the five basic areas that we defined earlier as the most critical in terms of structural validation. Having saved our test file as order.t, let's fire up our xml_test.pl script.
Great, our sample document passed muster. But what if it didn't? To find out, open customer_order.xml, remove one of the "quantity" elements, save the file, and run the script again.
Admittedly the output is not very pretty, but it is functional. We now know that the current document is invalid, and we also know why.
The handful of tests that we currently have clearly would not be sufficient validation for a production environment, but with these few examples, you hopefully have a clear view of the basics and could extend the test script to handle nearly any case. You could, for example, iterate over the "quantity" elements and test each text() node against a regular expression to ensure that each contained only a numeric value. You are limited only by your imagination.
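Here is one way to build the quantity check suggested above, as a standalone sketch. It is not part of the article's order.t. Like order.t it uses Test.pm and XML::XPath, but the inline XML is a trimmed copy of the items in customer_order.xml. Because the number of quantity elements isn't known in advance, plan() is called at run time after they have been counted.

```perl
use strict;
use warnings;
use Test;
use XML::XPath;

# Trimmed copy of the <item> elements from customer_order.xml
my $xml = <<'XML';
<order>
  <item><product id="1111">Acme Rocket Jet Pack</product><quantity type="each">1</quantity></item>
  <item><product id="2222">Roadrunner Chow</product><quantity type="bag">10</quantity></item>
</order>
XML

my $xp = XML::XPath->new(xml => $xml);
my @quantities = $xp->findnodes('/order/item/quantity');

# one test per quantity element found
plan tests => scalar @quantities;

foreach my $q (@quantities) {
    # string_value() is the element's text content, e.g. "10"
    my $value = $q->string_value;
    ok($value =~ /^\d+$/ ? 1 : 0, 1, "quantity '$value' must be a whole number");
}
```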
The Same Old Scheme?
Other Resources
• Using XSL as a Validation Language by Rick Jelliffe
• Introducing the Schematron by Uche Ogbuji
As much as I would love to take credit for the basic ideas presented here, I admit that using XPath expressions to validate an XML document's structure is not at all new. In fact, this concept is the foundation of Rick Jelliffe's popular Schematron. Thanks should also go to Matt Sergeant, the author of AxKit, for pointing out that Perl's Test and Test::Harness modules would make a nifty environment for a Schematron-like clone in Perl. The goal here has been to spark your imagination and get you experimenting. I hope it also shows how Perl and its modules make even the more complex XML tasks, like validation, easy to solve.