Last edited by hztj2005 on 2017-07-04 22:16
Simple XML Validation with Perl (translation)
http://www.xml.com/pub/a/2000/11/08/perl
November 8, 2000
Kip Hampton
The Problem: Although XML Schemas and RELAX promise fine-grained validation for XML documents, neither is presently available in the Perl world. You need a way to validate the structure of your documents now. Today. Preferably before lunch.
The Solution: Combine the simplicity of Test.pm from the standard Perl distribution with the flexibility of XPath.
Overcoming Test Anxiety
Before we show how Perl can make XML validation simple, we need to take a small detour through the standard Test module.
For those not familiar with it, the Test module was designed to give the harried hacker an easy way to ensure that his or her code passes a series of basic functional tests before it is unleashed on the world and, in the case of modules, that those same tests pass on the system on which the code is being installed.
It is not surprising, then, that using Test.pm is a very straightforward proposition. Each test is defined as a call to the function ok(), which takes up to three arguments: a test, an expected return value, and an optional message to display upon failure. If the evaluated values of the first two arguments match, the test succeeds. As an example, consider the following two tests:
ok('good', 'good', 'its all good');
# This test passes because the first two arguments evaluate to the same value.
ok(sub { 2+2 }, 5, '2 and 2 is 4');
# This test fails for the obvious mathematical reason and prints a descriptive error.
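For readers who want to run the idea end to end, here is a minimal, self-contained sketch of a Test.pm script. The three checks are invented for illustration and are not taken from the article; the sketch only assumes the core Test module, whose plan() call must announce the number of tests before any ok() call runs.

```perl
use strict;
use warnings;
use Test;                      # core module; exports plan() and ok()

BEGIN { plan tests => 3 }      # announce how many tests will run

# Two-argument form: passes when the value equals the expected value.
ok(length('perl'), 4);

# A code reference is called first, and its return value is compared.
ok(sub { 2 + 2 }, 4, 'two plus two should be four');

# One-argument form: passes when the single value is true.
ok(1 + 1 == 2);
```

Each passing call prints an "ok N" line and each failing call prints "not ok N" followed by the diagnostic, which is the output that Test::Harness later summarizes.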
Following the XPath
Now what does Test have to do with validating an XML document? The answer lies in its combination with the XML::XPath module. The XPath language provides a simple, powerful syntax for navigating the logical structure of an XML document. XML::XPath allows us to take advantage of that power from within Perl.
XPath's syntax is quite accessible. For example, the XPath expression /foo/bar will find all of the "bar" elements contained within all "foo" elements that are children of the root node (the root being denoted by the leading "/"). Alternatively, the expression /foo/bar/* steps one level deeper: it returns every child element of those "bar" elements rather than the "bar" elements themselves.
XPath also provides a number of functions and shortcuts that further simplify examining a document's structure. For instance, count(/foo/bar[@name]) will return the number of "bar" elements that have the attribute "name". As we will soon see, combining Test.pm's compact syntax with the simple power of XPath expressions will allow us to tackle the task of validating an XML document simply and efficiently.
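To make the three expressions above concrete, the following sketch evaluates them with XML::XPath against a small inline document. The document itself is hypothetical and exists only for this illustration; the calls used (new with the xml option, findnodes, find, and value) are part of XML::XPath's documented interface.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical document, invented for this illustration.
my $xml = <<'XML';
<foo>
  <bar name="first"><baz/><qux/></bar>
  <bar><baz/></bar>
</foo>
XML

my $xp = XML::XPath->new(xml => $xml);

# In list context findnodes() returns the matching nodes.
my @bars     = $xp->findnodes('/foo/bar');      # the "bar" elements
my @children = $xp->findnodes('/foo/bar/*');    # child elements of each "bar"

# count() yields a number, so find() is used instead of findnodes().
my $named = $xp->find('count(/foo/bar[@name])');

printf "bar elements: %d\n", scalar @bars;
printf "child elements of bar: %d\n", scalar @children;
printf "bar elements with a name attribute: %d\n", $named->value;
```

Note that /foo/bar/* matches only element children; the whitespace between tags is text and is not selected by the * node test.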
Rolling Our Own XML Validator
Let's try out what we've covered so far by creating our own simple XML validation tool. To do this, we will need a sample XML file, a test script, and a simple Perl "wrapper" script to allow our tool to validate more than a single type of document. We begin with the Perl script, which we will call xml_test.pl. (You can also download the script.)
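use Test::Harness qw(&runtests $verbose);
use strict;

# Consume leading options; -d turns on verbose harness output.
while (@ARGV > 2) {
    my $arg = shift @ARGV;
    if ($arg eq '-d') {
        $verbose = 1;
    }
}

# Both a test script and an XML file are required.
if (@ARGV < 2) {
    usage();
    exit(1);    # non-zero status signals the usage error
}

sub usage {
    warn "Usage: xml_test.pl [-d] testscript xmlfile\n";
}

# Expose the XML file name to the test script, then run it.
$ENV{XMLFILE} = $ARGV[1];
runtests $ARGV[0];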
This script allows us to be more flexible in our testing, by providing a way to specify both the XML file and the test file from the command line. Let's move on to creating a sample XML instance that we intend to validate. (Download the sample file here.)
<?xml version="1.0" standalone="yes"?>
<order>
  <customer>
    <name>Coyote, Ltd.</name>
    <shipping_info>
      <address>1313 Desert Road</address>
      <city>Nowheresville</city>
      <state>AZ</state>
      <zip>90210</zip>
    </shipping_info>
  </customer>
  <item>
    <product id="1111">Acme Rocket Jet Pack</product>
    <quantity type="each">1</quantity>
  </item>
  <item>
    <product id="2222">Roadrunner Chow</product>
    <quantity type="bag">10</quantity>
  </item>
</order>
Now let's consider what tests would be appropriate to validate this type of document. At the very least, we need to verify that the document contains an order, and that the order contains a customer, a shipping address, and a list of items. Beyond that, we should also verify that each item contains a product and a quantity. So, we'll need five tests to verify the basic structure.
Let's create a small test script named order.t (download order.t here) and begin with the basics.
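use Test;
BEGIN { plan tests => 5 }
use XML::XPath;
# xml_test.pl passes the file name in $ENV{XMLFILE}; fall back to the sample file.
my $xp = XML::XPath->new(filename => $ENV{XMLFILE} || 'customer_order.xml');
my (@nodes, $test); # pre-declare a few vars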
First we'll define a test that checks whether or not the document root is indeed an "order" element. We will do this by attempting to select the nodeset for an "order" element at the document root into an array, testing that the resulting array contains only one element, and then verifying that our test is true.
@nodes = $xp->findnodes('/order');
ok(@nodes == 1, 1, "the root element must be an 'order'");
Next we need to confirm that our order document contains a "customer" element, and that the "customer" element contains a "shipping_info" element. Rather than running separate tests for each, we can combine these tests into a single expression and, if either element is missing or misplaced, our test will fail.
@nodes = $xp->findnodes('/order/customer/shipping_info');
ok(@nodes == 1, 1, "an order must contain a 'customer' element with a 'shipping_info' child");
As the Perl mantra goes, "There's More Than One Way To Do It", and the same is true with XML::XPath. Rather than selecting the nodes into an array and evaluating that array in a scalar context to get the number of matches, we can use the XPath count() function to achieve the same effect. Note that we will be using XML::XPath's find() function instead of findnodes() since the type of test we are performing returns a literal value instead of a set of document nodes.
$test = $xp->find('count(/order/item)');
ok($test > 0, 1, "an order must contain at least one 'item' element");
Finally, we need to be sure that every "item" element contains both a "product" element and a "quantity" element. Here we'll get a little fancier and use XPath's boolean() function, which returns true (1) only if the entire expression we pass to it evaluates to true. We need only check that the number of "item" elements is equal to the number of child elements we are testing for.
$test = $xp->find('boolean(count(/order/item/product) = count(/order/item))');
ok($test == 1, 1, "a 'item' element must contain a 'product' element.");
$test = $xp->find('boolean(count(/order/item/quantity) = count(/order/item))');
ok($test == 1, 1, "a 'item' element must contain a 'quantity' element.");
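Comparing counts can be fooled: an order in which one "item" holds two "product" elements and another holds none still yields equal counts. A stricter variant, offered here only as a sketch and not as the article's method, selects the offending items directly with the XPath not() function. The helper name incomplete_items and the inline document are hypothetical.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: return every "item" that lacks a "product"
# or a "quantity" child.
sub incomplete_items {
    my ($xp) = @_;
    return $xp->findnodes('/order/item[not(product) or not(quantity)]');
}

# Inline sample, invented for this sketch; the second item has no quantity.
my $xp = XML::XPath->new(xml => <<'XML');
<order>
  <item>
    <product id="1111">Acme Rocket Jet Pack</product>
    <quantity type="each">1</quantity>
  </item>
  <item>
    <product id="2222">Roadrunner Chow</product>
  </item>
</order>
XML

my @incomplete = incomplete_items($xp);
printf "%d incomplete item(s)\n", scalar @incomplete;
```

In a test script the result could feed ok() directly, for example ok(scalar(@incomplete), 0, "every 'item' needs a 'product' and a 'quantity'").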
We now have tests to cover the five basic areas that we defined earlier as the most critical in terms of structural validation. Having saved our test file as order.t, let's fire up our xml_test.pl script.
% perl xml_test.pl -d order.t customer_order.xml
order.................1..5
ok 1
ok 2
ok 3
ok 4
ok 5
ok
All tests successful.
Files=1, Tests=5, 1 wallclock secs ( 0.69 cusr + 0.07 csys = 0.76 CPU)
Great, our sample document passed muster. But what if it didn't? To find out, open customer_order.xml, remove one of the "quantity" elements, save the file, and run the script again.
order.................1..5
ok 1
ok 2
ok 3
ok 4
not ok 5
# Test 5 got: '' (order.t at line 26)
# Expected: '1' (a 'item' element must contain a 'quantity' element.)
FAILED test 5
Failed 1/5 tests, 80.00% okay
Failed Test  Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------
order.t                       5    1  20.00%  5
Failed 1/1 test scripts, 0.00% okay. 1/5 subtests failed, 80.00% okay.
Admittedly the output is not very pretty, but it is functional. We now know that the current document is invalid, and we also know why.
The handful of tests that we currently have clearly would not be sufficient validation for a production environment, but with these few examples you hopefully have a clear view of the basics and could extend the test script to handle nearly any case. You could, for example, iterate over the "quantity" elements and test each text() node against a regular expression to ensure that each contains only a numeric value. You are limited only by your imagination.
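The regular-expression idea could be sketched as follows. This is one possible reading of the suggestion, not code from the article: the helper name non_numeric_quantities, the whole-number pattern, and the inline document are all assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: return the text of every "quantity" element
# that is not a whole number (surrounding whitespace is tolerated).
sub non_numeric_quantities {
    my ($xp) = @_;
    my @bad;
    for my $node ($xp->findnodes('/order/item/quantity')) {
        my $text = $node->string_value;
        push @bad, $text unless $text =~ /^\s*\d+\s*$/;
    }
    return @bad;
}

# Inline sample, invented for this sketch; the second quantity is invalid.
my $xp = XML::XPath->new(xml => <<'XML');
<order>
  <item><product id="1111">Acme Rocket Jet Pack</product><quantity type="each">1</quantity></item>
  <item><product id="2222">Roadrunner Chow</product><quantity type="bag">ten</quantity></item>
</order>
XML

my @bad = non_numeric_quantities($xp);
print "non-numeric quantity: $_\n" for @bad;
```

Inside order.t the same helper could back a sixth test, such as ok(scalar(@bad), 0, "every 'quantity' must be numeric"), provided the plan is raised to six tests.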
The Same Old Scheme?
Other Resources
• Using XSL as a Validation Language by Rick Jelliffe
• Introducing the Schematron by Uche Ogbuji
As much as I would love to take credit for the basic ideas presented here, I admit that the notion of using XPath expressions to validate an XML document's structure is not at all new. In fact, this concept is the foundation of Rick Jelliffe's popular Schematron. Thanks should also go to Matt Sergeant, the author of AxKit, for pointing out that Perl's Test and Test::Harness modules would make a nifty environment for a Perl Schematronesque clone. The goal here has been to spark your imagination, to get you to experiment, and, hopefully, to point to the ability of Perl and its modules to make even the more complex XML tasks, like validation, easy to solve.