XPath and Default Namespace handling
Many questions about XPath expressions not returning the expected results seem to be related
to the (ab)use of namespaces, and mostly to so-called "default namespaces". This article tries
to explain the problem and provides solutions using three popular XPath implementations: Jaxen, the
JAXP XPathFactory and XSLT.
What's the Problem?
Let's assume the following XML:
    <catalog>
      <cd>
        <artist>Sufjan Stevens</artist>
        <title>Illinois</title>
        <src>http://www.sufjan.com/</src>
      </cd>
      <cd>
        <artist>Stoat</artist>
        <title>Future come and get me</title>
        <src>http://www.stoatmusic.com/</src>
      </cd>
      <cd>
        <artist>The White Stripes</artist>
        <title>Get behind me satan</title>
        <src>http://www.whitestripes.com/</src>
      </cd>
    </catalog>
You could use the XPath expression '//cd' to return all the cd elements
that are not declared in a namespace.
Now let's take the same XML, but this time with all elements defined in the
'http://www.edankert.com/examples/' namespace.
Instead of prefixing every element (which would cause the same problem),
we declare a so-called default namespace on the root element.
The XML now looks like:
    <catalog xmlns="http://www.edankert.com/examples/">
      <cd>
        <artist>Sufjan Stevens</artist>
        <title>Illinois</title>
        <src>http://www.sufjan.com/</src>
      </cd>
      <cd>
        <artist>Stoat</artist>
        <title>Future come and get me</title>
        <src>http://www.stoatmusic.com/</src>
      </cd>
      <cd>
        <artist>The White Stripes</artist>
        <title>Get behind me satan</title>
        <src>http://www.whitestripes.com/</src>
      </cd>
    </catalog>
When we now use the same XPath as above '//cd', we notice that nothing
is returned. This is because the specified XPath returns all cd elements
that have not been declared in a namespace and in the example
above all the 'cd' elements are declared in the
'http://www.edankert.com/examples/' namespace.
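The effect described above can be reproduced with a few lines of Java. The following is only a minimal sketch using the JAXP classes that ship with the JDK; the class name DefaultNamespaceProblem, the count helper and the shortened catalog strings are assumptions made for illustration and are not part of the original examples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultNamespaceProblem {

    // Shortened, hypothetical variants of the catalog used in this article.
    static final String PLAIN =
        "<catalog>"
        + "<cd><title>Illinois</title></cd>"
        + "<cd><title>Future come and get me</title></cd>"
        + "</catalog>";

    static final String DEFAULT_NS =
        "<catalog xmlns=\"http://www.edankert.com/examples/\">"
        + "<cd><title>Illinois</title></cd>"
        + "<cd><title>Future come and get me</title></cd>"
        + "</catalog>";

    /** Parses the XML with a namespace-aware parser and counts the nodes the expression selects. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Elements without a namespace are found by the unprefixed name test.
        System.out.println("plain:             " + count(PLAIN, "//cd"));
        // The same expression selects nothing once a default namespace is declared.
        System.out.println("default namespace: " + count(DEFAULT_NS, "//cd"));
    }
}
```

The unprefixed name test 'cd' only matches elements that are in no namespace, so the second count is zero even though the document visibly contains cd elements.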
Namespace-Prefix mappings
We need some kind of way to specify in our XPath expression that we are looking
for all 'cd' elements in the 'http://www.edankert.com/examples/' namespace.
To handle this, the XPath specification allows us to use a QName to specify
an element or an attribute. A QName can be either a name on its own 'element' or
a name with a prefix 'pre:element'. This prefix however needs to be mapped to a
Namespace URI. So mapping the 'pre' prefix to the 'http://www.edankert.com/test'
Namespace URI should allow us to find all 'element' elements defined in the
'http://www.edankert.com/test' namespace.
In this case for instance we could use the 'edx' prefix and map this prefix to the
'http://www.edankert.com/examples/' namespace URI. This would result in the following
XPath expression that should return all 'cd' elements that are declared
in the 'http://www.edankert.com/examples/' namespace: '//edx:cd'.
All XPath processors allow you to specify prefix-namespace mappings; how this is done,
however, depends on the specific implementation. See below for examples of
how to map namespaces and prefixes using Jaxen (JDOM/dom4j/XOM), JAXP and XSLT.
Jaxen and Dom4J
The following code reads an XML document from the file system into an org.dom4j.Document
and searches this document for 'cd' elements defined in the 'http://www.edankert.com/examples/' namespace.
    try {
        SAXReader reader = new SAXReader();
        Document document = reader.read( "file:catalog.xml");

        HashMap map = new HashMap();
        map.put( "edx", "http://www.edankert.com/examples/");

        XPath xpath = new Dom4jXPath( "//edx:cd");
        xpath.setNamespaceContext( new SimpleNamespaceContext( map));

        List nodes = xpath.selectNodes( document);
        ...
    } catch ( JaxenException e) {
        // An error occurred parsing or executing the XPath
        ...
    } catch ( DocumentException e) {
        // the document is not well-formed.
        ...
    }
The first step is to create a SAXReader, which is used to read the 'catalog.xml' document
from the file system and create a dom4j specific Document from it.
The next step is the same for all Jaxen implementations: create a HashMap of prefixes and
namespace URIs.
To be able to use the Jaxen XPath functionality with dom4j, we need to create a dom4j specific
XPath object (Dom4jXPath), passing our XPath expression into the constructor.
Now that we have created the XPath object, we can provide the map of prefixes and namespace URIs
to the XPath engine by wrapping the map in a SimpleNamespaceContext object, the default
implementation of the Jaxen NamespaceContext interface.
The last step is to perform the search by calling the 'selectNodes()' method on the XPath, passing
the complete dom4j Document as the context node for this method. Any node in the document can be
used as the context node.
Jaxen and XOM
XOM is the newest kid on the block of the simplified Java DOM APIs. Its design promises an interface that is easy to learn and easy to use.
    try {
        Builder builder = new Builder();
        Document document = builder.build( "file:catalog.xml");

        HashMap map = new HashMap();
        map.put( "edx", "http://www.edankert.com/examples/");

        XPath xpath = new XOMXPath( "//edx:cd");
        xpath.setNamespaceContext( new SimpleNamespaceContext( map));

        List nodes = xpath.selectNodes( document);
        ...
    } catch ( JaxenException e) {
        // An error occurred parsing or executing the XPath
        ...
    } catch ( IOException e) {
        // An error occurred opening the document
        ...
    } catch ( ParsingException e) {
        // An error occurred parsing the document
        ...
    }
We need to create a Builder object to read the 'catalog.xml' document from the file system
and to create a XOM specific Document.
Next we create the HashMap of prefixes and namespace URIs.
To be able to use the Jaxen XPath functionality with XOM, we need to create a XOM specific
XPath object (XOMXPath), passing our XPath expression into the constructor.
After we have created the XPath object, we again provide the map of prefixes and namespace URIs
to the XPath engine, wrapping this map in the SimpleNamespaceContext object.
Finally we perform the search by calling the 'selectNodes()' method on the XPath object, passing
the XOM Document as the context node for this method.
Jaxen and JDOM
JDOM was the first of the simplified XML APIs.
    try {
        SAXBuilder builder = new SAXBuilder();
        Document document = builder.build( "file:catalog.xml");

        HashMap map = new HashMap();
        map.put( "edx", "http://www.edankert.com/examples/");

        XPath xpath = new JDOMXPath( "//edx:cd");
        xpath.setNamespaceContext( new SimpleNamespaceContext( map));

        List nodes = xpath.selectNodes( document);
        ...
    } catch ( JaxenException e) {
        // An error occurred parsing or executing the XPath
        ...
    } catch ( IOException e) {
        // An error occurred opening the document
        ...
    } catch ( JDOMException e) {
        // An error occurred parsing the document
        ...
    }
First we create a JDOM specific Document using the SAXBuilder object.
Next we create a JDOM specific XPath object (JDOMXPath).
After this, we can provide the map of prefixes and namespace URIs to the XPath engine, wrapping
this map in the SimpleNamespaceContext object.
Finally we perform the search by calling the 'selectNodes()' method on the XPath object, passing
the JDOM Document as the context node for this method.
JAXP XPathFactory
Since version 1.3, JAXP also provides a generic mechanism to perform XPath searches on XML Object Models.
    try {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware( true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document document = builder.parse( new InputSource( "file:catalog.xml"));

        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        xpath.setNamespaceContext( new NamespaceContext() {
            public String getNamespaceURI( String prefix) {
                if ( prefix.equals( "edx")) {
                    return "http://www.edankert.com/examples/";
                } else if ...
                    ...
                }

                // NULL_NS_URI is defined in javax.xml.XMLConstants
                return XMLConstants.NULL_NS_URI;
            }

            public String getPrefix( String namespaceURI) {
                if ( namespaceURI.equals( "http://www.edankert.com/examples/")) {
                    return "edx";
                } else if ...
                    ...
                }

                return null;
            }

            public Iterator getPrefixes( String namespaceURI) {
                ArrayList list = new ArrayList();

                if ( namespaceURI.equals( "http://www.edankert.com/examples/")) {
                    list.add( "edx");
                } else if ...
                    ...
                }

                return list.iterator();
            }
        });

        Object nodes = xpath.evaluate( "//edx:cd", document.getDocumentElement(), XPathConstants.NODESET);
        ...
    } catch ( ParserConfigurationException e) {
        ...
    } catch ( XPathExpressionException e) {
        ...
    } catch ( SAXException e) {
        ...
    } catch ( IOException e) {
        ...
    }
First we build an org.w3c.dom.Document using the JAXP DocumentBuilderFactory functionality,
making sure namespace processing is enabled.
We can now search this document by creating an XPath object using the XPathFactory.
To provide a map of prefixes and namespace URIs to the XPath engine we need to implement the
NamespaceContext interface ourselves, because there is currently no default implementation available.
This means implementing the getNamespaceURI, getPrefix and getPrefixes methods, making sure the
methods return the correct values, also for the 'xmlns' and 'xml' namespace prefixes.
After we have provided the NamespaceContext to the XPath engine, we can evaluate our XPath
expression using the evaluate method, providing our XPath expression, using the root element
as the starting context and specifying a NodeList as the desired return type.
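Because the three NamespaceContext methods have to be written by hand, it can be convenient to back them with a map, much like Jaxen's SimpleNamespaceContext does. The class below is a minimal sketch of that idea, not an API provided by JAXP: the name MapNamespaceContext, its constructor and the count helper are assumptions made for illustration. It also binds the reserved 'xml' and 'xmlns' prefixes using the constants from javax.xml.XMLConstants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: a NamespaceContext backed by a prefix-to-URI map. */
public class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> prefixToUri = new HashMap<>();

    public MapNamespaceContext(Map<String, String> mappings) {
        prefixToUri.putAll(mappings);
        // The 'xml' and 'xmlns' prefixes are reserved and always bound to these URIs.
        prefixToUri.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        prefixToUri.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                result.add(entry.getKey());
            }
        }
        return result.iterator();
    }

    /** Counts the nodes selected by the expression, resolving prefixes through the given map. */
    static int count(String xml, String expression, Map<String, String> mappings) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(mappings));
            NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With such a helper, the anonymous class in the example above could be replaced by something like new MapNamespaceContext( map), where map holds the single 'edx' entry.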
XSLT
XPath was originally designed to be used with XSLT. This (and perhaps the fact that XSLT is
itself an XML vocabulary) might explain why declaring prefix to namespace URI mappings in XSLT
seems very natural.
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="//edx:cd" xmlns:edx="http://www.edankert.com/examples/">
        <xsl:apply-templates/>
      </xsl:template>
    </xsl:stylesheet>
To declare the mapping, we simply bind the 'edx' prefix to the namespace URI using
the normal XML namespace declaration mechanism (xmlns:edx).
To get the same result as in the previous examples, we can use an xsl:template
that matches our //edx:cd XPath expression.
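To try this from Java, the stylesheet can be run through the javax.xml.transform API that ships with the JDK. The sketch below is an assumption-laden illustration rather than the article's own sample code: the class XsltNamespaceSketch, the shortened catalog and the variant stylesheet (version 1.0, text output, printing each title followed by a semicolon) are all invented here so that the effect of the match pattern is easy to see.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltNamespaceSketch {

    // Shortened, hypothetical variant of the catalog with a default namespace.
    static final String CATALOG =
        "<catalog xmlns=\"http://www.edankert.com/examples/\">"
        + "<cd><title>Illinois</title></cd>"
        + "<cd><title>Get behind me satan</title></cd>"
        + "</catalog>";

    /** Builds a small stylesheet whose single template uses the given match pattern. */
    static String stylesheet(String matchPattern) {
        return "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:edx=\"http://www.edankert.com/examples/\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"" + matchPattern + "\">"
            + "<xsl:value-of select=\"edx:title\"/><xsl:text>;</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    /** Transforms the catalog with the stylesheet and returns the text output. */
    static String transform(String matchPattern) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet(matchPattern))));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(CATALOG)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The prefixed pattern matches the cd elements, so the template runs for each of them.
        System.out.println(transform("//edx:cd"));
        // The unprefixed pattern matches nothing; only the built-in templates copy the text.
        System.out.println(transform("//cd"));
    }
}
```

With the prefixed pattern each title is followed by a semicolon, which shows that the template was applied. With the unprefixed pattern the semicolons are missing, because the template never matches and only the built-in rules produce output.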
Conclusion
So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to
specify a namespace prefix mapping. As we have seen, it does not matter what prefix the namespace is
mapped to.
This same mechanism can also be used to search for elements that have been defined using a different prefix.
This means that the above examples will also work on the following XML where instead of using a default namespace,
the namespace has been mapped to the 'examples' prefix:
    <examples:catalog xmlns:examples="http://www.edankert.com/examples/">
      <examples:cd>
        <examples:artist>Sufjan Stevens</examples:artist>
        <examples:title>Illinois</examples:title>
        <examples:src>http://www.sufjan.com/</examples:src>
      </examples:cd>
      <examples:cd>
        <examples:artist>Stoat</examples:artist>
        <examples:title>Future come and get me</examples:title>
        <examples:src>http://www.stoatmusic.com/</examples:src>
      </examples:cd>
      <examples:cd>
        <examples:artist>The White Stripes</examples:artist>
        <examples:title>Get behind me satan</examples:title>
        <examples:src>http://www.whitestripes.com/</examples:src>
      </examples:cd>
    </examples:catalog>
Using the XPath expression '//edx:cd' and namespace prefix mapping
from the examples above will again return all 'cd' elements that are declared
in the 'http://www.edankert.com/examples/' namespace.
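The point that only the namespace URI matters, and not the prefix used in the document or in the expression, can be checked with a small JAXP sketch. The class PrefixIndependence, its count helper and the reduced catalog strings (empty cd elements only) are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrefixIndependence {

    static final String NS = "http://www.edankert.com/examples/";

    // Reduced, hypothetical catalogs: one uses a default namespace, one the 'examples' prefix.
    static final String DEFAULT_NS_DOC =
        "<catalog xmlns=\"" + NS + "\"><cd/><cd/><cd/></catalog>";
    static final String PREFIXED_DOC =
        "<examples:catalog xmlns:examples=\"" + NS + "\">"
        + "<examples:cd/><examples:cd/><examples:cd/>"
        + "</examples:catalog>";

    /** Counts matches for the expression, with the given prefix bound to the examples namespace. */
    static int count(String xml, final String prefix, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? prefix : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList(prefix).iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same expression and mapping, two differently written documents.
        System.out.println(count(DEFAULT_NS_DOC, "edx", "//edx:cd"));
        System.out.println(count(PREFIXED_DOC, "edx", "//edx:cd"));
        // A different prefix in the expression works too, as long as it maps to the same URI.
        System.out.println(count(PREFIXED_DOC, "foo", "//foo:cd"));
    }
}
```

All three calls select the same three cd elements, since the XPath engine compares the resolved namespace URI and the local name, never the prefix text itself.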
Sample Code
Download any of the archives to try out the examples above.
The archives consist of the ./catalog.xml document
and 4 Java code examples (in the ./src directory) to search the document using DOM, JDOM, dom4j and XOM.
To run these examples, please use the following command-line options:
Model    Command Line
DOM      java -cp xpath-examples.jar com.edankert.examples.dom.XPathExample
JDOM     java -cp xpath-examples.jar;lib/jdom.jar;lib/jaxen-1.1.1.jar com.edankert.examples.jdom.XPathExample
dom4j    java -cp xpath-examples.jar;lib/dom4j-1.6.1.jar;lib/jaxen-1.1.1.jar com.edankert.examples.dom4j.XPathExample
XOM      java -cp xpath-examples.jar;lib/xom-1.0.jar;lib/jaxen-1.1.1.jar com.edankert.examples.xom.XPathExample
The archive also contains the example XML Stylesheet (./catalog.xsl). To process the XML with
the stylesheet please invoke your favorite XML Processor from the command-line or use the
transform.xhp project included in the ./xmlhammer-projects directory.
To be able to process the transform.xhp and the also included xpath.xhp project, you will need
to have the XML Hammer application installed. This can be downloaded from:
http://www.xmlhammer.org/downloads.html.
This article comes from the ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u2/67865/showart_1420989.html