Re: [xsl] Correcting unbound namespace prefixes
Subject: Re: [xsl] Correcting unbound namespace prefixes
From: Martin Honnen <Martin.Honnen@xxxxxx>
Date: Mon, 02 Aug 2010 18:21:22 +0200
Tony Nassar wrote:
> I'm not sure this is the correct place to post. This may be a question about JAXP, or simply about good standard operating procedure for bad input data.
> I've got some XML that I know is invalid, but I'm not in a position to get the customer to fix it. Here's what it looks like:
The term "valid" refers to validity against a DTD or a schema. That markup is not namespace well-formed: it uses the prefix "pp" without ever binding it to a namespace URI.
> <document>
> <text>Four score and twenty years ago..,</text>
> <pp:metadata publication-date="2010-07-31T12:30:00Z" />
> ...
> You get the idea (I hope): clearly someone began with XML in the "" namespace, extracted metadata in a post-processing step, and inserted the corresponding markup without adding the necessary namespace declarations or mapping "pp" to one. I don't know of a way to fix this through the JAXP API (i.e. interpolating the prefix mapping). Or am I better off just preprocessing this XML via Perl or Python before it's ever parsed?
You can't parse that successfully with any namespace-aware parser, as
such a parser is required to report an error on the 'pp:metadata' element
name. XSLT and XPath operate on a data model that is usually built by
parsing with a namespace-aware parser, so I don't think XSLT can help
here.
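A quick way to see this is a short Java check using JAXP's SAXParserFactory, parsing the same input once with namespace processing on and once with it off. The class name UnboundPrefixCheck is made up, and the sample is an assumption: the "..." from the question is replaced by a closing </document> tag so the input is complete. Both JAXP parser factories default to setNamespaceAware(false), so the flag is set explicitly in each case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class UnboundPrefixCheck {
    // Sample modelled on the question; the "..." is replaced by a closing tag (assumption).
    static final String SAMPLE =
            "<document>"
            + "<text>Four score and twenty years ago..,</text>"
            + "<pp:metadata publication-date=\"2010-07-31T12:30:00Z\" />"
            + "</document>";

    // Returns true if xml parses without a fatal error under the given namespace-awareness setting.
    static boolean parses(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        try {
            // DefaultHandler.fatalError rethrows, so a fatal error surfaces as SAXParseException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // With namespace processing on, the unbound "pp" prefix is reported here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-aware parse succeeded:     " + parses(SAMPLE, true));
        System.out.println("non-namespace-aware parse succeeded: " + parses(SAMPLE, false));
    }
}
```

With the JDK's built-in parser, the first call should fail on the unbound prefix and the second should succeed, since without namespace processing "pp:metadata" is just an element name that happens to contain a colon.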
I think JAXP, however, allows you to create non-namespace-aware SAX or DOM
parsers (e.g.
http://download-llnw.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html#isNamespaceAware()),
and that way you should at least be able to parse that markup without an
error. You will get element names containing colons, though, and will then
need to find a way to turn them into namespace well-formed markup. That is
not something I am familiar with, and it is not really on topic here.
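One possible way to do that last step is sketched here in Java, under stated assumptions rather than as an established recipe: parse without namespace processing, then copy the tree node by node into a namespace-aware DOM with createElementNS/setAttributeNS. Each prefix is resolved first against any xmlns declarations the document does contain, and otherwise against a caller-supplied map. The names PrefixRepair and repair are hypothetical, and so is the URI urn:example:pp; the real namespace URI for "pp" would have to come from whoever produces the data.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PrefixRepair {
    // Parses xml WITHOUT namespace processing, then rebuilds it as a namespace-aware DOM.
    // Prefixes the document never declares are resolved through fallback (prefix -> URI).
    static Document repair(String xml, Map<String, String> fallback) throws Exception {
        DocumentBuilderFactory lax = DocumentBuilderFactory.newInstance();
        lax.setNamespaceAware(false); // "pp:metadata" is then just a name containing a colon
        Document raw = lax.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        DocumentBuilderFactory strict = DocumentBuilderFactory.newInstance();
        strict.setNamespaceAware(true);
        Document fixed = strict.newDocumentBuilder().newDocument();
        Map<String, String> scope = new HashMap<>(fallback);
        scope.put("xml", XMLConstants.XML_NS_URI); // the "xml" prefix is always bound
        for (Node c = raw.getFirstChild(); c != null; c = c.getNextSibling()) {
            Node copied = copy(c, fixed, scope);
            if (copied != null) fixed.appendChild(copied);
        }
        return fixed;
    }

    private static boolean isNamespaceDecl(String name) {
        return name.equals("xmlns") || name.startsWith("xmlns:");
    }

    // Resolves the namespace URI of an element or attribute name against the in-scope prefixes.
    private static String uriFor(String qname, Map<String, String> scope, boolean isAttribute) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            // Unprefixed attributes are in no namespace; unprefixed elements use the default one.
            return isAttribute ? null : scope.get("");
        }
        String prefix = qname.substring(0, colon);
        String uri = scope.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("no namespace URI known for prefix '" + prefix + "'");
        }
        return uri;
    }

    // Copies one node from the non-namespace-aware DOM into the namespace-aware target document.
    private static Node copy(Node src, Document target, Map<String, String> inherited) {
        switch (src.getNodeType()) {
            case Node.ELEMENT_NODE: {
                Element in = (Element) src;
                NamedNodeMap attrs = in.getAttributes();
                // Pass 1: apply xmlns / xmlns:* declarations made on this element.
                Map<String, String> scope = new HashMap<>(inherited);
                for (int i = 0; i < attrs.getLength(); i++) {
                    String name = attrs.item(i).getNodeName();
                    if (isNamespaceDecl(name)) {
                        String value = attrs.item(i).getNodeValue();
                        scope.put(name.equals("xmlns") ? "" : name.substring(6), value.isEmpty() ? null : value);
                    }
                }
                Element out = target.createElementNS(uriFor(in.getTagName(), scope, false), in.getTagName());
                // Pass 2: copy each attribute into its proper namespace.
                for (int i = 0; i < attrs.getLength(); i++) {
                    String name = attrs.item(i).getNodeName();
                    String ns = isNamespaceDecl(name) ? XMLConstants.XMLNS_ATTRIBUTE_NS_URI : uriFor(name, scope, true);
                    out.setAttributeNS(ns, name, attrs.item(i).getNodeValue());
                }
                for (Node c = in.getFirstChild(); c != null; c = c.getNextSibling()) {
                    Node copied = copy(c, target, scope);
                    if (copied != null) out.appendChild(copied);
                }
                return out;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: // CDATA content is kept, but as plain text
                return target.createTextNode(src.getNodeValue());
            case Node.COMMENT_NODE:
                return target.createComment(src.getNodeValue());
            case Node.PROCESSING_INSTRUCTION_NODE:
                return target.createProcessingInstruction(src.getNodeName(), src.getNodeValue());
            default:
                return null; // e.g. the DOCTYPE node is dropped in this sketch
        }
    }

    public static void main(String[] args) throws Exception {
        String sample = "<document><text>Four score and twenty years ago..,</text>"
                + "<pp:metadata publication-date=\"2010-07-31T12:30:00Z\" /></document>";
        // "urn:example:pp" is a made-up placeholder, not the real URI behind "pp".
        Document doc = repair(sample, Collections.singletonMap("pp", "urn:example:pp"));
        Element meta = (Element) doc.getElementsByTagNameNS("urn:example:pp", "metadata").item(0);
        System.out.println(meta.getTagName() + " -> " + meta.getNamespaceURI());
    }
}
```

Because the result is an ordinary namespace-aware Document, it could be passed to an XSLT transformation wrapped in a javax.xml.transform.dom.DOMSource. Compared with patching the text in Perl or Python before parsing, as the question suggests, this leaves escaping and entity handling to the parser; the trade-offs in this sketch are that the DOCTYPE is dropped, CDATA sections become plain text, and any prefix that is neither declared nor in the map raises an IllegalArgumentException.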
--
Martin Honnen
http://msmvps.com/blogs/martin_honnen/