
Re: [xsl] Correcting unbound namespace prefixes


Subject: Re: [xsl] Correcting unbound namespace prefixes
From: Martin Honnen <Martin.Honnen@xxxxxx>
Date: Mon, 02 Aug 2010 18:21:22 +0200

Tony Nassar wrote:
I'm not sure this is the correct place to post. This may be a question about JAXP, or simply about good standard operating procedure for bad input data.

I've got some XML that I know is invalid, but I'm not in a position to get the customer to fix it. Here's what it looks like:

The term "valid" refers to validity against a DTD or a schema. The problem with that markup is a different one: it is not namespace well-formed.


<document>
   <text>Four score and twenty years ago..,</text>
   <pp:metadata publication-date="2010-07-31T12:30:00Z" />
  ...

You get the idea (I hope): clearly someone began with XML in the "" namespace, extracted metadata in a post-processing step, and inserted the corresponding markup without adding the necessary namespace declarations or mapping "pp" to one. I don't know of a way to fix this through the JAXP API (i.e. interpolating the prefix mapping). Or am I better off just preprocessing this XML via Perl or Python before it's ever parsed?

You can't parse that successfully with any namespace-aware parser, because such a parser is required to report an error on the 'pp:metadata' element name, whose prefix is not bound to a namespace.
And XSLT/XPath operate on a data model that is usually built by parsing with a namespace-aware parser, so I don't think XSLT can help here.


However, I think JAXP allows you to create non-namespace-aware SAX or DOM parsers (e.g. http://download-llnw.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html#isNamespaceAware()), and that way you should at least be able to parse that markup without an error. You will get element names containing colons that way, and you will need to find a way to create namespace well-formed markup from them. That is not something I am familiar with, and it is not really on topic here.
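
For what it's worth, a minimal and untested-in-production sketch of that idea with the JAXP DOM API might look like the following. It parses the markup with namespace awareness switched off, declares the missing prefix on the root element, and serializes the result so that a namespace-aware parser can read it afterwards. The class name PrefixRepair, the method names, and above all the namespace URI http://example.com/ns/pp are assumptions made for illustration; the real URI has to come from whoever defined the "pp" vocabulary. The sketch also assumes that the identity Transformer of your JAXP implementation serializes a DOM built without namespace awareness, which is worth checking against the implementation you actually use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixRepair {
    // Hypothetical namespace URI, chosen only for this example.
    static final String PP_NS = "http://example.com/ns/pp";

    // Parse a string with or without namespace awareness.
    static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Read the broken markup without namespace processing, add the missing
    // xmlns:pp declaration to the root element, and serialize it again.
    static String repair(String xml) throws Exception {
        Document doc = parse(xml, false);
        // Without namespace awareness this is just an attribute named "xmlns:pp";
        // in the serialized output it becomes an ordinary namespace declaration.
        doc.getDocumentElement().setAttribute("xmlns:pp", PP_NS);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String broken = "<document><text>Four score</text>"
                + "<pp:metadata publication-date=\"2010-07-31T12:30:00Z\" /></document>";
        String fixed = repair(broken);
        System.out.println(fixed);
        // The repaired markup should now be acceptable to a namespace-aware parser.
        Document doc = parse(fixed, true);
        System.out.println(doc.getElementsByTagNameNS(PP_NS, "metadata").getLength());
    }
}
```

Declaring the prefix once on the root element binds it for the whole document, which avoids having to visit every element whose name contains a colon. If the input may use several unknown prefixes, you would have to walk the tree, collect the prefixes from the element and attribute names, and declare each of them in the same way.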



--

	Martin Honnen
	http://msmvps.com/blogs/martin_honnen/

