[oXygen-user] Working with Catalogs

Eliot Kimber
Wed Oct 5 12:46:05 CDT 2005


George Cristian Bina wrote:
> Hi Eliot,
> 
> I see that you filed a bug against Xerces:
> http://issues.apache.org/jira/browse/XERCESJ-1104
> 
> Note that it uses an XMLEntityResolver interface (not the SAX 
> EntityResolver) that is at XNI level and that should allow some control 
> over system versus uri mappings if the XMLEntityResolver set uses an XML 
> Catalog.
> This is the interface I thought we should implement to allow the uri 
> mapping. The XMLEntityResolver interface defines one method:

Yes, I've been digging into the code and I have a first stab at a fix.

What I've done is extend the XMLEntityResolver interface with a 
resolveResourceByUri() method, which does nothing but try to resolve the 
system ID value using the resolver's resolveUri() method.

The problem is that resolveEntity() only works for true entities (that 
is, resources that would be mapped via SYSTEM and PUBLIC catalog 
entries). It would be inappropriate to use URI entries to try to resolve 
an external parsed or unparsed entity, in the same way it's 
inappropriate to use SYSTEM or PUBLIC to resolve a schema location URI.
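
To make the separation concrete, here is a minimal sketch, not the actual patch. It assumes a hypothetical in-memory catalog (SketchCatalog, with its add* methods) standing in for a real XML Catalog. Only the method name resolveResourceByUri() comes from the change described above. Everything else is illustrative and is not the Xerces or resolver API. Entity lookups consult only SYSTEM and PUBLIC entries, and non-entity lookups consult only URI entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an XML Catalog: one table per entry type.
// This is an illustration only, not the Xerces or Apache resolver API.
class SketchCatalog {
    private final Map<String, String> systemEntries = new HashMap<>();
    private final Map<String, String> publicEntries = new HashMap<>();
    private final Map<String, String> uriEntries = new HashMap<>();

    void addSystem(String systemId, String target) { systemEntries.put(systemId, target); }
    void addPublic(String publicId, String target) { publicEntries.put(publicId, target); }
    void addUri(String name, String target) { uriEntries.put(name, target); }

    // True entities (external parsed or unparsed): SYSTEM and PUBLIC entries only.
    // Simplification: SYSTEM is tried first and the catalog "prefer" setting is ignored.
    String resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemEntries.containsKey(systemId)) {
            return systemEntries.get(systemId);
        }
        if (publicId != null && publicEntries.containsKey(publicId)) {
            return publicEntries.get(publicId);
        }
        return null; // no mapping; URI entries are deliberately not consulted
    }

    // Non-entity resources (e.g., schemas): URI entries only.
    String resolveResourceByUri(String uri) {
        return uri == null ? null : uriEntries.get(uri);
    }
}
```

With this split, a namespace URI that appears only in a URI entry is never returned by resolveEntity(), and a DTD system ID that appears only in a SYSTEM entry is never returned by resolveResourceByUri().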

In the case of no-namespace schemas you have no choice but to either use 
some out-of-band binding or use schema location hints.

This is one reason I recommend against using no-namespace schemas. 
They're no better than external DTD subsets because you have no clear 
and reliable way to do a non-syntactic binding of document to schema.

That is, mapping namespace URIs to schemas is non-syntactic, in that the 
syntax of the document does not directly locate the schema. Rather, the 
binding is indirect through the namespace, which is an invariant 
property of the document that directly affects the document's inherent 
semantics. A DOCTYPE declaration or schema location hint, by contrast, 
is a purely syntactic reference: it is not an inherent property of the 
document, and its presence or absence doesn't affect the document's 
inherent semantics.
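
As an illustration of binding through the namespace, here is a rough sketch using only the standard JAXP DOM parser. The class name NamespaceSchemaBinding, the method schemaFor(), and the plain Map standing in for a catalog's URI entries are all assumptions made for the example. The schema is chosen from the root element's namespace alone, so nothing in the document's syntax (DOCTYPE, schema location hint, prefix choice) takes part in the lookup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: picks a schema from the root element's namespace only.
class NamespaceSchemaBinding {
    // uriEntries stands in for a catalog's URI entries (namespace -> schema location).
    // Returns null when the root element has no namespace or no entry matches.
    static String schemaFor(String xml, Map<String, String> uriEntries) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            String namespace = doc.getDocumentElement().getNamespaceURI();
            return namespace == null ? null : uriEntries.get(namespace);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```

A no-namespace document gives this lookup nothing to key on, which is the situation where only an out-of-band binding or a schema location hint remains.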

Anyway, hopefully I'll be able to report more a bit later.

My analysis at this point is that there's a fundamental architectural 
flaw in the current Xerces implementation in that it doesn't distinguish 
XML entities from other resources that might be involved in processing 
and validating a document (i.e., schemas). The approach described above is 
really a hack to get around this flaw with the least disruptive change. 
I suspect that the same problem exists in the Xerces XInclude 
processing--I would not be surprised if href= values on xi:include 
elements are resolved via SYSTEM and PUBLIC entries. But I don't have 
time or energy to dive into that code just now.

It looks like I may have to experiment with writing my own 
XMLEntityResolver implementation in order to implement my desired 
recursive and fallback catalog resolution behaviors.
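
One possible reading of "recursive and fallback" resolution, sketched under stated assumptions: "fallback" is taken to mean trying an ordered list of catalogs until one has a mapping, and "recursive" to mean feeding each result back through the catalogs until nothing maps further. The FallbackResolver class and the plain Maps standing in for catalogs are hypothetical. This is not an XMLEntityResolver implementation and not necessarily the behavior intended above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ordered catalogs (fallback) with repeated lookup (recursive).
class FallbackResolver {
    private final List<Map<String, String>> catalogs;

    FallbackResolver(List<Map<String, String>> catalogs) {
        this.catalogs = catalogs;
    }

    // Fallback: the first catalog in the list that maps the URI wins.
    private String lookupOnce(String uri) {
        for (Map<String, String> catalog : catalogs) {
            String target = catalog.get(uri);
            if (target != null) {
                return target;
            }
        }
        return null;
    }

    // Recursive: re-resolve each result until no catalog maps it any further.
    // Returns the input unchanged when nothing maps it; fails on a mapping cycle.
    String resolve(String uri) {
        Set<String> seen = new HashSet<>();
        String current = uri;
        while (seen.add(current)) {
            String next = lookupOnce(current);
            if (next == null) {
                return current;
            }
            current = next;
        }
        throw new IllegalStateException("Catalog mapping cycle at " + current);
    }
}
```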

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8841


www.innodata-isogen.com



