[oXygen-user] Working with Catalogs
Eliot Kimber
Wed Oct 5 12:46:05 CDT 2005
George Cristian Bina wrote:
> Hi Eliot,
>
> I see that you filed a bug against Xerces:
> http://issues.apache.org/jira/browse/XERCESJ-1104
>
> Note that it uses an XMLEntityResolver interface (not the SAX
> EntityResolver) that is at XNI level and that should allow some control
> over system versus uri mappings if the XMLEntityResolver set uses an XML
> Catalog.
> This is the interface I thought we should implement to allow the uri
> mapping. The XMLEntityResolver interface defines one method:
Yes, I've been digging into the code and I have a first stab at a fix.
What I've done is extend the XMLEntityResolver interface to add a
resolveResourceByUri() method which does nothing but try to resolve the
system ID value using the resolver's resolveUri() method.
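To make the shape of that change concrete, here is a minimal sketch.
It is not the actual patch and does not use the real Xerces types:
SketchCatalog, SketchEntityResolver, SketchResourceResolver, and
SketchCatalogResolver are all hypothetical stand-ins, the catalog is
just two in-memory maps, and PUBLIC entries are left out for brevity.
The only point is that the added method consults URI entries and
nothing else, while resolveEntity() stays on the entity side.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a catalog with separate SYSTEM and URI entries. */
class SketchCatalog {
    private final Map<String, String> systemEntries = new HashMap<>();
    private final Map<String, String> uriEntries = new HashMap<>();

    void addSystem(String systemId, String target) { systemEntries.put(systemId, target); }
    void addUri(String name, String target) { uriEntries.put(name, target); }

    /** Consults only SYSTEM entries; returns null when nothing matches. */
    String resolveSystem(String systemId) { return systemEntries.get(systemId); }

    /** Consults only URI entries; returns null when nothing matches. */
    String resolveUri(String uri) { return uriEntries.get(uri); }
}

/** Hypothetical stand-in for the existing one-method resolver contract. */
interface SketchEntityResolver {
    String resolveEntity(String systemId);
}

/** The extension: a second method for non-entity resources such as schemas. */
interface SketchResourceResolver extends SketchEntityResolver {
    String resolveResourceByUri(String systemId);
}

class SketchCatalogResolver implements SketchResourceResolver {
    private final SketchCatalog catalog;

    SketchCatalogResolver(SketchCatalog catalog) { this.catalog = catalog; }

    /** True entities: only SYSTEM entries are consulted in this sketch. */
    @Override
    public String resolveEntity(String systemId) {
        return catalog.resolveSystem(systemId);
    }

    /** Schemas and similar resources: only URI entries are consulted. */
    @Override
    public String resolveResourceByUri(String systemId) {
        return catalog.resolveUri(systemId);
    }
}
```

With a SYSTEM entry for a DTD and a URI entry for a schema, each
method finds its own kind of resource and returns null for the other,
which is the separation I'm after.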
The problem is that resolveEntity() only works for true entities (that
is, resources that would be mapped via SYSTEM and PUBLIC catalog
entries). It would be inappropriate to use URI entries to try to resolve
an external parsed or unparsed entity, in the same way it's
inappropriate to use SYSTEM or PUBLIC to resolve a schema location URI.
In the case of no-namespace schemas you have no choice but to either use
some out-of-band binding or use schema location hints.
This is one reason I recommend against using no-namespace schemas.
They're no better than external DTD subsets because you have no clear
and reliable way to do a non-syntactic binding of document to schema.
That is, mapping namespace URIs to schemas is non-syntactic, in that the
syntax of the document does not directly locate the schema. Rather, the
binding is indirect through the namespace, which is an invariant
property of the document that directly affects the document's inherent
semantics. A DOCTYPE declaration or schema location hint, by contrast,
is a purely syntactic reference: it is not an inherent property of the
document, and its presence or absence doesn't affect the document's
inherent semantics.
Anyway, hopefully I'll be able to report more a bit later.
My analysis at this point is that there's a fundamental architectural
flaw in the current Xerces implementation in that it doesn't distinguish
XML entities from other resources that might be involved in processing
and validating a document (i.e., schemas). The approach shown above is
really a hack to get around this flaw with the least disruptive change.
I suspect that the same problem exists in the Xerces XInclude
processing--I would not be surprised if href= values on xi:include
elements are resolved via SYSTEM and PUBLIC entries. But I don't have
time or energy to dive into that code just now.
It looks like I may have to experiment with writing my own
XMLEntityResolver implementation in order to implement my desired
recursive and fallback catalog resolution behaviors.
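As a rough illustration of what that could look like, here is a sketch
under one possible reading of "recursive and fallback": several
catalogs are tried in order (fallback), and the result of a mapping is
looked up again until nothing more maps (recursive). That reading is
an assumption, FallbackUriResolverSketch is a hypothetical name, and
each catalog is reduced to a plain map of URI entries rather than a
real catalog file or a real XMLEntityResolver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: ordered fallback across catalogs plus repeated lookup. */
class FallbackUriResolverSketch {
    // Each map stands in for the URI entries of one catalog (assumption).
    private final List<Map<String, String>> catalogs = new ArrayList<>();

    FallbackUriResolverSketch addCatalog(Map<String, String> uriEntries) {
        catalogs.add(uriEntries);
        return this;
    }

    /** One pass: the first catalog with a mapping wins; null when none has one. */
    String lookupOnce(String uri) {
        for (Map<String, String> catalog : catalogs) {
            String target = catalog.get(uri);
            if (target != null) {
                return target;
            }
        }
        return null;
    }

    /**
     * Looks the result up again until no catalog maps it any further.
     * An unmapped URI is returned unchanged; the seen-set stops cycles.
     */
    String resolve(String uri) {
        Set<String> seen = new HashSet<>();
        String current = uri;
        while (seen.add(current)) {
            String next = lookupOnce(current);
            if (next == null) {
                return current;
            }
            current = next;
        }
        return current; // a cycle was detected, so stop here
    }
}
```

Returning the input unchanged when nothing maps is a design choice in
this sketch; a real resolver might prefer to return null so the parser
can apply its own default behavior.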
Cheers,
Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8841
www.innodata-isogen.com