Re: [xsl] Switching off character entity resolution in XSL


Subject: Re: [xsl] Switching off character entity resolution in XSL
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 04 Feb 2004 14:09:49 -0500

At 05:09 AM 2/3/2004, Richard wrote:
... My strategy is
to live with the fact that the parser has carried out all the entity
mappings, and to use a "mappings" document containing entries like this:

<char>
<name>Delta</name>
<value>&#x0394;</value>
<unicode>0394</unicode>
<description>Delta       Dec:916 </description>
<mapping>[capital Delta]</mapping>
<!--U0394 /Delta capital Delta, Greek -->
</char>
...

Yes, it is slow and clumsy, and yes, it does use the deprecated
disable-output-escaping, but it does work ...

In my view this is a perfectly reasonable approach, as long as one is clear on the dependencies it introduces. By using XSLT to drive the serializer, one in effect requires that the result be written out to a file (using a processor that implements d-o-e, of course), but since that's built into the requirement to begin with, it's not a big deal. Accordingly, I don't consider it an abuse of d-o-e -- just an application of XSLT+serializer as a string writer, bound to XSLT's role as a transformer. (In fact, when I've implemented this solution to the entity-writing problem, I've deliberately kept the d-o-e operations separate from the transformation logic, pipelining two different stylesheets. This way the entity-writing routine is portable.)
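
For readers who want to see the mappings lookup in isolation, here is a rough sketch of the same idea in Python rather than XSLT. It is not Richard's stylesheet and does not use d-o-e; it only shows how a table built from `<char>` entries like the one quoted above could turn resolved characters back into entity references in already-serialized text. The `<chars>` wrapper element, the `bull` entry, and the function names `load_entity_names` and `restore_entities` are assumptions made for illustration, and each `<value>` is assumed to hold a single character.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings document, shaped like the <char> entry quoted above.
# The <chars> wrapper and the "bull" entry are assumptions for this sketch.
MAPPINGS_XML = """<chars>
  <char><name>Delta</name><value>&#x0394;</value></char>
  <char><name>bull</name><value>&#x2022;</value></char>
</chars>"""


def load_entity_names(mappings_xml):
    """Build a lookup from each mapped character to its entity reference."""
    table = {}
    for char in ET.fromstring(mappings_xml).iter("char"):
        # The parser has already resolved &#x0394; to the character itself.
        table[char.findtext("value")] = "&" + char.findtext("name") + ";"
    return table


def restore_entities(text, table):
    """Replace every mapped character in text with its entity reference."""
    return "".join(table.get(ch, ch) for ch in text)


table = load_entity_names(MAPPINGS_XML)
result = restore_entities("<p>\u0394 and \u2022</p>", table)
print(result)
```

As in the two-stylesheet pipeline described above, the lookup is kept apart from any transformation logic, so it can run as a separate final step on whatever the transformation writes out.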


Also see Zarella Rendon and Tony Coates on this issue: http://www.xml.com/pub/a/2003/01/02/xmlchar.html

Also, Mike wrote:
I'm afraid the simple answer is the ugly one: just preprocess the entity
references with a text editor to read "$#$bull;" instead of "&bull;". No
point banging your head against the wall to find something more elegant,
it will only give you a headache.

This approach, wrapping your transformation in non-XSLT "entity escaping/un-escaping" routines, may perform better (faster tools), and has the virtue of architectural clarity. It does introduce other local dependencies, of course, but for this kind of a problem that's not really an issue, is it?
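
A minimal sketch of such a wrapper, assuming Python is available on every platform involved. It swaps named entity references for the "$#$name;" placeholder form Mike quotes before the document reaches the parser, and swaps them back after serialization. The function names and the regular expressions are illustrative assumptions, not part of any tool mentioned in this thread; the sketch also assumes the literal string "$#$" does not otherwise occur in the data, and it makes no attempt to skip comments or CDATA sections.

```python
import re

# Named entity references such as &bull; (numeric ones like &#x2022; do not match).
NAMED_ENTITY = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")
# The placeholder form quoted above, e.g. $#$bull;
PLACEHOLDER = re.compile(r"\$#\$([A-Za-z_][A-Za-z0-9._-]*);")

# The five entities predefined by XML are left alone so the parser still sees them.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def escape_entities(source):
    """Run before parsing: hide named entity references from the parser."""
    def swap(match):
        name = match.group(1)
        return match.group(0) if name in PREDEFINED else "$#$" + name + ";"
    return NAMED_ENTITY.sub(swap, source)


def unescape_entities(result):
    """Run after serialization: turn placeholders back into entity references."""
    return PLACEHOLDER.sub(lambda m: "&" + m.group(1) + ";", result)


escaped = escape_entities("<p>&bull; one &amp; two &#x0394;</p>")
print(escaped)
print(unescape_entities(escaped))
```

The transformation itself runs between the two calls and never sees an entity reference it would have to resolve, which is where the architectural clarity comes from.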


Cheers,
Wendell


Richard Light

>Example:
>source document contains:     &bull;
>After transformation:         [bull  ]    (of course, the entity declared
>in the DTD is this, i.e. <!ENTITY bull "[bull  ]">)
>What I would like:            &bull;
>
>I really don't want to go messing with the DTD either, and I really don't
>think a parser would like there being unparsed entities within an entity
>declaration in a  DTD i.e. <!ENTITY bull &bull;> is illegal.
>
>I realise there is some way of dealing with this with character
>substitutions before or after using something like sed, but this isn't
>really a great solution, particularly across platforms. Is there any way of
>manipulating the output using XSL, or alternatively switching off entity
>resolution in the parser? I've played with custom entity resolvers with
>Java XML parsers (i.e. resolving URLs for example) but cannot see how this
>could be used for external character entities, and also realise there is
>some scope for writing a solution in something like JDOM - but what a pain!
>That defeats the whole purpose of XSL. I have gotten used to a pretty good
>compromise of using Saxon with the Xerces parser and the Norm Walsh entity
>resolver classes if that's of any help.
>
>Either there's a simple solution to this, it's something XML 2.0 (or
>whatever is on the horizon) might address (which is no help for me really),
>I'm on the wrong mailing list or I should just resort back to ("the good
>ol' days of" - yes, sarcasm) Omnimark which was really good at "unparsing"
>entities. I'm sure others experience similar problems so hopefully the
>first option is the right one (i.e. easy ?).
>
>Thanks very much,
>Alan Hynes.
>
>
>
>
>
>
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>

--
Richard Light
SGML/XML and Museum Information Consultancy
richard@xxxxxxxxxxxxxxxxx



======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

