Re: [xsl] Access to unparsed entities

Subject: Re: [xsl] Access to unparsed entities
From: Gregory Murphy <Gregory.Murphy@xxxxxxxxxxx>
Date: Sun, 20 Oct 2002 16:32:43 -0700 (PDT)

On Fri, 18 Oct 2002, Jeni Tennison wrote:

> Hi Wendell, Greg,
> 
> >>It would be nice to have such [unparsed] entities stored in a table
> >>when the document is first read in, such that an XSL transformation
> >>can read from and write to the table, and such that the table is
> >>again written out in the document's internal DTD subset after
> >>transformation is complete.
> >
> > Wouldn't it? This sounds like something very nice for XSLT 2.0. Off
> > hand, I don't know what they're planning, if anything. (Can
> > anyone speak to that? Jeni?)
> 
> Hmm... Well, there's a "could" requirement for this in the XSLT 2.0
> requirements [1]:
> 
>   2.16 Could Improve Support for Unparsed Entities
> 
>   In XSLT 1.0 there is an asymmetry in support for unparsed entities.
>   They can be handled on input but not on output. In particular, there
>   is no way to do an identity transformation that preserves them. At a
>   minimum we need the ability to retrieve the Public ID of an unparsed
>   entity.
>  
> The latest XSLT 2.0 WD has got a function to support the ability to
> retrieve the public ID of an unparsed entity, namely
> unparsed-entity-public-id() [2]. So there's enough information
> available in the stylesheet to let you build the table of unparsed
> entities yourself.

This is certainly an improvement, since at least no information from the
source document is inaccessible to the transformation.

> If you did build such a table, then you can use the set of elements
> described in Appendix G, "Representation of Lexical XML Constructs"
> [3] in order to create a DOCTYPE declaration in which you declare the
> entities that you want to declare. Something like:
> 
>   <lex:doctype name="foo">
>     <xsl:for-each select="$entity">
>       <lex:unparsed-entity-declaration name="{.}"
>         system-id="{unparsed-entity-uri(.)}"
>         public-id="{unparsed-entity-public-id(.)}" />
>     </xsl:for-each>
>   </lex:doctype>
> 
> (Hmm... I see that there's no way of getting the entity notation at
> the moment; we should probably address that, but that, of course,
> means also adding notation declarations, which aren't supported at
> all currently -- or is the notation something that's derivable from
> the public/system ID?)

Another possibility is to build the table using a SAX filter, and insert
the contents of the table into the document using elements defined in
Appendix G, as you demonstrate above. This has the advantage that it could
be made to work with XSLT 1.0, and wouldn't require any extensions.
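
As a rough sketch of the table-building half of that idea, the filter below
records every unparsed entity and notation declaration it sees while passing
all events through unchanged. It uses Python's standard xml.sax module rather
than any particular XSLT toolchain; the class name, the collect() helper, and
the sample document are made up for illustration. Inserting the table into
the document as elements is left out.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase


class UnparsedEntityTable(XMLFilterBase):
    """SAX filter that remembers DTD declarations as they stream past.

    Hypothetical helper for illustration, not part of any XSLT processor.
    """

    def __init__(self, parent=None):
        super().__init__(parent)
        self.entities = {}   # entity name -> (public id, system id, notation)
        self.notations = {}  # notation name -> (public id, system id)

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = (publicId, systemId)
        super().notationDecl(name, publicId, systemId)

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities[name] = (publicId, systemId, ndata)
        super().unparsedEntityDecl(name, publicId, systemId, ndata)


# Made-up sample document with one notation and one unparsed entity.
SAMPLE = b"""<!DOCTYPE doc [
  <!NOTATION gif PUBLIC "-//Example//NOTATION GIF//EN">
  <!ENTITY logo PUBLIC "-//Example//ENTITY Logo//EN" "logo.gif" NDATA gif>
  <!ATTLIST doc img ENTITY #IMPLIED>
]>
<doc img="logo"/>
"""


def collect(xml_bytes):
    """Parse xml_bytes through the filter and return the populated filter."""
    table = UnparsedEntityTable(xml.sax.make_parser())
    table.parse(io.BytesIO(xml_bytes))
    return table


if __name__ == "__main__":
    for name, (public_id, system_id, notation) in collect(SAMPLE).entities.items():
        print(name, public_id, system_id, notation)
```

The point of doing this at the SAX level is that the parser reports the
public ID and the notation name along with the system ID, whereas XSLT 1.0
only exposes the system ID through unparsed-entity-uri().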

I hadn't read Appendix G, but now that I have, I think it is preferable to
trying to reconstruct the document type internal subset in the result
document. It converts all those archaic SGML constructs to plain old XML,
which will make all subsequent processing easier to understand.
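
To make "plain old XML" concrete, here is a small sketch that renders a table
of unparsed entities as ordinary elements. The element and attribute names
follow the example quoted above; the namespace URI is a placeholder, since
nothing in this message gives the real one, and the function name and input
shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: an assumption, not the URI used by the draft.
LEX_NS = "urn:example:lex"


def entities_to_doctype(root_name, entities):
    """Build a lex:doctype element from {name: (public_id, system_id)}.

    Hypothetical helper; names mirror the quoted example only.
    """
    ET.register_namespace("lex", LEX_NS)
    doctype = ET.Element(f"{{{LEX_NS}}}doctype", {"name": root_name})
    for name, (public_id, system_id) in sorted(entities.items()):
        attrs = {"name": name, "system-id": system_id}
        if public_id is not None:
            # Omit the attribute entirely when no public ID was declared.
            attrs["public-id"] = public_id
        ET.SubElement(
            doctype, f"{{{LEX_NS}}}unparsed-entity-declaration", attrs
        )
    return doctype


if __name__ == "__main__":
    table = {"logo": ("-//Example//ENTITY Logo//EN", "logo.gif")}
    print(ET.tostring(entities_to_doctype("foo", table), encoding="unicode"))
```

Once the declarations are elements like these, later stages can select and
rewrite them with ordinary template rules instead of special-case functions.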

> If either or both of you could drop a line to
> public-qt-comments@xxxxxx giving an example of what you want to be
> able to do, that would be helpful, especially if what I've described
> above doesn't meet your requirements.

As long as nothing declared in the document is hidden from the
transformation, I think the standard is adequate. XSLT 2.0 has addressed
the lack of access to an entity's public identifier. It would be nice if a
future version would also provide access to the notation. Unparsed external
entities are _very_ SGML, and in a schema-enlightened world, will hopefully
go away, so I don't think a strong case could be made for providing extra
support for their construction in an XML result document. Most of them are
probably coming from SGML documents converted to XML.

// Gregory Murphy <Gregory.Murphy@xxxxxxx>
// Software Engineer
// Customer Network Platform, Sun Microsystems

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Current Thread

[xsl] Access to unparsed entities
- Gregory Murphy - Fri, 18 Oct 2002 09:50:06 -0700 (PDT)
  - Wendell Piez - Fri, 18 Oct 2002 13:52:33 -0400
    - Jeni Tennison - Fri, 18 Oct 2002 23:59:00 +0100
      - Gregory Murphy - Sun, 20 Oct 2002 16:32:43 -0700 (PDT) <=
  - <Possible follow-ups>
  - DPawson - Mon, 21 Oct 2002 09:02:59 +0100
    - Michael Kay - Mon, 21 Oct 2002 10:56:36 +0100
      - Peter Flynn - Mon, 21 Oct 2002 22:58:08 +0100
  - DPawson - Mon, 21 Oct 2002 15:15:29 +0100