[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)


Subject: Re: ISO-8859-1 encoding and XmlDecl omision (was Re: [xsl] Looking up keys in a separate xml file)
From: David Carlisle <davidc@xxxxxxxxx>
Date: Wed, 7 Jan 2004 15:19:34 GMT

Andrew

  I would have thought that as ascii is a subset of utf-8, the processor
  could happily leave the declartion out knowing that any future parsing
  of the document would use utf-8 (by default) and could correctly read
  the file.  

The reason why I had the reference to this part of the spec to hand was
that I had quoted it in an comment on the offical comment list:




http://lists.w3.org/Archives/Public/public-qt-comments/2003Nov/0050.html



    According to 
    http://www.w3.org/TR/2003/WD-xslt-xquery-serialization-20030502/#N400318
    
    
       The omit-xml-declaration parameter should be ignored if the standalone
       parameter is present, or if the encoding parameter specifies a value
       other than UTF-8 or UTF-16.
    
    There is one other case where it would be very useful to omit the
    declaration (or at least to use a value of utf-8) namely
    iso-646 (aka ASCII aka US-ASCII).
    
    It may be politically incorrect to say that ascii characters are still
    more interoperable than non-ascii characters, but in practice this is
    still the case. Especially in XML which specifies that a charset
    specified in the mime headers takes precedence it is hard to give (say) a
    utf8 file to someone to serve from their website without first finding
    out what http server they use, and how to make sure it won't serve the
    thing as latin 1 resulting in a non-well formed document.
    (See current discussion on W3C'S TAG list about this).
    
    One style of producing XML files that avoids these problems is to
    produce files that don't have an xml declaration (or have one that
    specifies utf-8) but to encode all non-ascii characters as numeric
    character references.
    
    Currently in an XSLT 1 usage in production I use
    <xsl:output encoding="US-ASCII"/>
    with saxon and post process with sed to remove the US-ASCII
    encoding declaration (which stops the file being parsed on several XML
    systems I have locally) I think that it would be very desirable if
    
    <xsl:output encoding="iso-646" omit-xml-declaration="yes"/>
    
    was defined to work, and produce files of the form described above.
    
    Failing that it would be good if it would be allowed by the
    specification if the system understood that encoding.
    
    
    
    
-- 
http://www.dcarlisle.demon.co.uk/matthew

________________________________________________________________________
This e-mail has been scanned for all viruses by Star Internet. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



Current Thread
Keywords