transformation change 6.1 to 6.2?
Posted: Mon Sep 19, 2005 8:57 pm
by sgoetz
I installed version 6.2 this morning (thanks for the new release!). An XML-to-HTML transformation scenario that worked fine in 6.1 has problems in 6.2 with Unicode points. For ç, e.g., we used to get the correct "É" in HTML but now have "ç". (The problem is visible in the HTML file, independent of browser; I use Firefox 1.4, a.k.a. 1.5 beta1.) Is there a new or changed Oxygen option that I'm failing to see?
Our XML files specify UTF-8; the transformation uses an XSLT 2.0 style sheet with Saxon 8B. In 6.1 the scenario's Output tab settings have "Open in browser" and "Prompt for file" selected, and "Show as XML" checked. The 6.2 Output tab has nearly the same options, I notice, though their layout has changed.
I've tried using Oxygen 6.2's output display (uncheck "Open in browser", show as XHTML). That takes away the Unicode rendering problem, but it doesn't respect the style sheet sufficiently; we specify indentations in ems, for example, which get ignored inside of Oxygen 6.2.
Thanks for your help.
erratum
Posted: Mon Sep 19, 2005 11:25 pm
by sgoetz
(For "ç" above, read "É"--not that it matters for the general problem; all code points fail to display correctly.)
Posted: Tue Sep 20, 2005 11:51 am
by george
Hi,
It seems that your browser picks the wrong encoding. The ç (&#xE7;), when serialized using UTF-8 as the encoding and opened using the ISO-8859-1 encoding, will look like Ã§. So your result is serialized using UTF-8 and the browser uses ISO-8859-1.
I tried a simple example and everything works fine for me both with Firefox and with IE as the result of the transformation specifies the charset as UTF-8. Please try the samples below.
test.xml
Code:
<test>
ÉÉÉÉ
</test>
test.xsl
Code:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>test</title>
      </head>
      <body>
        <xsl:value-of select="test"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
If you want your output in the ISO-8859-1 encoding then add
Code:
<xsl:output encoding="ISO-8859-1"/>
to your stylesheet.
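For readers less familiar with XSLT, here is a minimal sketch of where that declaration goes: xsl:output is a top-level element, a direct child of xsl:stylesheet. The method="html" attribute is an assumption added here for explicitness (the test stylesheet above relies on the default that applies when the result root is an html element). With the html method, the serializer normally also writes a meta element declaring the chosen charset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Top-level declaration: controls how the result tree is serialized. -->
  <!-- method="html" is an assumption added for explicitness. -->
  <xsl:output method="html" encoding="ISO-8859-1"/>

  <xsl:template match="/">
    <html>
      <head><title>test</title></head>
      <body><xsl:value-of select="test"/></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The encoding declared here should match what the browser is told to expect. If the two disagree, multi-byte UTF-8 sequences show up as pairs of Latin-1 characters.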
Best regards,
George
Posted: Thu Sep 22, 2005 2:55 am
by sgoetz
george wrote: So your result is serialized using UTF-8 and the browser uses ISO-8859-1.
Thanks very much for the suggestion re: <xsl:output/>! It fixes the output problem. (I don't know XSL well yet and didn't create the style sheet I'm using.)
It does seem odd, however, that using the same browser settings, style sheet, and transformation scenario should yield different results in Oxygen 6.1 and 6.2. I don't see something directly related to this change amongst the new features for 6.2--presumably a small part of improved XSLT handling?--and now I wonder what else I'm missing....
Sharon
one more question
Posted: Thu Sep 22, 2005 3:27 am
by sgoetz
Hello again-- though your suggestion does work, I think it has a downside, so please bear with me--
Here's the difference between the two versions' output: without your suggested addition (<xsl:output/>) and with all other variables the same, 6.1 converts Unicode hex to decimal but leaves character entities in the HTML output file. 6.2 converts the entities to glyphs in my system's default encoding (ISO-8859-1, as you identified). Let's use a real example this time: where my XML file has &#x2014;, 6.1 gives &# 8212; (minus the space) and 6.2 gives —.
Can I set 6.2 to do what 6.1 did, namely to retain character entities? I see no reason to enforce conversion of entities to glyphs, which assumes that a single encoding always holds. (My project's HTML output will not always be viewed by browsers that use 8859-1 as their default.) Also, glyph-conversion fails in some cases; that's the point of using Unicode points at all, no?
Thanks again for your time and help,
Sharon
Posted: Thu Sep 22, 2005 9:47 am
by george
Hi Sharon,
The difference between 6.1 and 6.2 with respect to XSLT 2.0 transformations is that we updated from Saxon 8.4 to Saxon 8.5.1. But I do not see any problem with what oXygen 6.2/Saxon 8.5.1 generates as output. If the output character has a representation in the output encoding, then the processor writes it directly, without the need for an entity. If you want to generate UTF-8, set the output encoding to UTF-8: the result file is then in UTF-8, and the meta element specifies UTF-8 as the charset, so any application that loads your output file should be able to detect that it is UTF-8. The same holds for ISO-8859-1. Note also that the XSLT data model does not have a notion of character entities, so they cannot be preserved; the XSLT processor sees the actual characters, not the entities.
Now if you want to have entities in the output, you can use ASCII as the output encoding. That should output entities for all non-ASCII characters.
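A sketch of that setting, assuming the same test stylesheet as earlier in the thread: only the encoding value changes. US-ASCII is the registered name of the encoding. The exact form of the references written (decimal, hexadecimal, or named HTML entities) is up to the processor and is not guaranteed here.

```xml
<!-- Top-level child of xsl:stylesheet. Characters that US-ASCII cannot
     represent are written as character references instead of raw bytes. -->
<xsl:output method="html" encoding="US-ASCII"/>
```

Because the resulting file contains only ASCII bytes, it displays the same way whichever default encoding the browser assumes.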
Best Regards,
George
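Although it is not suggested in the thread, XSLT 2.0 character maps offer a more selective alternative: they replace chosen characters with a fixed string at serialization time. The sketch below keeps UTF-8 output but writes the em dash as a decimal reference. The map name keep-dash is hypothetical, and whether a given Saxon 8.x build supports character maps is an assumption to verify.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Apply the (hypothetically named) map below during serialization. -->
  <xsl:output method="html" encoding="UTF-8" use-character-maps="keep-dash"/>

  <!-- U+2014 (em dash) is written as the literal text &#8212; in the output. -->
  <xsl:character-map name="keep-dash">
    <xsl:output-character character="&#x2014;" string="&amp;#8212;"/>
  </xsl:character-map>

  <xsl:template match="/">
    <html>
      <head><title>test</title></head>
      <body><xsl:value-of select="test"/></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Each character to be protected needs its own xsl:output-character entry, so this suits a short list of known characters. For blanket coverage, the ASCII encoding approach is simpler.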
Posted: Fri Sep 23, 2005 6:11 pm
by sgoetz
Hi George,
Thanks once again for your helpful reply. After doing a bit of reading, I've realized that my ignorance confused me re: expectations of "correct" behavior. Using ASCII as the encoding will work till we decide exactly how we want to deal with those entities....
Cheers,
Sharon