
Re: [xsl] Character Encoding Problem


Subject: Re: [xsl] Character Encoding Problem
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 23 Sep 2014 20:31:41 -0000

If you know what the offending characters are, you can use a character map to
convert them.
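
As a rough sketch of the character-map approach (not taken from the original stylesheet): the map name "ascii-quotes", the output file name, and the choice of replacement strings below are all assumptions made for illustration. The map is applied at serialization time, so the templates themselves do not need to change.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical map: replaces typographic quotes with plain ASCII
       quotes, which do exist in ISO-8859-1. -->
  <xsl:character-map name="ascii-quotes">
    <xsl:output-character character="&#x201C;" string="&quot;"/>
    <xsl:output-character character="&#x201D;" string="&quot;"/>
    <xsl:output-character character="&#x2018;" string="'"/>
    <xsl:output-character character="&#x2019;" string="'"/>
  </xsl:character-map>

  <xsl:template match="/">
    <!-- "out.properties" is a placeholder for the real href. -->
    <xsl:result-document href="out.properties" method="text"
        encoding="ISO-8859-1" use-character-maps="ascii-quotes">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

Because the target here is a Java properties file, another option might be to map each character to its escape form instead, for example string="\u201C", since properties files accept \uXXXX escapes. Whether that suits the consumer of the file is a separate question.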

If not, you could do a replace() to eliminate all out-of-range characters.
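
A minimal sketch of the replace() approach, assuming the text reaches the output through a text() template (the real stylesheet may emit text elsewhere, in which case the same expression would go there). The regex uses the Unicode block escapes for U+0000 to U+00FF, which is the range ISO-8859-1 can represent; everything outside it is dropped.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip every character outside Basic Latin and Latin-1 Supplement
       before it reaches the serializer. Swap '' for '?' to leave a
       visible placeholder instead of silently deleting. -->
  <xsl:template match="text()">
    <xsl:value-of
        select="replace(., '[^\p{IsBasicLatin}\p{IsLatin-1Supplement}]', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

This loses information, so it is probably best kept as a fallback for characters that have no sensible mapping.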

Michael Kay
Saxonica
mike@xxxxxxxxxxxx
+44 (0) 118 946 5893




On 23 Sep 2014, at 21:23, Craig Sampson craig.sampson@xxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hi,
>   We're trying to create a Java properties file using XSLT 2.0 in Saxon. The
> input is XML encoded as UTF-8. The properties file needs to be encoded as
> ISO-8859-1. The character giving the problem, in the input file, is &#x201c;,
> which is a left double quotation mark. Looking at the ISO-8859-1 character
> set, the closest character appears to be the plain double quote, with no
> hand (left/right).
>   Should I expect other errors like this when going from a large character
> set to a smaller, more restricted set? It seems like this should be handled
> more gracefully, with a missing-character box and maybe a warning message.
>   Has anyone run into this situation before, and how did you handle it?
> Thanks,
>   Craig Sampson
>
> Input file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE eua PUBLIC "-//SAS//DTD eua 1.0//EN"
> "eua.dtd">
> <eua xml:lang="en"><euaCollection name="sas.hc.m"><euaTopic
> eid="n0w5y4a9ucji7fn16m42xi7i3j2t" name="123">
> <title>My 123 Optional Title</title>
> <toolTip>Tool tip for 123 &amp; &#x201c; &lt; &lt; &gt;</toolTip>
> <topicContent eid="n0yjc0to9qjhs1n1xwir4lfg1rrv">
> <paragraph eid="p00bfr28a0mbcln1ugargzppuhij">Hello world. &amp; &#x201c;
> &lt; &lt; &gt;</paragraph><?Pub Caret -1?>
> </topicContent>
> </euaTopic><euaTopic eid="n1ogkb1kvm4wx1n1ju8dias2k2on" name="xyz">
> <topicContent eid="p0hc4mjl3x7r4mn117mi422yk3yp">
> <paragraph eid="n1v2zx7vi5mxhon1burd6kgqjlj0">Hello Mars.</paragraph>
> </topicContent>
> </euaTopic></euaCollection><euaCollection name="sas.hc.uicommons">
> <euaTopic eid="n0niwwwbn1ndepn11fakurtbq2ma" name="6677">
> <title>My 6677 Optional Title.</title>
> <topicContent eid="p0h761l1tlxxyan1a25bwzvgintt">
> <paragraph eid="p1xxbgp1nh96e4n1ubhq126t48bo">blah.</paragraph>
> </topicContent>
> </euaTopic></euaCollection></eua>
>
>
> Here's the XSL code:
>
> <xsl:result-document href="{$fnameHref}" indent="no"
>                      method="text" encoding="ISO-8859-1"
>                      include-content-type="no">
>     <xsl:apply-templates />
> </xsl:result-document>
>
> But we're getting encoding errors about a character not being available.
>
> Output character not available in this encoding (decimal 8220)
> file:///C:/java/ide/eclipse/4.3/vert-i4xis14/eclipsedata/i4xis14/sas.pubs.xis.preview.core/XisBuild/XisBuildTools/XisStylesheets/eua.xsl, line 210 column -1
> null::file:///C:/java/ide/eclipse/4.3/vert-i4xis14/eclipsedata/i4xis14/sas.pubs.xis.preview.core/XisBuild/XisBuildTools/XisStylesheets/eua.xsl, line 210 column -1
> javax.xml.transform.TransformerException: net.sf.saxon.trans.XPathException: Output character not available in this encoding (decimal 8220) :: Details Output character not available in this encoding (decimal 8220)
> file:///C:/java/ide/eclipse/4.3/vert-i4xis14/eclipsedata/i4xis14/sas.pubs.xis.preview.core/XisBuild/XisBuildTools/XisStylesheets/eua.xsl, line 210 column -1
>