
Re: [xsl] character entities


Subject: Re: [xsl] character entities
From: "Andrew Welch" <andrew.j.welch@xxxxxxxxx>
Date: Mon, 3 Nov 2008 08:54:25 +0000

Hi,

> I'm having a wee spot of bother with character entities.

It's a character encoding problem rather than a character entity one.

> This data is then put into fields within a Zend Search Lucene index, via
> php (that's why I first "flattened" it).
>
> This index data is then queried (again via php) and the results sent
> to/rendered by a browser.
>
> If I put &#241_; (minus the underline character, which I've added so
> this email is not mis-parsed) in my original xml, and using
> encoding="iso-8859-1" for it and my xsl stylesheet, then my xsl
> transforms that into a (Spanish) n character with a tilde on top: ñ.
>
> If I tell ZSL to index fields using 'iso-8859-1' encoding, my Spanish n
> becomes: CB1. If I tell ZSL to index fields using 'utf-8' encoding, my
> Spanish n becomes: C1.

These sorts of issues are nearly always a case of writing in one
encoding and reading in another, and you just need to track down where
the reading and writing happen: it could be a string-to-byte
conversion in your code, the browser parsing the markup, or even the
text viewer you are using to check the output (such as the Eclipse
output window).
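To make that concrete, here is a minimal sketch in Python (chosen only
for illustration; your pipeline is XSLT plus PHP, so the actual calls
differ) of what happens to ñ when UTF-8 bytes are read back as
ISO-8859-1, and the other way round:

```python
# Sketch only: the "write in one encoding, read in another" mismatch.
n_tilde = "\u00f1"  # ñ, i.e. the character written as &#241; in the XML

# Writer: serializes the text as UTF-8 (two bytes, C3 B1).
utf8_bytes = n_tilde.encode("utf-8")

# Reader: wrongly decodes those bytes as ISO-8859-1, giving two characters.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Reverse case: ISO-8859-1 bytes (the single byte F1) read as UTF-8.
latin1_bytes = n_tilde.encode("iso-8859-1")
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex())                  # c3b1
print(misread_as_latin1)                 # Ã±
print(f"U+{ord(misread_as_utf8):04X}")   # U+FFFD (the replacement character)
```

One character turning into two is the typical sign that UTF-8 output
was decoded somewhere as a single-byte encoding.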

> I believe I need to prevent all parsers bar the browser at the end from
> parsing my "special characters", right? But how?

Not really, that's just a way of bypassing encoding problems and
doesn't address the underlying issue.
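For illustration, this is roughly what that bypass amounts to, sketched
in Python (not what your stack does, just the same idea): every
non-ASCII character becomes a numeric reference, so the bytes look the
same under any ASCII-compatible encoding, but a stage that doesn't
parse markup (hypothetically, a search index field) just stores the
literal text &#241;.

```python
# Sketch of the "bypass": escape non-ASCII characters as numeric references.
text = "Espa\u00f1a"  # España

# xmlcharrefreplace turns each non-ASCII character into &#NNN; (decimal).
escaped = text.encode("ascii", errors="xmlcharrefreplace")
print(escaped)  # b'Espa&#241;a'

# Pure ASCII bytes decode identically as UTF-8 and ISO-8859-1,
# which is why the mismatch seems to "go away"...
same_either_way = escaped.decode("utf-8") == escaped.decode("iso-8859-1")
print(same_either_way)  # True

# ...but a stage that treats the value as plain text rather than markup
# (hypothetically, a search index field) holds "&#241;", not the character.
plain_text = escaped.decode("ascii")
print("\u00f1" in plain_text)  # False
```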

> Latest effort: I tried using encoding="utf-8" for all levels: my original
> xml, my xsl output, and the input to ZSL's index, & I also saved my xml file
> as utf-8 format, and used the Spanish n inside my xml, i.e. ñ rather than
> &#241;. Doing that, the Spanish n was preserved through the xsl output, but
> ZSL stores it as: C1, & that's also how my browser displays it.

Ahh ok, well that's the right approach. You just need to examine the
code at every step and isolate the point where it's going wrong.
You've confirmed the output of the transform is fine, so the next step
is to carefully step through what happens between that and ZSL.
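One way to do that stepping through is to capture the raw bytes at each
stage and check them. The helper below is a hypothetical Python sketch:
`describe_stage` and the stage labels are made up, and the
double-encoding check is a heuristic, not a guarantee.

```python
def describe_stage(label: str, data: bytes) -> str:
    """Heuristic report on the raw bytes captured at one pipeline stage."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: possibly a single-byte encoding such as ISO-8859-1.
        return f"{label}: not valid UTF-8 (bytes {data.hex()})"
    try:
        # Double encoding: UTF-8 bytes decoded as Latin-1, then re-encoded as UTF-8.
        inner = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        inner = text  # cannot be undone that way, so assume not double-encoded
    if inner != text:
        return f"{label}: looks double-encoded, original text probably {inner!r}"
    return f"{label}: valid UTF-8 -> {text!r}"


# Hypothetical captures of the same "ñ" at different stages.
good = "\u00f1".encode("utf-8")
double = good.decode("iso-8859-1").encode("utf-8")
latin1 = "\u00f1".encode("iso-8859-1")

stages = [
    ("transform output", good),
    ("after a PHP step", double),
    ("saved file", latin1),
]
for label, data in stages:
    print(describe_stage(label, data))
```

The first stage that stops reporting valid UTF-8 with the expected text
is where to look.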

Using the actual n-tilde character or the character reference &#241;
shouldn't make any difference, by the way...
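A quick check with Python's standard xml.etree.ElementTree (used here
only as an example of a conforming XML parser) shows that the
reference, the literal character, and a correctly declared ISO-8859-1
document all parse to the same text:

```python
import xml.etree.ElementTree as ET

# The same word written three ways.
from_reference = ET.fromstring("<name>Espa&#241;a</name>").text
from_literal = ET.fromstring("<name>Espa\u00f1a</name>").text
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<name>Espa\u00f1a</name>"
).encode("iso-8859-1")
from_latin1_file = ET.fromstring(latin1_doc).text

# After parsing, all three are the same six-character string.
print(from_reference == from_literal == from_latin1_file)  # True
print(len(from_reference))  # 6
```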


cheers
--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

