[oXygen-user] unicode support?
Oxygen XML Editor Support
support at oxygenxml.com
Tue Jan 15 10:02:24 CST 2013
Hello,
This is related to the XSLT processor. My guess is that Saxon 9 doesn't
process the lower-case() function as you expect. It may in turn come
down to Java, since Saxon 9 runs on top of Java and I'm guessing it
uses Java's uppercase/lowercase mapping mechanism. Further
investigation is necessary.
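
One way to check the Java side independently of Saxon is a small
standalone test. The following is only a sketch: it assumes (and this
is not confirmed) that Saxon relies on the JVM's own case mapping, and
the class name CaseFoldCheck and its helper methods are made up for
illustration. It calls the standard Character.toLowerCase(int) method
on the code points from your message and prints the Java version it
ran on.

```java
public class CaseFoldCheck {
    // Lower-case one code point using the JVM's built-in Unicode case mapping.
    public static int lowerCodePoint(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Format "U+XXXX -> U+YYYY", flagging code points the JVM leaves unchanged.
    public static String describe(int codePoint) {
        int lower = lowerCodePoint(codePoint);
        return String.format("U+%04X -> U+%04X%s", codePoint, lower,
                lower == codePoint ? " (unchanged)" : "");
    }

    public static void main(String[] args) {
        // The result depends on the Unicode version the running JVM supports.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe(0x0041)); // LATIN CAPITAL LETTER A
        System.out.println(describe(0xA656)); // CYRILLIC CAPITAL LETTER IOTIFIED A
    }
}
```

If U+A656 is reported as unchanged, the JVM's Unicode tables probably
predate Unicode 5.1 (Java 6, for example, is based on Unicode 4.0), and
running the transformation on a newer Java version would be worth
trying.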
I've also looked at the default-collation attribute from XSLT, but it
doesn't seem to affect this.
Regards,
Adrian
Adrian Buza
oXygen XML Editor and Author Support
Tel: +1-650-352-1250 ext.202
Fax: +40-251-461482
support at oxygenxml.com
http://www.oxygenxml.com
David Birnbaum wrote:
> Dear <oXygen/> support,
>
> I'm trying to case-fold some early Cyrillic text, which includes
> characters from the Unicode Cyrillic Extended-B range
> (http://www.unicode.org/charts/PDF/UA640.pdf), and the lower-case()
> function does not seem to be returning what I expect. I am testing in
> the XPath browser box in <oXygen/> 14.1 (set to XPath 2.0), but I get
> the same results when performing an XSLT transformation using Saxon-PE
> 9.4.0.4.
>
> Input: string-to-codepoints('Ꙗ')
> Output (as expected): 42582
>
> Input: string-to-codepoints(lower-case('Ꙗ'))
> Output (incorrect): 42582
>
> That is, I get the same result when I process this upper-case letter
> regardless of whether I try to convert it to lower case.
>
> The lower-case counterpart of U+A656 is U+A657. The case mapping seems
> to be correct in the Unicode property table
> at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, where the
> relevant lines are:
>
> A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
> A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
>
> For comparison (ASCII-range characters):
>
> Input: string-to-codepoints('A')
> Output (as expected): 65
>
> Input: string-to-codepoints(lower-case('A'))
> Output (as expected): 97
>
> It looks, then, as if the lower-case() function works properly on some
> Unicode characters, such as those in the ASCII range, but not on
> others, such as those in the Cyrillic Extended-B range. The Cyrillic
> Extended-B characters have been in Unicode since version 5.1.0 (April 4, 2008);
> Unicode is now at 6.2.0. Is this a bug (and if so, whose bug is it?),
> or are my expectations based on a misunderstanding?
>
> Thanks,
>
> David (djbpitt at gmail.com)