[oXygen-user] unicode support?

David Birnbaum djbpitt at gmail.com
Sat Jan 12 21:38:25 CST 2013


Dear <oXygen/> support,

I'm trying to case-fold some early Cyrillic text, which includes characters
from the Unicode Cyrillic Extended-B block (
http://www.unicode.org/charts/PDF/UA640.pdf), and the lower-case() function
does not seem to be returning what I expect. I am testing in the XPath
browser box in <oXygen/> 14.1 (set to XPath 2.0), but I get the same
results when performing an XSLT transformation using Saxon-PE 9.4.0.4.

Input: string-to-codepoints('&#xa656;')
Output (as expected): 42582

Input: string-to-codepoints(lower-case('&#xa656;'))
Output (incorrect): 42582

That is, lower-case() returns this upper-case letter unchanged: the code
point is the same whether or not I try to convert it to lower case.

The lower-case counterpart of U+A656 is U+A657. The case mapping seems to
be correct in the Unicode property table at
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, where the relevant
lines are:

A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
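
To make the relevant fields in those two records explicit, here is a minimal
sketch in Python (chosen only for illustration; it is not part of the
<oXygen/> or Saxon setup). It assumes the standard UnicodeData.txt layout, in
which zero-based fields 12, 13, and 14 hold the simple uppercase, lowercase,
and titlecase mappings. The names RECORDS, MAPPINGS, and
simple_case_mappings are hypothetical.

```python
# The two UnicodeData.txt records quoted above, copied verbatim.
RECORDS = """\
A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
"""


def simple_case_mappings(records):
    """Map each code point to its (upper, lower, title) simple mappings.

    Assumes the standard layout: zero-based fields 12, 13, and 14 hold the
    simple uppercase, lowercase, and titlecase mappings. An empty field
    means the record lists no mapping, which is stored here as None.
    """
    table = {}
    for line in records.splitlines():
        fields = line.split(";")
        code = int(fields[0], 16)
        upper, lower, title = (int(f, 16) if f else None for f in fields[12:15])
        table[code] = (upper, lower, title)
    return table


MAPPINGS = simple_case_mappings(RECORDS)
print(MAPPINGS[0xA656][1])  # simple lowercase mapping of U+A656; prints 42583
```

The printed value, 42583, is U+A657 in decimal, which is the result that
lower-case() would be expected to produce for U+A656.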

For comparison (ASCII-range characters):

Input: string-to-codepoints('&#x0041;')
Output (as expected): 65

Input: string-to-codepoints(lower-case('&#x0041;'))
Output (as expected): 97

It looks, then, as if the lower-case() function works properly on some
Unicode characters, such as those in the ASCII range, but not on others,
such as those in the Cyrillic Extended-B block. The Cyrillic Extended-B
characters have been in Unicode since version 5.1.0 (April 4, 2008);
Unicode is now at 6.2.0. Is this a bug (and if so, whose bug is it?), or
are my expectations based on a misunderstanding?
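
As a cross-check outside XPath, the same case-fold can be tried in another
runtime. The following is a minimal sketch that assumes Python 3, which ships
its own copy of the Unicode Character Database. It says nothing about how
Saxon or <oXygen/> implement lower-case(); it only shows what a runtime with
recent Unicode tables returns for the same character.

```python
import unicodedata

# U+A656 CYRILLIC CAPITAL LETTER IOTIFIED A, the character tested above.
capital = "\ua656"
lowered = capital.lower()

# Version of the Unicode database bundled with this Python build.
print(unicodedata.unidata_version)

# Code points of the lower-cased string, comparable to string-to-codepoints().
print([ord(ch) for ch in lowered])
```

If the second line printed is [42583], that runtime applies the mapping
listed in UnicodeData.txt, and the difference from the XPath result would
then come down to the case-mapping tables each runtime uses.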

Thanks,

David (djbpitt at gmail.com)