[oXygen-user] Content of oXygen-user Digest, Vol 27, Issue 8

Holley, Erik erik.holley at pearson.com
Tue Jan 15 13:59:04 CST 2013


Unicode 4.0 does not contain the character you're looking for. It was added
in Unicode 5.1, in the Cyrillic Extended-B block. Java 1.6 is based on
Unicode 4.0; Java 1.7 is based on Unicode 6.0. Thus, to get the proper
mapping, you'll need to be using Java 1.7.
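
If you'd rather not infer the Unicode level from the Java version string, a
rough alternative is to ask the running JVM about the code point directly.
This is only a sketch: the class and method names (CaseProbe, knows,
lowercasesTo) are made up for illustration, and it relies on nothing beyond
the standard Character.isDefined(int) and Character.toLowerCase(int) calls.

```java
// Sketch: probe the running JVM's Unicode tables directly instead of
// inferring them from java.specification.version.
// CaseProbe and its method names are hypothetical, chosen for illustration.
class CaseProbe {

    // U+A656 CYRILLIC CAPITAL LETTER IOTIFIED A and its lowercase partner,
    // as listed in UnicodeData.txt.
    static final int CAPITAL_IOTIFIED_A = 0xA656;
    static final int SMALL_IOTIFIED_A = 0xA657;

    // True if this JVM's character database assigns the code point at all.
    static boolean knows(int codePoint) {
        return Character.isDefined(codePoint);
    }

    // True if this JVM lowercases the code point `upper` to `lower`.
    static boolean lowercasesTo(int upper, int lower) {
        return Character.toLowerCase(upper) == lower;
    }

    public static void main(String[] args) {
        System.out.println("defined: " + knows(CAPITAL_IOTIFIED_A));
        System.out.println("maps:    "
                + lowercasesTo(CAPITAL_IOTIFIED_A, SMALL_IOTIFIED_A));
    }
}
```

Given the outputs in this message, I'd expect the "maps" line to be false on
Java 1.6 and true on Java 1.7.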

----------------------
Sample Program
----------------------
public class Case {

    public static void main(String[] args) {

        // Map the Java specification version to the Unicode version it is
        // based on (only 1.6 and 1.7 are covered here).
        String unicodeVersion;
        String specVersion = System.getProperty("java.specification.version");
        if (specVersion.equals("1.7"))
            unicodeVersion = "6.0";
        else if (specVersion.equals("1.6"))
            unicodeVersion = "4.0";
        else
            unicodeVersion = "n/a";

        System.out.println(unicodeVersion);

        // U+0041 LATIN CAPITAL LETTER A and
        // U+A656 CYRILLIC CAPITAL LETTER IOTIFIED A
        char[] originalChars = { 0x41, 0xa656 };
        String theString = new String(originalChars);

        // Before lowercasing: characters, code points, isLowerCase()
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t"
                + Character.isLowerCase(theString.charAt(1)));

        // After lowercasing: the same three lines
        theString = theString.toLowerCase();
        System.out.println(theString.charAt(0) + "\t" + theString.charAt(1));
        System.out.println(theString.codePointAt(0) + "\t" + theString.codePointAt(1));
        System.out.println(Character.isLowerCase(theString.charAt(0)) + "\t"
                + Character.isLowerCase(theString.charAt(1)));
    }

}

----------------------
Java 1.6 Output
----------------------
4.0
A       ?
65      42582
false   false
a       ?
97      42582
true    false

----------------------
Java 1.7 Output
----------------------
6.0
A       ?
65      42582
false   false
a       ?
97      42583
true    true
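
If upgrading to Java 1.7 isn't an option, one possible stopgap is to apply
explicit overrides before falling back to the JVM's own mapping. The sketch
below is an assumption on my part, not something Saxon or oXygen provides:
CaseFallback and OVERRIDES are hypothetical names, and the table holds only
the single pair discussed in this thread (U+A656 to U+A657). It also
lowercases one code point at a time, so it skips the context- and
locale-sensitive rules that String.toLowerCase() applies.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a fallback for JVMs whose Unicode tables predate the character.
// CaseFallback and OVERRIDES are hypothetical names for illustration.
class CaseFallback {

    // Explicit uppercase-to-lowercase overrides. Only the pair quoted from
    // UnicodeData.txt in this thread is listed; extend as needed.
    static final Map<Integer, Integer> OVERRIDES = new HashMap<Integer, Integer>();
    static {
        OVERRIDES.put(0xA656, 0xA657); // CYRILLIC CAPITAL/SMALL LETTER IOTIFIED A
    }

    // Lowercase code point by code point, preferring an override when one
    // exists and otherwise deferring to Character.toLowerCase(int).
    static String toLowerCase(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            Integer mapped = OVERRIDES.get(cp);
            out.appendCodePoint(mapped != null ? mapped : Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String lowered = toLowerCase("A\uA656");
        System.out.println(lowered.codePointAt(0) + "\t" + lowered.codePointAt(1));
    }
}
```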

-Erik


On 1/15/13 11:00 AM, "oxygen-user-request at oxygenxml.com"
<oxygen-user-request at oxygenxml.com> wrote:

> Today's Topics:
> 
>    1. Re: unicode support? (Oxygen XML Editor Support)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 15 Jan 2013 18:02:24 +0200
> From: Oxygen XML Editor Support <support at oxygenxml.com>
> Subject: Re: [oXygen-user] unicode support?
> To: David Birnbaum <djbpitt at gmail.com>
> Cc: oxygen-user at oxygenxml.com
> Message-ID: <50F57D90.1060208 at oxygenxml.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> 
> Hello,
> 
> This is XSLT processor related. My guess is that Saxon 9 doesn't process
> the lower-case() function as you expect. It may ultimately be a Java
> issue, since Saxon 9 runs on top of Java and I'm guessing it uses Java's
> uppercase/lowercase mapping mechanism. Further investigation is
> necessary.
> 
> I've also looked at the default-collation attribute from XSLT, but it
> doesn't seem to affect this.
> 
> Regards,
> Adrian
> 
> Adrian Buza
> oXygen XML Editor and Author Support
> 
> Tel: +1-650-352-1250 ext.202
> Fax: +40-251-461482
> support at oxygenxml.com
> http://www.oxygenxml.com
> 
> 
> David Birnbaum wrote:
>> Dear <oXygen/> support,
>> 
>> I'm trying to case-fold some early Cyrillic text, which includes
>> characters from the Unicode Cyrillic Extended-B block
>> (http://www.unicode.org/charts/PDF/UA640.pdf), and the lower-case()
>> function does not seem to be returning what I expect. I am testing in
>> the XPath browser box in <oXygen/> 14.1 (set to XPath 2.0), but I get
>> the same results when performing an XSLT transformation using Saxon-PE
>> 9.4.0.4. 
>> 
>> Input: string-to-codepoints('&#xa656;')
>> Output (as expected): 42582
>> 
>> Input: string-to-codepoints(lower-case('&#xa656;'))
>> Output (incorrect): 42582
>> 
>> That is, I get the same result when I process this upper-case letter
>> regardless of whether I try to convert it to lower case.
>> 
>> The lower-case counterpart of U+A656 is U+A657. The case mapping seems
>> to be correct in the Unicode property table
>> at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, where the
>> relevant lines are:
>> 
>> A656;CYRILLIC CAPITAL LETTER IOTIFIED A;Lu;0;L;;;;;N;;;;A657;
>> A657;CYRILLIC SMALL LETTER IOTIFIED A;Ll;0;L;;;;;N;;;A656;;A656
>> 
>> For comparison (ASCII-range characters):
>> 
>> Input: string-to-codepoints('&#x0041;')
>> Output (as expected): 65
>> 
>> Input: string-to-codepoints(lower-case('&#x0041;'))
>> Output (as expected): 97
>> 
>> It looks, then, as if the lower-case() function works properly on some
>> Unicode characters, such as those in the ASCII range, but not on
>> others, such as those in Cyrillic Extended-B. The Cyrillic Extended-B
>> characters have been in Unicode since version 5.1.0 (April 4, 2008);
>> Unicode is now at 6.2.0.  Is this a bug (and if so, whose bug is it?),
>> or are my expectations based on a misunderstanding?
>> 
>> Thanks,
>> 
>> David (djbpitt at gmail.com)
>> ------------------------------------------------------------------------
>> 
>> _______________________________________________
>> oXygen-user mailing list
>> oXygen-user at oxygenxml.com
>> http://www.oxygenxml.com/mailman/listinfo/oxygen-user
>>   


