
RE: AW: [xsl] Sorting Upper-Case first. Microsoft bug?


From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 6 Aug 2003 08:39:07 +0100

> *It would be interesting to know how Saxon implements
> this behaviour...* if M. Kay would be kind enough to answer.
> 

I thought you would never ask. I'm an optimist ;-)

The answer is different for Saxon 6.x and Saxon 7.x.

In Saxon 6.x, you can write your own collating functions as a plug-in,
but if you don't, then two strings are compared as follows:
  
  1. The two strings are compared with case normalized and accents
stripped, using Unicode codepoint order of the normalized characters.

  2. If step (1) finds that the strings are equal, they are compared
with case normalized but without accents stripped, again using codepoint
order.

  3. If step (2) finds that the strings are equal, the outcome depends
on the case of the first character at which the two strings differ,
taking account of the case-order attribute on xsl:sort.
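The three steps above can be sketched as a Java Comparator. This is a
minimal illustration under stated assumptions, not Saxon's code: the
class name Saxon6StyleCollator is hypothetical, the upperFirst flag
stands in for case-order, case normalization is done per character with
Character.toLowerCase, and accent stripping is approximated with
java.text.Normalizer (NFD) for U+00C0 to U+00FF only.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the Saxon 6.x default comparison; not Saxon source code.
public class Saxon6StyleCollator implements Comparator<String> {

    // true models case-order="upper-first", false models "lower-first"
    private final boolean upperFirst;

    public Saxon6StyleCollator(boolean upperFirst) {
        this.upperFirst = upperFirst;
    }

    // Assumption: accents are stripped only for U+00C0..U+00FF, using NFD to
    // find the base letter (letters such as U+00E6 have no base and are kept).
    static char stripAccent(char c) {
        if (c >= '\u00C0' && c <= '\u00FF') {
            return Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD).charAt(0);
        }
        return c;
    }

    // Per-character lower-casing keeps the key the same length as the input.
    static String key(String s, boolean stripAccents) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i));
            sb.append(stripAccents ? stripAccent(c) : c);
        }
        return sb.toString();
    }

    @Override
    public int compare(String a, String b) {
        // Step 1: case normalized, accents stripped, code-unit order
        // (the same as codepoint order for BMP characters).
        int r = key(a, true).compareTo(key(b, true));
        if (r != 0) return r;
        // Step 2: case normalized, accents kept.
        r = key(a, false).compareTo(key(b, false));
        if (r != 0) return r;
        // Step 3: equal lower-cased keys imply equal lengths; decide on the
        // case of the first character that differs.
        for (int i = 0; i < a.length(); i++) {
            char ca = a.charAt(i), cb = b.charAt(i);
            if (ca != cb) {
                boolean ua = Character.isUpperCase(ca), ub = Character.isUpperCase(cb);
                if (ua != ub) {
                    int upperWins = ua ? -1 : 1;
                    return upperFirst ? upperWins : -upperWins;
                }
                return Character.compare(ca, cb); // fallback, e.g. titlecase letters
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("ecole", "apple", "\u00E9cole", "Banana", "Apple"));
        words.sort(new Saxon6StyleCollator(true));
        // Order: Apple, apple, Banana, ecole, then the accented form
        System.out.println(words);
    }
}
```

With upper-first, "Apple" and "apple" tie at steps 1 and 2 and are
separated only at step 3, while "ecole" and its accented form are
separated at step 2 regardless of case-order.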

Case normalization relies on the Java method toLowerCase. Accent
stripping is implemented only for characters in the upper half of the
Latin-1 set.
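To make the Latin-1 restriction concrete, here is a minimal sketch of
the first-level key only. The helper name primaryKey is hypothetical,
and using java.text.Normalizer (NFD) to find the base letter is an
assumed stand-in for whatever mapping Saxon actually used. It shows
that an accented Latin-1 letter ties with its base letter, while a
letter from Latin Extended-A such as U+0100 does not.

```java
import java.text.Normalizer;

// Hypothetical demo of a lower-case + Latin-1-only accent-stripping key.
public class Latin1KeyDemo {

    public static String primaryKey(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i));
            // Assumption: only accented letters U+00C0..U+00FF are stripped.
            if (c >= '\u00C0' && c <= '\u00FF') {
                // NFD splits e.g. U+00E9 into 'e' + U+0301; keep the base letter.
                // Letters with no decomposition (e.g. U+00E6, U+00DF) are unchanged.
                c = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD).charAt(0);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin-1 accent is stripped, so these tie at the first level.
        System.out.println(primaryKey("Caf\u00C9").equals(primaryKey("cafe")));   // true
        // U+0100 lower-cases to U+0101, which is outside Latin-1: no tie.
        System.out.println(primaryKey("\u0100bc").equals(primaryKey("abc")));     // false
    }
}
```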

The above is essentially a simplified implementation of the Unicode
Collation Algorithm.

Saxon 7.x uses the collation capabilities of JDK 1.4. You can select
any collation supported by the JDK. By default the collation is chosen
according to your locale, or according to the language if the lang
attribute is specified on xsl:sort. If case-order="upper-first" is
specified, the behaviour of the selected Java collation is modified as
follows: if the Java collation decides that two strings are equal,
Saxon looks for the first character at which the two strings differ.
If one of these two characters is upper case, the string containing it
comes first in the sorted order.
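The upper-first tie-break described above can be sketched on top of
java.text.Collator. This is an illustrative assumption, not Saxon's
implementation: the class name UpperFirstCollator is hypothetical, and
the collator strength is set to SECONDARY here purely so that case
differences produce a tie the wrapper can act on.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the Saxon 7.x upper-first behaviour; not Saxon source code.
public class UpperFirstCollator implements Comparator<String> {

    private final Collator base;

    public UpperFirstCollator(Locale locale) {
        base = Collator.getInstance(locale);
        // Assumption for this demo: SECONDARY strength makes the JDK collator
        // ignore case, so "Apple" and "apple" compare as equal and reach the
        // tie-break below. At the default TERTIARY strength the collator would
        // already distinguish them by case itself.
        base.setStrength(Collator.SECONDARY);
    }

    @Override
    public int compare(String a, String b) {
        int r = base.compare(a, b);
        if (r != 0) {
            return r; // the Java collation decides
        }
        // Collated as equal: find the first character that differs.
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = a.charAt(i), cb = b.charAt(i);
            if (ca != cb) {
                boolean ua = Character.isUpperCase(ca), ub = Character.isUpperCase(cb);
                if (ua && !ub) return -1; // a has the upper-case character
                if (ub && !ua) return 1;  // b has the upper-case character
                return 0;                 // no case difference to act on
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("banana", "apple", "Banana", "Apple"));
        words.sort(new UpperFirstCollator(Locale.ENGLISH));
        System.out.println(words); // [Apple, apple, Banana, banana]
    }
}
```

Because the tie-break only fires when the collator returns 0, the
strength of the underlying Collator determines whether case-order has
any visible effect.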

Michael Kay



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


