
Re: [xsl] Sorting Upper-Case first. Microsoft bug?


Subject: Re: [xsl] Sorting Upper-Case first. Microsoft bug?
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Sat, 09 Aug 2003 20:38:47 -0400

David Carlisle wrote:

Dr. Johnson and every lexicographer since has used case as the least
significant, most rapidly varying element in ordering. The example I
have in front of me from the Concise Oxford Dictionary lists daily -
Dalmatian - dalmatic and I would not expect it to do anything else.


Dictionaries are not really a good example to follow here as they don't
have to deal with all strings, it probably doesn't list
DAILY or dalmatioN at all, but xsl:sort has to deal with these things.

I haven't seen anyone mention that, in the general case, it is not possible for any XSLT implementation to define the appropriate collation rules for all possible uses of sort--the variance even within a single language is too great, as evidenced by, for example, the discussion of back-of-the-book index sorting in the _Chicago Manual of Style_. In addition, the Unicode standard is very clear that the ordering of characters in the Unicode character set does not define the collation sequence for any language or writing system. While most alphabetic languages have a natural or default collation order, syllabic and ideographic languages mostly do not.
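To make the difference between character-set order and collation order concrete, here is a minimal Java sketch using the words from the dictionary example quoted above. The class and method names are illustrative only; `Collator.getInstance(Locale.ENGLISH)` is the stock JDK collator, not anything an XSLT processor is required to use.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative class name; not part of any XSLT processor.
final class CodeUnitVersusCollator {

    // Sorts with String.compareTo, which compares UTF-16 code unit values.
    static List<String> sortByCodeUnit(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts with the JDK's locale-sensitive Collator for the given locale.
    static List<String> sortByCollator(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("dalmatic", "daily", "Dalmatian");
        // Code unit order puts every upper-case letter before any lower-case one.
        System.out.println(sortByCodeUnit(words));                 // [Dalmatian, daily, dalmatic]
        // The English collator treats case as a low-priority difference.
        System.out.println(sortByCollator(words, Locale.ENGLISH)); // [daily, Dalmatian, dalmatic]
    }
}
```

The second result matches the dictionary order; the first is what falls out of raw character values.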


For example, Simplified Chinese is collated in terms of its pin-yin transliteration. That is, a character transliterated as "pi" would sort under "p". But there is no universal agreement about what the transliteration of every character is--some authorities might transliterate "pi" as "bi", for example.
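The effect of disagreeing transliterations can be sketched in Java as a sort keyed on a lookup table. Everything here is hypothetical: the headwords `HW1` to `HW3` are placeholders rather than real characters, and the two tables stand in for two imagined authorities that differ only on whether one headword reads "pi" or "bi".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: headwords and transliteration tables are invented placeholders.
final class TransliterationSort {

    // Sorts headwords by the transliteration the supplied table assigns to them.
    // A headword missing from the table falls back to sorting under itself.
    static List<String> sortByTable(List<String> headwords, Map<String, String> table) {
        List<String> copy = new ArrayList<>(headwords);
        copy.sort(Comparator.comparing((String h) -> table.getOrDefault(h, h)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> headwords = List.of("HW1", "HW2", "HW3");

        // Hypothetical authority A reads HW1 as "pi".
        Map<String, String> authorityA = Map.of("HW1", "pi", "HW2", "ma", "HW3", "li");
        // Hypothetical authority B reads the same headword as "bi".
        Map<String, String> authorityB = Map.of("HW1", "bi", "HW2", "ma", "HW3", "li");

        System.out.println(sortByTable(headwords, authorityA)); // [HW3, HW2, HW1]
        System.out.println(sortByTable(headwords, authorityB)); // [HW1, HW3, HW2]
    }
}
```

The same entry moves from last to first purely because the table changed, which is why the table has to be supplied by the user rather than fixed by the processor.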

Not to mention that collation rules could vary within a single document. For example, the index might use one set of rules (for example, ignoring punctuation and spaces) while a generated glossary or parts list respects them.
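As a rough Java sketch of two rule sets coexisting in one document, the comparators below sort the same entries once with punctuation and spaces respected and once with them ignored. The class name, the entries, and the use of `String.CASE_INSENSITIVE_ORDER` as the base ordering are assumptions made to keep the example deterministic; a real index would more likely plug in a locale-appropriate collator as the base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a simplified stand-in for per-use collation rules.
final class PerUseCollation {

    // Glossary-style order: spaces and punctuation take part in the comparison.
    static final Comparator<String> RESPECTING = String.CASE_INSENSITIVE_ORDER;

    // Index-style order: spaces and ASCII punctuation are removed before comparing.
    static final Comparator<String> IGNORING =
            Comparator.comparing(PerUseCollation::stripPunctuationAndSpaces,
                                 String.CASE_INSENSITIVE_ORDER);

    static String stripPunctuationAndSpaces(String s) {
        return s.replaceAll("[\\p{Punct}\\s]+", "");
    }

    static List<String> sorted(List<String> entries, Comparator<String> order) {
        List<String> copy = new ArrayList<>(entries);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("Newark", "New York", "New-found");
        System.out.println(sorted(entries, RESPECTING)); // [New York, New-found, Newark]
        System.out.println(sorted(entries, IGNORING));   // [Newark, New-found, New York]
    }
}
```

Neither result is wrong; each is right for a different part of the same document.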

Any XSLT implementation that does not provide a way for users to easily integrate custom collators will not be useful for a number of important use cases, including producing back-of-the-book indexes. In particular, any application that needs to do culturally- and editorially-appropriate collation in non-Western languages (essentially the languages and locales for which Java does not currently provide appropriate Collator implementations) will only be able to use XSLT processors that provide a way to specify custom collators.
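On the Java side, one way to build such a custom collator is `java.text.RuleBasedCollator`, which accepts an explicit rule string. The sketch below is an assumption-laden illustration, not how any particular XSLT processor loads collators: the rule string covers only the letters A to D and places upper case first as a tertiary (lowest-priority) difference, echoing the subject of this thread.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny tailoring covering just the letters A-D.
final class UpperCaseFirstRules {

    // '<' separates letters at the primary level; ',' marks a tertiary (case)
    // difference, with the upper-case form listed first.
    static final String RULES = "< A, a < B, b < C, c < D, d";

    static RuleBasedCollator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            // The rule string is a constant, so a parse failure is a programming error.
            throw new IllegalStateException(e);
        }
    }

    static List<String> sorted(List<String> entries) {
        List<String> copy = new ArrayList<>(entries);
        copy.sort(build());
        return copy;
    }

    public static void main(String[] args) {
        // Case only breaks ties between otherwise identical letters.
        System.out.println(sorted(List.of("b", "a", "B", "A"))); // [A, a, B, b]
    }
}
```

How a collator like this gets attached to xsl:sort is processor-specific and is not shown here.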

As far as I know, only Saxon provides this facility today (although I haven't looked into MS-XSL's extension facilities since all my work is done in Java).

Cheers,

Eliot
--
W. Eliot Kimber, eliot@xxxxxxxxxx
Consultant, ISOGEN International

1016 La Posada Dr., Suite 240
Austin, TX  78752 Phone: 512.656.4139



XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


