
Subject: RE: [xsl] Issue with xsl:sort - does not differ between 'V' and 'W'?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 2 Feb 2005 09:34:43 -0000

> I found some interesting reading on this here:
> http://www-124.ibm.com/icu/userguide/Collate_Intro.html
>

Yes: it rather reinforces my point. For example, it says

"in traditional German "ä" is compared as if it were "ae"."

and then

"For example, in German dictionaries, "öf" would come before "of""

I don't think there's been a German dictionary published in the last 50
years where this was true. Look in any modern dictionary, and Schaf comes
before Schäflein. When I looked into this, I found they got the rules from
Duden, the German equivalent of the OED, the kind of publication that takes
50 years to bring out a new edition. They are simply wrong. It's like
treating I and J (or U and V) as the same letter in English - respectable
practice until about 1850, but totally obsolete today.
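The Schaf/Schäflein ordering is easy to check against whatever rules a given JDK ships. The following is only a sketch: the class name GermanOrder and the helper sorted() are made up for illustration, and the result depends on the collation rules of the JDK in use. If "ä" were expanded to "ae", Schäflein would sort first, because "e" precedes "f"; if "ä" is treated as "a" with an accent difference, Schaf sorts first because it is a prefix of the longer word.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class GermanOrder {
    // Hypothetical helper: returns a sorted copy of the words, ordered by
    // the JDK's Collator for the given locale (the input is not modified).
    static String[] sorted(Locale locale, String... words) {
        String[] copy = words.clone();
        // Collator implements Comparator<Object>, so it can sort a String[].
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // \u00e4 is "a with diaeresis"; the escape avoids source-encoding problems.
        String[] result = sorted(Locale.GERMAN, "Sch\u00e4flein", "Schaf");
        System.out.println(Arrays.toString(result));
    }
}
```

Running the same words through a collator built from the "ae" expansion rule would be the way to see the older ordering for comparison.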

Java has apparently changed its collation rules for Swedish in JDK 1.5 - I
noticed because quite a few of my collation tests report spurious failures.
Perhaps the new rules are more up-to-date with modern practice; perhaps they
achieve better results for Norwegian too.

(To be fair, this kind of thing is very hard to get right. The Microsoft UK
spellchecker throws out "-ize" endings, despite the fact that the OED
prefers them. "-ize" is now used only by authors who are either (a) very
old-fashioned, or (b) trying to look American. What's a poor software vendor
to do - or even a rich one?)

> I've tried setting the 'lang' attribute to influence this behaviour,
> using my own class com.icl.saxon.sort.Compare_no (I am in Norway). I
> have to mess some more with setting the strength on the Collator
> object in my sort class before I know if this worked :)
>

I'd encourage you to see whether you can get the results you need using the
more sophisticated facilities for collations in Saxon 8.x. These are
available without you having to write your own Java code, and they affect
string comparison as well as sorting.
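For anyone who does stay with a hand-written comparer for the moment, the strength setting mentioned in the quoted message can be explored with plain java.text.Collator. This is a rough sketch, not Saxon code: the class StrengthDemo and the helper sameAt() are invented for the example, and whether 'V' and 'W' come out equal at primary strength depends entirely on the rules the JDK supplies for the locale, so the program prints the answer rather than assuming it.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {
    // Hypothetical helper: true when the locale's Collator, set to the given
    // strength, considers the two strings equal.
    static boolean sameAt(int strength, Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Locale norwegian = Locale.forLanguageTag("no");
        // PRIMARY ignores accent and case differences; TERTIARY respects them.
        System.out.println("V = W at PRIMARY:  "
                + sameAt(Collator.PRIMARY, norwegian, "V", "W"));
        System.out.println("V = W at TERTIARY: "
                + sameAt(Collator.TERTIARY, norwegian, "V", "W"));
        System.out.println("a = A at PRIMARY:  "
                + sameAt(Collator.PRIMARY, norwegian, "a", "A"));
    }
}
```

If the first line prints true and the second false, the collator is treating V and W as variants of one letter, which would explain a sort that does not appear to distinguish them.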

Michael Kay
http://www.saxonica.com/

