
RE: [xsl] Sorting Upper-Case first. Microsoft bug?


Subject: RE: [xsl] Sorting Upper-Case first. Microsoft bug?
From: "John Marshall" <John.Marshall@xxxxxxxxxxxxxx>
Date: Fri, 8 Aug 2003 08:56:07 +0100

Dr. Johnson and every lexicographer since has used case as the least significant, most rapidly varying element in ordering. The example I have in front of me from the Concise Oxford Dictionary lists daily - Dalmatian - dalmatic and I would not expect it to do anything else.

When Dennis Ritchie devised C before 1978, strcmp() would give a sort order that would place Dalmatian first (assuming ASCII), but in those days most of us were still using uppercase-only I/O devices and were not worried about such refinements. If we were, we used a case-insensitive comparison such as strcmpi().
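To make the contrast concrete, here is a minimal sketch in Python (not C, and not taken from any XSLT processor). It compares plain code-point ordering, which behaves like strcmp() on ASCII data, with a dictionary-style ordering in which case is only a tie-breaker. The helper name dictionary_key is made up for the illustration.

```python
words = ["daily", "Dalmatian", "dalmatic"]

# Code-point order, comparable to strcmp() on ASCII data:
# every upper-case letter sorts before every lower-case letter.
codepoint_order = sorted(words)

def dictionary_key(word):
    # Hypothetical helper: compare case-insensitively first and fall
    # back to the raw string only to break ties between words that
    # differ in case alone (upper-case first in that tie).
    return (word.lower(), word)

dictionary_order = sorted(words, key=dictionary_key)

print(codepoint_order)   # ['Dalmatian', 'daily', 'dalmatic']
print(dictionary_order)  # ['daily', 'Dalmatian', 'dalmatic']
```

The second result matches the dictionary sequence quoted above; the first is what a raw character-code comparison produces.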

The world has moved on and the whole thrust of Unicode is to coerce the mechanical representation of text into natural linguistic usage, so Dr. Johnson wins.

All sorts of interesting issues will arise in considering the natural ordering of words from different linguistic groups: not borrowings like yacht and pyjama, but words of equal cultural weight.

I suspect you are in a minority of one and the unanimity of the XSLT processors suggests that the interpretation they have adopted is the correct one.

John Marshall
Accurate Software

80 Peach Street, Wokingham, Berkshire, RG40 1XH, UK.
Tel: +44 (0)118 977 3889
Fax: +44 (0)118 977 1260
http://www.accuratesoftware.com




-----Original Message-----
From: David Carlisle [mailto:davidc@xxxxxxxxx]
Sent: 07 August 2003 21:40
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE: [xsl] Sorting Upper-Case first. Microsoft bug?




   I've been re-reading the W3C Recommendation and although I still think
   the definition for the case-order sorting given in it is misleading to say
   the least, I have to recognise that, if you keep reading, the EXAMPLE given
   in it clearly states that given A,B,a and b the sorting (upper-case first)
   would be A,a,B,b.
      And that's probably why all the implementers followed this rule.

   According to the W3C Recommendation:


But implementers are not following that rule.
The rule given is that the case order affects the ordering of characters
and that strings are ordered lexicographically based on that ordering.
This is _not_ what the implementations are doing.
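The difference between the two readings can be sketched in Python. This is only an illustration under stated assumptions: literal_key implements the literal reading (a character ordering A < a < B < b, with strings compared lexicographically on it), while tiebreak_key treats case purely as a tie-breaker. The tie-breaker version is a stand-in for the behaviour described in this thread, not the actual algorithm of Saxon 6 or any other processor, and all function names are hypothetical.

```python
def char_rank(ch):
    # Upper-first character ordering: A < a < B < b < ...
    return (ch.lower(), 0 if ch.isupper() else 1)

def literal_key(s):
    # Literal reading: strings are compared lexicographically,
    # character by character, using the ordering in char_rank.
    return [char_rank(ch) for ch in s]

def tiebreak_key(s):
    # Tie-breaker reading: compare ignoring case first, and look at
    # case only when the strings are otherwise identical.
    return (s.lower(), [0 if ch.isupper() else 1 for ch in s])

letters = ["b", "a", "B", "A"]
words = ["aa", "Ab"]

# Single characters cannot tell the two readings apart.
print(sorted(letters, key=literal_key))   # ['A', 'a', 'B', 'b']
print(sorted(letters, key=tiebreak_key))  # ['A', 'a', 'B', 'b']

# Longer strings can.
print(sorted(words, key=literal_key))     # ['Ab', 'aa']
print(sorted(words, key=tiebreak_key))    # ['aa', 'Ab']
```

Both readings reproduce the A,a,B,b example, which is why that example alone does not settle which one is intended.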

Mike kindly gave the algorithm used in Saxon 6. Given the length of time
it's been used, I fear it's too late to change it, but I can't see any
reading of the W3C rec that could justify such an algorithm.

Well, actually there is one reading (as Mike pointed out): you could
assume that "sorted lexicographically" was being used as a colloquial
turn of phrase rather than specifying lexicographic ordering, but
I don't really see any justification for that (and it certainly never
occurred to me before this thread that systems would do that).

David

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list








