
RE: [xsl] Sorting Upper-Case first. Microsoft bug?


Subject: RE: [xsl] Sorting Upper-Case first. Microsoft bug?
From: David.Pawson@xxxxxxxxxxx
Date: Tue, 12 Aug 2003 14:18:57 +0100

With more than a little help from Eliot, below is a means of providing
your own sort order.
It's Saxon-specific, tested with 6.5.2. Sorry.

XSLT:


 <xsl:for-each select="word">
    <xsl:sort select="."  
     data-type="text"
     lang="er"/>
    <xsl:value-of select="."/> <br />
  </xsl:for-each>

Note the 'lang' attribute.
This makes Saxon look for a class
com.icl.saxon.sort.Compare_er

which supplies the comparison logic.
(Note: the default is Compare_en, the English sort.)
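To make the naming convention concrete, here is a small standalone sketch. It builds the class name from a lang value following the com.icl.saxon.sort.Compare_xx pattern described above and checks whether that class can be loaded from the current classpath. The class SaxonLangLookup and its methods are hypothetical, not part of Saxon, and Saxon's own lookup code may differ in detail.

```java
public class SaxonLangLookup {
    // Hypothetical helper mirroring the convention described in the post:
    // lang="xx" on xsl:sort -> class com.icl.saxon.sort.Compare_xx
    static final String PREFIX = "com.icl.saxon.sort.Compare_";

    static String comparerClassName(String lang) {
        return PREFIX + lang;
    }

    // True if the named class loads from the current classpath.
    // LinkageError covers the case where the class is found but a
    // superclass (e.g. Saxon's TextComparer) is missing.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lang = args.length > 0 ? args[0] : "er";
        String name = comparerClassName(lang);
        System.out.println(name + (isOnClasspath(name)
                ? " found" : " NOT found on the classpath"));
    }
}
```

Running it with the same classpath you give Saxon is a quick way to confirm the comparer will be found before running the transform.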

For convenience, the actual sort order is kept externally, in a
UTF-8 text file that is read in at runtime.

For this test case it reads:

  ' ' ,  ':' ,  ';' ,  '<'  , '=' ,  '>'  , '?' ,  '@', '!',
 '[' ,  '\' ,  ']' ,  '^' ,  '_' ,  '`',
 '{' ,  '|' ,  '}' ,  '~'
 '!' ,  '"' ,  '#' ,  '$' ,  '%' ,  '&',  ''' ,  '(' ,  ')' ,  '*'  , '+'  ,
','  , '-' , '.' ,  '/'

< 'A',a  < 'B',b  < 'C',c  < 'D',d  < 'E',e  < 'F',f  < 'G',g
< 'h' < 'H'  < 'I',i  < 'J',j  < 'K',k  < 'L',l  < 'N',n < 'M',m
< 'O',o  < 'P',p  < 'Q',q  < 'R',r  < 'S',s  < 'T',t
< 'U',u < ü  < 'V',v  < 'W',w  < 'X',x  < 'Y',y  < 'Z',z
< '0'  < '1'  < '2'  < '3'  < '4'  < '5'  < '6'  < '7'  < '8'  < '9'


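The file has two basic blocks. Everything up to the first < sign
is ignorable: those characters carry no primary weight, so they do
not affect the alphabetic order.
Next comes the 'sequence' of characters, e.g. < 'A',a  < 'B',b,
meaning that A sorts before B. ('<' marks a primary difference;
',' marks only a tertiary (case) difference, so 'A' sorts just before 'a'.)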
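A minimal sketch of the ignorable block, using a hypothetical miniature rule set rather than the full collator.txt. It is written in the form the RuleBasedCollator documentation uses for ignorables, with a leading ',' and quoted '[' and ']' (they are rule-syntax characters). At PRIMARY strength the brackets are skipped, so "[hello]" and "hello" compare equal. The class IgnorableDemo and its RULES are assumptions for illustration only.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class IgnorableDemo {
    // Hypothetical subset: '[' and ']' come before the first '<',
    // so they are ignorable; letters follow as primary steps.
    static final String RULES =
        ", '[' , ']' < 'A',a < 'E',e < 'H',h < 'L',l < 'O',o";

    // Wrap the checked ParseException so callers need not handle it.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = build(RULES);
        // Compare primary weights only, so ignorables drop out entirely.
        collator.setStrength(Collator.PRIMARY);
        System.out.println(collator.compare("[hello]", "hello"));
        System.out.println(collator.compare("hello", "hallo") > 0);
    }
}
```

At the default TERTIARY strength the brackets still count as a minor difference, which is consistent with "[hello]" landing next to, but after, "hello" in the test output.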

Note < 'N',n < 'M',m,
which is the test case, i.e. N should sort before M.
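The same effect can be seen outside Saxon with a few lines of plain Java. This sketch uses a hypothetical cut-down rule string modelled on collator.txt (only the letters needed for the test words) and sorts with java.text.RuleBasedCollator directly; NBeforeM and sortWords are illustrative names, not part of the original code.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class NBeforeM {
    // Hypothetical subset of collator.txt: N has a lower primary weight
    // than M, and each uppercase letter precedes its lowercase form.
    static final String RULES =
        "< 'A',a < 'N',n < 'M',m < 'O',o < 'R',r < 'D',d < 'W',w";

    static String[] sortWords(String rules, String... words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad collation rules", e);
        }
        String[] sorted = words.clone();
        Arrays.sort(sorted, collator); // Collator is a Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortWords(RULES, "Mword", "Nword")));
        // prints [Nword, Mword]
    }
}
```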
  
These rules are held in the file collator.txt.
The rule syntax is specified at
http://java.sun.com/products/jdk/1.2/docs/api/java/text/RuleBasedCollator.html


The Java is below.

package com.icl.saxon.sort;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.Collator;
import java.text.RuleBasedCollator;

/**
  * Custom Saxon collator implementation, selected by lang="er" on xsl:sort.
  * TextComparer is in this same package, so it needs no import.
  **/
public class Compare_er extends TextComparer {

     // Collator built from rules in the syntax defined by the Java
     // RuleBasedCollator class, e.g. "< a < b < c". Here the rules come
     // from an external file; they could equally come from a Java
     // property or an application-specific configuration file.
     private Collator collator;

     public Compare_er() {
         super();
         String rulesFile = "collator.txt";
         try {
             collator = new RuleBasedCollator(getRules(rulesFile));
         } catch (Exception e) {
             // Saxon will not report an exception thrown at this point,
             // so report it here and quit, rather than fail later with a
             // NullPointerException in compare().
             System.err.println("Unable to build collator from rules file "
                                + rulesFile);
             e.printStackTrace();
             System.exit(2);
         }
     }

     public int compare(Object obj, Object obj1) {
         return collator.compare(obj, obj1);
     }

    /**
     * Read a set of rules into a String.
     * @param filename  name of the UTF-8 file containing the rules
     * @return the rules, one line per line of the file
     * @throws IOException if the file cannot be found or read
     **/
    private static String getRules(String filename) throws IOException {
        // Decode explicitly as UTF-8: FileReader would use the platform's
        // default encoding, which can mangle characters such as 'ü'.
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(filename), "UTF-8"));
        StringBuffer buf = new StringBuffer();
        try {
            String text;
            while ((text = reader.readLine()) != null) {
                buf.append(text).append('\n');
            }
        } finally {
            reader.close();
        }
        return buf.toString();
    }
}

Note that the rules are read from the file at runtime.
(Also, if the file is not found, Saxon doesn't report the error itself.)
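Because getRules opens collator.txt by a relative name, the file is looked up relative to the JVM's working directory, not the classpath. A quick diagnostic such as the following shows where that would be. RulesFileCheck is a hypothetical helper for illustration, not part of Saxon or the original code.

```java
import java.io.File;

public class RulesFileCheck {
    // Resolve a relative file name against the working directory
    // (the user.dir system property), as File.getAbsoluteFile() does.
    static File resolve(String name) {
        return new File(name).getAbsoluteFile();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "collator.txt";
        File f = resolve(name);
        System.out.println((f.isFile() ? "found: " : "missing: ") + f.getPath());
    }
}
```

Run it from the directory you launch Saxon from; if it reports "missing", Compare_er will not find the rules either.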


Finally, testing with 

<doc>
 <word>Hello</word>
 <word>hello</word>
 <word>[hello]</word>
<word>Mword</word>
<word>Nword</word>
</doc>

gives output

Hello
hello
[hello]
Nword
Mword

I.e. M and N come out in the custom order: N before M.

Caution.

Assuming that the Java file is at
com/icl/saxon/sort/Compare_er.java

and has been compiled in place (with Saxon on the classpath),
make sure that '.' is on the classpath so that Saxon finds the class.

With your own collator.txt file you can then sort text to your
heart's content, by your own rules.

It's even easier in Saxon 7, but that's another story.

HTH DaveP

- 


Please note that the statements and views expressed in this email 
and any attachments are those of the author and do not necessarily 
represent those of RNIB.

RNIB Registered Charity Number: 226227

Website: http://www.rnib.org.uk 

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


