Re: [xsl] for-each-group grouping accented versions of letters together

From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Sat, 21 Apr 2012 18:24:39 +0100

Gerrit's solution of using normalize-unicode should work. Another solution is to do the grouping using a collation that ignores accents. Unfortunately, collation URIs are processor-dependent - but there are various ways you can parameterize them if you want portability. In Saxon, you would use

<xsl:for-each-group collation="http://saxon.sf.net/collation?ignore-modifiers=yes" ...
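One way to parameterize the collation, sketched here as a complete stylesheet, is to pass the URI in as a stylesheet parameter, since the collation attribute of xsl:for-each-group is an attribute value template. The parameter name grouping-collation and the groups/group summary output are assumptions made for illustration; only the Saxon URI comes from the message above, and another processor would need its own URI supplied.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter name. The default is the Saxon URI quoted
       above; override it when running under a different processor. -->
  <xsl:param name="grouping-collation"
      select="'http://saxon.sf.net/collation?ignore-modifiers=yes'"/>

  <xsl:template match="/wkna-shared-cms/index">
    <!-- Illustrative summary output, not the poster's result documents -->
    <groups>
      <xsl:for-each-group select="index-0"
          group-by="substring(heading, 1, 1)"
          collation="{$grouping-collation}">
        <!-- Keys are compared under the collation, so accented and
             unaccented initials should fall into the same group -->
        <group key="{current-grouping-key()}"
            size="{count(current-group())}"/>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

One caveat, as far as I understand the grouping rules: under a collation the value returned by current-grouping-key() is taken from an item in the group, so it may itself be an accented letter. If the key is used in a file name, it may need to be normalized separately.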

More details of Saxon collation URIs are in the Saxon documentation.


Michael Kay

On 21/04/2012 01:03, Graydon wrote:
So I've got an XML index file, which is too large for some downstream
processing to be entirely pleased with.  The requirement is to split the
file up, grouping index entries (index-0 elements; the index element is
the overall container element) by the first character of their child
heading element.

Using XSLT 2.0, this is pretty easy:

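<?xml version="1.0" encoding="UTF-8"?>
<!-- Namespace declarations were lost in the archive; the xd URI is assumed to be oXygen's documentation namespace -->
<xsl:stylesheet exclude-result-prefixes="xs xd" version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl">
   <xsl:template match="/wkna-shared-cms/index">
     <xsl:for-each-group group-by="substring(heading,1,1)" select="index-0">
       <xsl:sort select="./heading"/>
       <xsl:result-document href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
           <index area="{/wkna-shared-cms/index/@area}">
             <num cite="Topical Index {current-grouping-key()}">
               <xsl:sequence select="current-grouping-key()"/>
             </num>
             <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
             <xsl:copy-of select="current-group()"/>
           </index>
       </xsl:result-document>
     </xsl:for-each-group>
   </xsl:template>
</xsl:stylesheet>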

The problem is that some of the initial characters of the headings have
accents, and it's desired that the accented characters and the
unaccented characters group together, so that E and É and Ê, etc. all
group together in a group with a current-grouping-key() of "E".

I can imagine doing this in a painful way with conditional statements
and an exhaustive list of characters, but I'm hoping someone can tell me
there's a better way.
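A portable alternative to an exhaustive character list, along the lines of the normalize-unicode suggestion in the reply above, might look like the sketch below. The function name f:base-letter, its namespace URI, and the groups/group summary output are hypothetical. The sketch also assumes the processor supports the NFD normalization form, which the XPath functions specification leaves optional (only NFC is required).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: take the first character, decompose it (NFD)
       into base letter plus combining marks, then strip the marks (\p{M}) -->
  <xsl:function name="f:base-letter" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <xsl:sequence
        select="replace(normalize-unicode(substring($text, 1, 1), 'NFD'),
                        '\p{M}', '')"/>
  </xsl:function>

  <xsl:template match="/wkna-shared-cms/index">
    <!-- Illustrative summary output; the xsl:result-document body from
         the original stylesheet could be used here instead -->
    <groups>
      <xsl:for-each-group select="index-0"
          group-by="f:base-letter(heading[1])">
        <group key="{current-grouping-key()}"
            size="{count(current-group())}"/>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Because the grouping key is computed from the stripped letter, current-grouping-key() is the unaccented form and can go straight into the file name. Characters that have no canonical decomposition (ligatures such as Æ, for example) would pass through unchanged and still form their own groups.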


-- Graydon
