
Re: [xsl] for-each-group grouping accented versions of letters together


Subject: Re: [xsl] for-each-group grouping accented versions of letters together
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Sat, 21 Apr 2012 03:02:22 +0200

You can strip the accents by decomposing the text (Unicode NFKD) and then removing the combining diacritical marks (U+0300 to U+036F):

<xsl:for-each-group select="index-0"
  group-by="substring(
              upper-case(
                replace(
                  normalize-unicode(heading, 'NFKD'),
                  '[&#x300;-&#x36f;]',
                  ''
                )
              ), 1, 1
            )">
  <xsl:sort select="current-grouping-key()"/>
  <!-- ... write current-group() to its output file here ... -->
</xsl:for-each-group>

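To see what the expression does, here is a small stand-alone check. It is a sketch that assumes an XSLT 2.0 processor (for example Saxon) and can be run against any input document. NFKD turns a precomposed letter such as É (U+00C9) into E (U+0045) followed by the combining acute accent U+0301. That accent lies in the U+0300 to U+036F range, which replace() then deletes. The sample strings are made up for illustration. The accented letters are written as character references, so the file does not depend on its own encoding being handled correctly.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Sample headings: E-acute, E-circumflex, E-grave, plain E -->
    <xsl:for-each select="('&#xC9;clair', '&#xCA;tre', '&#xC8;ze', 'Eagle')">
      <!-- NFKD: U+00C9 becomes U+0045 followed by U+0301, and so on -->
      <xsl:variable name="decomposed" select="normalize-unicode(., 'NFKD')"/>
      <!-- Remove the combining diacritical marks and keep the base letters -->
      <xsl:variable name="stripped"
                    select="replace($decomposed, '[&#x300;-&#x36f;]', '')"/>
      <xsl:value-of select="., '->', substring(upper-case($stripped), 1, 1)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each of the four output lines ends in E, so all four headings would fall into the same group.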
When you write the group (= starting letter) to an output file further down in your template, sort its entries by the upper-case(...) expression (the accent-stripped, upper-cased heading) as the first sort key. Then sort by the actual heading as a second (tie-breaker) sort key.

So it's best to turn the upper-case(...) expression into a function (call it, e.g., my:sortkey), which you can then use in both group-by and xsl:sort.
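Here is a minimal sketch of that idea. It assumes an XSLT 2.0 processor and reuses the element names and output file naming from Graydon's stylesheet, which is quoted below. The my prefix, its namespace URI http://example.com/my-functions, and the function body are illustrative assumptions, not a fixed API. my:sortkey drives group-by and also serves as the primary xsl:sort key inside each file. The original heading breaks ties, and xsl:perform-sort handles the sorting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my-functions"
  exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: heading with accents stripped, upper-cased.
       Used both for grouping and as the primary sort key. -->
  <xsl:function name="my:sortkey" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:sequence select="upper-case(
                            replace(normalize-unicode($s, 'NFKD'),
                                    '[&#x300;-&#x36f;]', ''))"/>
  </xsl:function>

  <xsl:template match="/wkna-shared-cms/index">
    <!-- One group per accent-free initial letter -->
    <xsl:for-each-group select="index-0"
                        group-by="substring(my:sortkey(heading), 1, 1)">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:result-document
          href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
        <wkna-shared-cms>
          <index area="{/wkna-shared-cms/index/@area}"
                 xml:lang="{/wkna-shared-cms/index/@xml:lang}">
            <num cite="Topical Index {current-grouping-key()}">
              <xsl:sequence select="current-grouping-key()"/>
            </num>
            <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
            <!-- Entries: normalized key first, actual heading breaks ties -->
            <xsl:perform-sort select="current-group()">
              <xsl:sort select="my:sortkey(heading)"/>
              <xsl:sort select="heading"/>
            </xsl:perform-sort>
          </index>
        </wkna-shared-cms>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Because a single function defines both the group key and the first sort key, the file split and the ordering inside each file cannot drift apart.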

In that function, you can also do other useful stuff, such as eliminating stop words or replacing all numbers with a zero, so that everything that starts with a number will be in the same group.
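One possible extension along those lines is sketched below. The stop-word list (THE, A, AN), the rule of dropping only a single leading stop word, and collapsing each run of digits into one 0 are all illustrative choices, not requirements. The names my:sortkey and http://example.com/my-functions are hypothetical. The demo template assumes an XSLT 2.0 processor and can be run against any input document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my-functions"
  exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical extended sort key -->
  <xsl:function name="my:sortkey" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <!-- 1. Strip accents (NFKD, then drop combining marks) and upper-case -->
    <xsl:variable name="base" as="xs:string"
      select="upper-case(replace(normalize-unicode($s, 'NFKD'),
                                 '[&#x300;-&#x36f;]', ''))"/>
    <!-- 2. Drop one leading stop word (illustrative list) plus the
            whitespace after it -->
    <xsl:variable name="no-stop" as="xs:string"
      select="replace($base, '^(THE|A|AN)\s+', '')"/>
    <!-- 3. Collapse every run of digits into a single 0, so all
            headings that start with a number share the initial "0" -->
    <xsl:sequence select="replace($no-stop, '\d+', '0')"/>
  </xsl:function>

  <!-- Demo: print each sample heading and its key -->
  <xsl:template match="/">
    <xsl:for-each select="('The &#xC9;clair', '1984', '42nd Street', '&#xC9;t&#xE9;')">
      <xsl:value-of select="., '->', my:sortkey(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With these samples the keys come out as ECLAIR, 0, 0ND STREET and ETE. Used in group-by="substring(my:sortkey(heading), 1, 1)", the two numeric headings would therefore share the group "0".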

Gerrit


On 2012-04-21 02:03, Graydon wrote:
So I've got an XML index file, which is too large for some downstream
processing to be entirely pleased with.  The requirement is to split the
file up, grouping index entries (index-0 elements; the index element is
the overall container element) by the first character of their child
heading element.

Using XSLT 2.0, this is pretty easy:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet exclude-result-prefixes="xs xd" version="2.0"
   xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/wkna-shared-cms/index">
     <xsl:for-each-group group-by="substring(heading,1,1)" select="index-0">
       <xsl:sort select="./heading"/>
       <xsl:result-document href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
         <wkna-shared-cms>
           <index area="{/wkna-shared-cms/index/@area}"
             xml:lang="{/wkna-shared-cms/index/@xml:lang}">
             <num cite="Topical Index {current-grouping-key()}">
               <xsl:sequence select="current-grouping-key()"/>
             </num>
             <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
             <xsl:copy-of select="current-group()"/>
           </index>
         </wkna-shared-cms>
       </xsl:result-document>
     </xsl:for-each-group>
   </xsl:template>
</xsl:stylesheet>

The problem is that some of the initial characters of the headings have
accents, and it's desired that the accented characters and the
unaccented characters group together, so that E and É and Ê, etc. all
group together in a group with a current-grouping-key() of "E".

I can imagine doing this in a painful way with conditional statements
and an exhaustive list of characters, but I'm hoping someone can tell me
there's a better way.

Thanks!

-- Graydon


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler

