
[xsl] XSLT sorting for index alphabetization

Subject: [xsl] XSLT sorting for index alphabetization
From: "David Sewell dsewell@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 13 Jun 2014 21:05:48 -0000

Recently I needed to merge several back-of-the-book indexes that were marked up in XML. After experimenting a bit, I decided that, given an appropriate collation, the following sequence of XSLT 3.0 xsl:sort instructions is an adequate approximation of what indexers call the "letter-by-letter" style of alphabetizing:

<xsl:sort select="replace(., '[\s\p{P}-[(,]]', '') ! replace(., ',.*|\(.*','')"/>
<xsl:sort select="matches(., '^[^(]+,')"/>
<xsl:sort select="replace(., '[\s\p{P}-[(,]]+', '')"/>
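
To show the three keys in context, here is a minimal sketch of a stylesheet that applies them. It is not the full script linked below. The element names "index" and "entry" are hypothetical, and the UCA collation URI with strength=primary is only one guess at what "an appropriate collation" might be (it ignores case and accent differences, and assumes the processor supports it). The comments describe what each key does to a heading such as "New, Zoe" or "NEW (Neighbors Ever Watchful)".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "index" and "entry" are assumed element names. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/index">
    <index>
      <xsl:for-each select="entry">
        <!-- Key 1: strip whitespace and all punctuation except "(" and ",",
             then drop everything from the first comma or opening parenthesis.
             "New, Zoe" yields "New"; "New Deal" yields "NewDeal";
             "NEW (Neighbors Ever Watchful)" yields "NEW". -->
        <xsl:sort select="replace(., '[\s\p{P}-[(,]]', '') ! replace(., ',.*|\(.*', '')"
                  collation="http://www.w3.org/2013/collation/UCA?strength=primary"/>
        <!-- Key 2: a boolean, true when a comma occurs before any "(".
             false sorts before true, so among headings with equal key 1,
             those without such a comma come first. -->
        <xsl:sort select="matches(., '^[^(]+,')"/>
        <!-- Key 3: tie-breaker on the whole heading, with whitespace and
             punctuation other than "(" and "," removed. -->
        <xsl:sort select="replace(., '[\s\p{P}-[(,]]+', '')"
                  collation="http://www.w3.org/2013/collation/UCA?strength=primary"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </index>
  </xsl:template>

</xsl:stylesheet>
```

The collation attribute is set only on the two string keys, since a collation has no effect on the boolean key. Any other collation the processor supports could be substituted.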

If anyone wants to test it out with the examples that the Chicago Manual of Style uses to illustrate the system, I've put the full script, data, and relevant chunk of the CMS up here: http://lister.ei.virginia.edu/~drs2n/alpha/ (suggested refinements or improvements would be welcome).

I spent a bit of time trying to figure out how one might implement the word-by-word system (described at the above URL) using xsl:sort, but I'm not sure it's possible. It seems that word-by-word would require a full-blown recursive sorting routine. I'm happy to be proven wrong, though, by anyone who has tackled this before or is cleverer than I am about such things.


David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: dsewell@xxxxxxxxxxxx   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
