Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2

From: Dimitre Novatchev
Date: Fri, 29 Apr 2016

I am at work and don't have time for a complete, tested
implementation, but one can apply the function string-to-codepoints()
to the text and then group the resulting codepoints with:

<xsl:for-each-group select="$theCodepoints"

 . . . . . . . .
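
To make the idea concrete, here is a minimal, untested sketch of how the
grouping might be completed. It is not the elided code above. It assumes a
hypothetical helper function f:range (in a made-up namespace,
urn:example:ranges) that maps a codepoint to a range name, with the two
ranges from the example in the question hard-coded; in practice the body
of that function would be generated from the static range definitions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:ranges"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper (an assumption, not from the thread):
       returns the range name for a codepoint, or '' if it is in no range. -->
  <xsl:function name="f:range" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
      if ($cp ge 99 and $cp le 101) then 'R1'   (: 'c' to 'e' :)
      else if ($cp eq 103) then 'R2'            (: 'g' :)
      else ''"/>
  </xsl:function>

  <xsl:template match="text()">
    <xsl:variable name="theCodepoints" as="xs:integer*"
                  select="string-to-codepoints(.)"/>
    <!-- Adjacent codepoints with the same range name form one group. -->
    <xsl:for-each-group select="$theCodepoints"
                        group-adjacent="f:range(.)">
      <xsl:choose>
        <!-- Characters outside every range are copied through as text. -->
        <xsl:when test="current-grouping-key() eq ''">
          <xsl:value-of select="codepoints-to-string(current-group())"/>
        </xsl:when>
        <!-- Characters inside a range are wrapped in a span. -->
        <xsl:otherwise>
          <span class="{current-grouping-key()}">
            <xsl:value-of select="codepoints-to-string(current-group())"/>
          </span>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

For the text "abcdefg" this should yield the groups "abc", "cde", "f" and
"g", with "cde" wrapped as class R1 and "g" as class R2. Because the
grouping key is computed per codepoint, no regular expression is needed and
the range that matched is known directly from current-grouping-key(). A
real stylesheet would also need templates for the surrounding elements,
since this sketch relies on the built-in rules for everything except text
nodes.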


On Fri, Apr 29, 2016, Eliot Kimber
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Using XSLT 2, I have a requirement to take text and wrap contiguous
> sequences of characters in markup according to the character range the
> characters fall in. This is to support the application of range-specific
> fonts to text in HTML.
> I have a static definition of the character ranges for a given national
> language and there shouldn't be any overlap between ranges. Given this
> static definition, I'm generating XSLT code to operate on text nodes in
> order to apply the range markup.
> For example, given the text string "abcdefg" where range "R1" is "cde" and
> R2 is "g", the marked-up result should be:
> abc<span class="R1">cde</span>f<span class="R2">g</span>
> My initial approach is to generate a template that takes the current
> language and the text node and then applies templates in a
> language-specific mode.
> For each language I'm then generating a template to do the range matching.
> My question: once I'm in a language-specific template for a text node,
> what is the most efficient and/or easiest-to-code way to map the string to
> ranges? Since I'm generating the code, it doesn't have to be concise.
> I'm thinking along the lines of using analyze-string to match on any of
> the groups and then within the matching-substring clause have a choice
> group to determine which range actually matched. But it feels like I'm
> missing a more elegant way to determine the actual range.
> Or maybe there's a clearer/simpler/more efficient way using tail recursion?
> Thanks,
> Eliot
