
Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2

Subject: Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 18:38:05 -0000

I have my generated analyze-string approach working in general. However, some
of my regular expressions are not matching when I would expect them to.

For example, given this @regex value:


And this text:


The regular expression does not match, even though the first group clearly
matches on \uA9 and \uAE.

However, this text:


does match (second group).

If I copy the entire regex or any group from the @regex value and try it
in Oxygen against the same text I get the expected matches.

Have I made a stupid syntax mistake in my regular expression? Is there
some subtlety to matching groups that makes XSLT different from what
Oxygen is doing? I can't see any obvious syntax error in the regular expression.

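A minimal sketch of the general shape of such a generated regex, written in Python rather than XSLT purely to illustrate the logic. It uses one alternation with one capturing group per range, and the group that took part in the match identifies the range. In xsl:analyze-string the corresponding check would be which regex-group(n) is non-empty inside xsl:matching-substring. The range table and the helper name mark_ranges are hypothetical (the ranges reuse the "abcdefg" example from the quoted original question). Python's re engine is not the XPath regex engine, so this neither reproduces nor explains the non-matching case described above.

```python
import re
from html import escape

# Hypothetical range table: range name -> regex for a run of characters in that range.
# Python names its groups; in XSLT 2 the groups would be numbered 1, 2, ...
RANGES = {
    "R1": "[c-e]+",
    "R2": "g+",
}

# A single alternation with one capturing group per range.
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in RANGES.items()))


def mark_ranges(text: str) -> str:
    """Wrap each run of in-range characters in <span class="RANGE">."""
    out = []
    pos = 0
    for m in PATTERN.finditer(text):
        out.append(escape(text[pos:m.start()]))  # the non-matching substring
        name = m.lastgroup                       # the range whose group matched
        out.append(f'<span class="{name}">{escape(m.group())}</span>')
        pos = m.end()
    out.append(escape(text[pos:]))               # trailing non-matching text
    return "".join(out)


print(mark_ranges("abcdefg"))
# ab<span class="R1">cde</span>f<span class="R2">g</span>
```

Because the ranges are assumed not to overlap, at most one alternative can match at any position, so the order of the alternatives does not change which range a character is assigned to.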


Eliot Kimber, Owner
Contrext, LLC

On 4/29/16, 11:54 AM, "Eliot Kimber ekimber@xxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>I see how that can work.
>On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev@xxxxxxxxx"
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>I am at work and don't have the time for a complete/tested
>>implementation, but one can use the function string-to-codepoints()
>>and then perform on the result:
>><xsl:for-each-group select="$theCodepoints"
>> . . . . . . . .
>>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber@xxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Using XSLT 2, I have a requirement to take text and group contiguous
>>> sequences of characters in markup according to the character range the
>>> characters are in. This is to support the application of range-specific
>>> fonts to text in HTML.
>>> I have a static definition of the character ranges for a given national
>>> language and there shouldn't be any overlap between ranges. Given this
>>> static definition, I'm generating XSLT code to operate on text nodes in
>>> order to apply the range markup. The
>>> For example, given the text string "abcdefg" where range "R1" is "cde" and
>>> range "R2" is "g", the marked-up result should be: ab<span
>>> class="R1">cde</span>f<span class="R2">g</span>
>>> My initial approach is to generate a template that takes the current
>>> language and the text node and then applies templates in a
>>> language-specific mode.
>>> For each language I'm then generating a template to do the range
>>> My question: once I'm in a language-specific template for a text node,
>>> what is the most efficient and/or easiest-to-code way to map the string
>>> to the ranges? Since I'm generating the code, it doesn't have to be concise.
>>> I'm thinking along the lines of using analyze-string to match on any of
>>> the groups and then, within the matching-substring clause, using an
>>> xsl:choose to determine which range actually matched. But it feels like
>>> I'm missing a more elegant way to determine the actual range.
>>> Or maybe there's a clearer/simpler/more efficient way using tail recursion?
>>> Thanks,
>>> Eliot
>>> ----
>>> Eliot Kimber, Owner
>>> Contrext, LLC
>>> http://contrext.com
>>Dimitre Novatchev
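Dimitre's suggestion in the thread (string-to-codepoints() followed by xsl:for-each-group) can be sketched outside XSLT as follows. This is a Python illustration of the same grouping idea, not the XSLT he had in mind: a list of codepoints stands in for string-to-codepoints(), itertools.groupby plays the role of group-adjacent, and joining the run back into a string corresponds to codepoints-to-string(). The range table (taken from the "abcdefg" example in the original question) and the helper names range_of and mark_ranges are hypothetical.

```python
from html import escape
from itertools import groupby
from typing import Optional

# Hypothetical range table: (range name, first codepoint, last codepoint).
RANGES = [
    ("R1", ord("c"), ord("e")),
    ("R2", ord("g"), ord("g")),
]


def range_of(cp: int) -> Optional[str]:
    """Return the name of the range containing codepoint cp, or None."""
    for name, lo, hi in RANGES:
        if lo <= cp <= hi:
            return name
    return None


def mark_ranges(text: str) -> str:
    """Group adjacent codepoints by range and wrap each in-range run in a span."""
    out = []
    codepoints = [ord(ch) for ch in text]                   # cf. string-to-codepoints()
    for name, run in groupby(codepoints, key=range_of):     # cf. group-adjacent
        chunk = escape("".join(chr(cp) for cp in run))      # cf. codepoints-to-string()
        out.append(chunk if name is None else f'<span class="{name}">{chunk}</span>')
    return "".join(out)


print(mark_ranges("abcdefg"))
# ab<span class="R1">cde</span>f<span class="R2">g</span>
```

groupby only merges consecutive items with equal keys, which is the group-adjacent behaviour wanted here. If this were carried back into xsl:for-each-group, note that a group-adjacent key must be a single value, so characters outside every range would need a placeholder key (for example an empty string) rather than an empty sequence.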
