Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2


Subject: Re: [xsl] Implementation Advice: Grouping Strings by Character Range in XSLT 2
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 20:05:36 -0000

Doh!

I'm so used to using regular expressions in matches() and replace() that I
just assumed the need for the quotes.

Works as expected now.
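
For anyone reading this in the archive, here is a minimal sketch of why the quoted pattern matched the second group but not the first. It uses Python's re module as a stand-in for the XSLT regex engine (an assumption; alternation behaves the same way for this case), and the character classes are shortened from the generated ones. The regex attribute is an attribute value template, so its content is taken literally: the apostrophes become part of the first and last alternatives, and only the middle alternative can match unquoted text.

```python
import re

# Shortened stand-ins for the three generated character classes.
G1 = "[\u00a9\u00ae\u2120\u2122]+"
G2 = "[\u00dd\u00de]+"
G3 = "[\u27a4]+"

# What the attribute held with the stray quotes: the apostrophes are
# literal, so alternative 1 needs a leading ' and alternative 3 a trailing '.
quoted = re.compile("'(" + G1 + ")|(" + G2 + ")|(" + G3 + ")'")

# The same pattern with the surrounding quotes removed.
unquoted = re.compile("(" + G1 + ")|(" + G2 + ")|(" + G3 + ")")


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None
```

With these definitions, matches(quoted, "\u00a9\u00ae") is False and matches(quoted, "\u00dd\u00de") is True, which is the behaviour reported in the thread; both strings match the unquoted pattern.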

Cheers,

E.
----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 4/29/16, 2:44 PM, "G. Ken Holman g.ken.holman@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>Don't kick yourself too hard:  the regex attribute is not a string
>value, it is a manifest value.  Just take out the surrounding single
>quotes.
>
>I hope this helps.
>
>. . . . . . Ken
>
>At 2016-04-29 18:38 +0000, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
>>I have my generated analyze-string approach working generally. However,
>>some of my regular expressions are not matching when I would expect
>>them to.
>>
>>For example, given this @regex value:
>>
>>
>>regex="'([&#xa9;&#xae;&#x2120;&#x2122;]+)|([&#xa6;&#xb2;&#xb3;&#xb9;&#xbc;&#xbd;&#xbe;&#xd0;&#xd7;&#xdd;&#xde;&#xf0;&#xfd;&#xfe;&#x160;&#x161;&#x2202;&#x220f;&#x2211;&#x2212;&#x222b;&#x2260;&#x2264;&#x2265;]+)|([&#x27a4;]+)'"
>> >
>>
>>And this text:
>>
>>"&#x00A9;&#x00AE;"
>>
>>The regular expression does not match, even though the first group
>>clearly matches on \uA9 and \uAE.
>>
>>
>>However, this text:
>>
>>"&#x00DD;&#x00DE;"
>>
>>does match (second group).
>>
>>If I copy the entire regex or any group from the @regex value and try it
>>in Oxygen against the same text I get the expected matches.
>>
>>Have I made a stupid syntax mistake in my regular expression? Is there
>>some subtlety to matching groups that makes XSLT different from what
>>Oxygen is doing? I can't see any obvious syntax error in the regular
>>expression.
>>
>>Thanks,
>>
>>Eliot
>>
>>
>>On 4/29/16, 11:54 AM, "Eliot Kimber ekimber@xxxxxxxxxxxx"
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> >Dimitre,
>> >
>> >I see how that can work.
>> >
>> >Cheers,
>> >
>> >E.
>> >On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev@xxxxxxxxx"
>> ><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >
>> >>I am at work and don't have the time for a complete/tested
>> >>implementation, but one can use the function string-to-codepoints()
>> >>and then perform on the result:
>> >>
>> >><xsl:for-each-group select="$theCodepoints"
>> >>    group-adjacent="f:codepointToRange(.)">
>> >>
>> >> . . . . . . . .
>> >></xsl:for-each-group>
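
The code-point grouping suggested above can be sketched as follows. This is a Python analogue rather than XSLT: map(ord, ...) plays the role of string-to-codepoints(), itertools.groupby plays the role of group-adjacent, and codepoint_to_range stands in for the f:codepointToRange function, whose body was not given in the thread. The RANGES table is a hypothetical example taken from the "cde" / "g" case in the original question.

```python
import html
from itertools import groupby

# Hypothetical range table: range name -> set of code points in that range.
RANGES = {"R1": {ord(c) for c in "cde"}, "R2": {ord("g")}}


def codepoint_to_range(cp):
    """Return the name of the range containing cp, or None if there is none."""
    for name, codepoints in RANGES.items():
        if cp in codepoints:
            return name
    return None


def mark_up(text):
    """Wrap each run of adjacent same-range characters in a span."""
    out = []
    # groupby starts a new group whenever the key changes, which is the
    # same behaviour as group-adjacent in xsl:for-each-group.
    for name, group in groupby(map(ord, text), key=codepoint_to_range):
        run = html.escape("".join(map(chr, group)))
        if name is None:
            out.append(run)
        else:
            out.append('<span class="%s">%s</span>' % (name, run))
    return "".join(out)
```

Calling mark_up("abcdefg") gives ab<span class="R1">cde</span>f<span class="R2">g</span>.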
>> >>
>> >>Cheers,
>> >>Dimitre
>> >>
>> >>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber@xxxxxxxxxxxx
>> >><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> >>> Using XSLT 2, I have a requirement to take text and wrap contiguous
>> >>> sequences of characters in markup according to the character range
>> >>> the characters are in. This is to support the application of
>> >>> range-specific fonts to text in HTML.
>> >>>
>> >>> I have a static definition of the character ranges for a given
>> >>> national language and there shouldn't be any overlap between ranges.
>> >>> Given this static definition, I'm generating XSLT code to operate on
>> >>> text nodes in order to apply the range markup.
>> >>>
>> >>> For example, given the text string "abcdefg" where range "R1" is
>> >>> "cde" and R2 is "g", the marked up result should be: ab<span
>> >>> class="R1">cde</span>f<span class="R2">g</span>
>> >>>
>> >>> My initial approach is to generate a template that takes the current
>> >>> language and the text node and then applies templates in a
>> >>> language-specific mode.
>> >>>
>> >>> For each language I'm then generating a template to do the range
>> >>>matching.
>> >>>
>> >>> My question: once I'm in a language-specific template for a text
>> >>> node, what is the most efficient and/or easiest-to-code way to map
>> >>> the string to ranges? Since I'm generating the code it doesn't have
>> >>> to be concise.
>> >>>
>> >>> I'm thinking along the lines of using analyze-string to match on any
>> >>> of the groups and then within the matching-substring clause have a
>> >>> choice group to determine which range actually matched. But it feels
>> >>> like I'm missing a more elegant way to determine the actual range.
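
The single-regex idea above, one alternative per range followed by a check of which group matched, can be sketched like this. It is a Python analogue, not XSLT, and the RANGES table and function names are made up for illustration. In XSLT the equivalent check would test regex-group(1), regex-group(2), and so on inside xsl:matching-substring; the sketch uses named groups so the matched range can be read off directly.

```python
import re

# Hypothetical range table: range name -> characters in that range.
# Names must be valid regex group names (identifiers).
RANGES = {"R1": "cde", "R2": "g"}

# One named alternative per range, e.g. (?P<R1>[cde]+)|(?P<R2>[g]+)
PATTERN = re.compile(
    "|".join(
        "(?P<%s>[%s]+)" % (name, re.escape(chars))
        for name, chars in RANGES.items()
    )
)


def wrap(match):
    """Wrap one matched run in a span named after the range that matched."""
    # lastgroup is the name of the alternative that matched, so no
    # chain of per-group tests is needed.
    return '<span class="%s">%s</span>' % (match.lastgroup, match.group())


def mark_up(text):
    """Apply range markup; text between matches is copied through as-is."""
    return PATTERN.sub(wrap, text)
```

Because the ranges do not overlap, at most one alternative can match at any position, so the order of the alternatives does not affect the result.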
>> >>>
>> >>> Or maybe there's a clearer/simpler/more efficient way using tail
>> >>> recursion?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Eliot
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>
>
>

