Re: [xsl] Processing two documents, which order?


Subject: Re: [xsl] Processing two documents, which order?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Fri, 8 Apr 2011 10:49:58 +0200

On 8 April 2011 09:15, Dave Pawson <davep@xxxxxxxxxxxxx> wrote:
>
> > > Given
> > >      <property>absolute-position</property>
> > >      <property>bottom</property>
> > >      <property>left</property>
> > >      <property>right</property>
> > >      <property>top</property>
> > > as the input... what would the keys look like?
> >
>
> The 'list to be marked up' is as above
> The other document is xml, containing, in other elements those words
>
> Required output
>
> <para> Blah blah blah <property>right</property>
>
> 'items' must be followed by [\s\p{P}]  so left-handed doesn't get
> marked up  etc.

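If, given "left", "left-handed" should not match, the set of stoppers
cannot be space plus all punctuation characters (\p{P}): the hyphen is
itself punctuation (category Pd), so it has to be excluded from the set.
If a regular expression is used, the pattern may also have to include the
anchor $, so that a word at the very end of a text node is still matched.

Symmetrically, the same set of stoppers (with '^' in place of '$') should
probably precede the word.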
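To illustrate the stopper question, here is a minimal sketch in Python
(not XSLT). It assumes that [^\w\s-] is an acceptable stand-in for
"punctuation except the hyphen", since Python's standard re module has no
\p{P}; the mark() helper and the two-word list are only for illustration.

```python
import re
import unicodedata

# The hyphen-minus belongs to Unicode category Pd (dash punctuation),
# which is why a stopper set based on \p{P} accepts it.
print(unicodedata.category("-"))

# Assumption: [^\w\s-] roughly means "punctuation or symbol, but not hyphen".
STOP = r"[^\w\s-]"
PATTERN = re.compile(rf"(^|\s|{STOP})(left|right)($|\s|{STOP})", re.IGNORECASE)

def mark(text):
    """Wrap listed words in <property> when delimited by the stoppers."""
    return PATTERN.sub(r"\1<property>\2</property>\3", text)

print(mark("left-handed, turn left."))
```

With the hyphen excluded, "left-handed" is left alone while the final
"left." is still marked up.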

I suspect that a regular expression substitution applied to text nodes in
their entirety could well compete with any other approach.
A simple algorithm can be used to optimize the regular expression, moving
away from the "brute force" pattern that joins all words with '|'.

Example:
Given the words

   bee-bonnet-bounce-bounty-burn-burst-sea-seal

the optimized and anchored regex is

  (^|\s|\p{P})((?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l)))($|\s|\p{P})
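
One way such an optimized alternation could be generated (a sketch, not
necessarily the algorithm meant above) is to put the words into a prefix
tree and serialize it, so that common prefixes are factored out. The code
is Python rather than XSLT, and the names build_trie and trie_to_regex are
made up for this illustration.

```python
import re

def build_trie(words):
    """Build a nested-dict prefix tree; the empty key marks end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Serialize a trie node as a regex fragment with shared prefixes."""
    alternatives = []
    for ch in sorted(node):          # "" sorts first, giving (?:|l) for sea/seal
        if ch == "":
            alternatives.append("")  # a word ends here
        else:
            alternatives.append(re.escape(ch) + trie_to_regex(node[ch]))
    if len(alternatives) == 1:
        return alternatives[0]       # no branching, so no group is needed
    return "(?:" + "|".join(alternatives) + ")"

words = "bee bonnet bounce bounty burn burst sea seal".split()
core = trie_to_regex(build_trie(words))
print(core)
```

For the eight words above this yields the inner group of the regex shown;
the leading and trailing stopper groups still have to be added around it.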

Here is a text:

   <p>Bee in my bonnet bounces from bounty. Burst on a bee-line into
the sea as a seal</p>

Applying global case-insensitive substitution with $1<x>$2</x>$3 produces:

   <p><x>Bee</x> in my <x>bonnet</x> bounces from <x>bounty</x>.
<x>Burst</x> on a <x>bee</x>-line into the <x>sea</x> as a
<x>seal</x></p>
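
The substitution can be reproduced with a short Python sketch. It rests on
one assumption: [^\w\s] stands in for \p{P}, which Python's standard re
module does not support. The mark_up() name is only for illustration.

```python
import re

# Assumption: [^\w\s] approximates \p{P}; Python's re has no \p{..} classes.
PUNCT = r"[^\w\s]"
WORDS = r"(?:b(?:ee|o(?:nnet|un(?:ce|ty))|ur(?:n|st))|sea(?:|l))"
PATTERN = re.compile(rf"(^|\s|{PUNCT})({WORDS})($|\s|{PUNCT})", re.IGNORECASE)

def mark_up(text):
    """Global, case-insensitive substitution with $1<x>$2</x>$3."""
    return PATTERN.sub(r"\1<x>\2</x>\3", text)

text = ("Bee in my bonnet bounces from bounty. "
        "Burst on a bee-line into the sea as a seal")
result = mark_up(text)
print(result)
```

One caveat with this pattern: the trailing stopper is consumed by the
match, so when two listed words are separated by a single space ("bee
sea"), only the first one is marked up. A second pass, or lookahead where
the regex dialect offers it, would be needed to catch the other.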

Disclaimer: My XSLT skills aren't sufficient to create the optimized
regex from the word list. If someone is interested enough, I can
provide the details.

-W

> regards
>
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> http://www.dpawson.co.uk

