
Re: [xsl] Processing two documents, which order?


Subject: Re: [xsl] Processing two documents, which order?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Sat, 9 Apr 2011 19:41:00 +0200

This is about the definition of a character class: '[' and ']' enclose
a set of characters, and the class matches a single character if it is
one from the defined set.

A class is defined as a union of single characters and ranges,
optionally with another class subtracted from that union.

Within [ and ], some characters have special meaning and must be
escaped (using '\') in order to be taken literally.

The hyphen ('-') permits you to define ranges of characters, but it is
also used as the operator for the subtraction of sets.

It is possible to define sets within sets, but only for the purpose of
creating the second operand of a set subtraction.
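
To make the subtraction concrete, here is a rough sketch in Python. Note
the assumption: Python's `re` module is not the XPath 2 dialect and has
no class subtraction, so the class [a-z-[p]] (Dave's example, quoted
further down) is only approximated here with a negative lookahead. The
helper name `members` is made up for this illustration.

```python
import re

# Assumption: Python's `re` lacks XSD-style class subtraction, so
# [a-z-[p]] is emulated as "not p, then one of a-z".
LOWER_EXCEPT_P = re.compile(r"(?!p)[a-z]")

def members(pattern):
    """Return the ASCII characters that the one-character pattern matches."""
    return "".join(c for c in map(chr, range(128)) if pattern.fullmatch(c))

print(members(LOWER_EXCEPT_P))  # abcdefghijklmnoqrstuvwxyz
```

In an XPath 2 processor the subtraction form itself would be used; the
lookahead is only a stand-in for engines that do not have it.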

Looking at [a-z\--\-\-:], we see
   a-z    ... a range of 26 characters
   \--\-  ... a range of a single character, '-'
   \-     ... a single character, '-'
   :      ... a single character, ':'
So this class is just the lowercase letters plus '-' and ':', with the
hyphen specified twice.
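
The breakdown above can be checked by brute force. This is a small
sketch in Python, on the assumption that Python's `re` module reads
this particular class the same way an XPath 2 processor does (it
accepts '\-' as an escaped hyphen and treats '\--\-' as a range); the
two dialects differ elsewhere.

```python
import re

# Raw string, so the backslashes reach the regex engine unchanged.
DAVES_CLASS = re.compile(r"[a-z\--\-\-:]")

# Try every ASCII character against the class and keep the ones that match.
daves_members = "".join(
    c for c in map(chr, range(128)) if DAVES_CLASS.fullmatch(c)
)

print(daves_members)  # -:abcdefghijklmnopqrstuvwxyz
```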

The simpler class [a-z\--:] (no spaces are permitted as separators, and
neither is a backslash in front of the colon, because '\:' is not a
valid single character escape) has been analyzed by Liam; it is the
union of two ranges, a-z and '-' to ':', and so includes the lowercase
letters, the hyphen, the period, the solidus, the digits and the colon.
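
The same kind of brute-force check works for this class, and for the
two strings Liam mentions. Again a sketch only, assuming Python's `re`
agrees with XPath 2 regular expressions on this particular pattern:

```python
import re

SIMPLER_CLASS = re.compile(r"[a-z\--:]")

# Enumerate the ASCII characters accepted by the class, in code point order.
simpler_members = "".join(
    c for c in map(chr, range(128)) if SIMPLER_CLASS.fullmatch(c)
)
print(simpler_members)  # -./0123456789:abcdefghijklmnopqrstuvwxyz

# A lowercase letter followed by one or more characters from the class.
TOKEN = re.compile(r"[a-z][a-z\--:]+")
print(bool(TOKEN.fullmatch("pastry:36-little-pigs")))  # True
print(bool(TOKEN.fullmatch("flat:pan_cake")))          # False: '_' is outside both ranges
```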

Among other issues, this thread deals with the question of finding
certain words that are delimited by anything except a hyphen: if
"left" and "hand" should be found, "left-hand" should not, unless it
is itself included in the list of words. It is therefore sufficient to
include the hyphen in the set of characters that make up a word:

   regex="[a-z][a-z\-]+"

(The colon is more difficult.)
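
Here is how that plays out, sketched in Python rather than XSLT (an
assumption on my part that `re` treats this simple pattern like XPath 2
does). The word list and the function name `find_wanted` are made up
for the illustration.

```python
import re

# A word: a lowercase letter followed by one or more letters or hyphens.
WORD = re.compile(r"[a-z][a-z\-]+")

def find_wanted(text, wanted):
    """Return the tokens of `text` that appear in the `wanted` word list."""
    return [m.group() for m in WORD.finditer(text) if m.group() in wanted]

wanted = {"left", "hand"}  # hypothetical word list

print(find_wanted("left hand", wanted))           # ['left', 'hand']
print(find_wanted("the left-hand side", wanted))  # []
print(find_wanted("the left-hand side", wanted | {"left-hand"}))  # ['left-hand']
```

Because the hyphen is part of the token, "left-hand" is seen as one
word and is only reported when the list contains it as such.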

-W


On 9 April 2011 17:55, Liam R E Quin <liam@xxxxxx> wrote:
> On Sat, 2011-04-09 at 08:20 +0100, Dave Pawson wrote:
>
>> I want to say any lc character, AND not( : | -)
>
> since : and - are not lowercase characters, just "any lowercase letter"
> would work... or by AND do you mean "followed by"?
>
>> <xsl:analyze-string select="." regex="[a-z][a-z\--\-\-:]+">
>> works. But I don't know how.
>
> [a-z] is a lower case letter (in ASCII...)
> [a-z \- - \:] allows any character in two ranges:
> (1) a .. z
> (2) - .. :
> using the default collation/sorting sequence, this gives (consulting an
> ASCII or Unicode chart)
> - . / 0123456789 :
>
> This therefore matches pastry:36-little-pigs but not flat:pan_cake
>>
>> [a-z-[p]] excepts p from the range a-z
>> Is this connected with my misunderstanding?
>
> It might be, but there are no nested square brackets in your example.
> The stylesheet you appended had the range --- in it, rather than --: by
> the way.
>
> Note that we are using here XPath 2 regular expressions, not Java ones.
> They are very close (and both are more or less subsets of Perl regular
> expressions, which are much more powerful).
>
> Liam
>
>
> --
> Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
> Pictures from old books: http://www.fromoldbooks.org/
> Occasional blog: http://www.barefootliam.org/
> The barefoot typographer

