
Re: [xsl] Pattern-Matching / Regular Expression Types


Subject: Re: [xsl] Pattern-Matching / Regular Expression Types
From: Dimitre Novatchev <dnovatchev@xxxxxxxxx>
Date: Thu, 26 Apr 2012 15:39:01 -0700

There is a generic, table-driven LR(1) parser in FXSL -- anyone is
welcome to use it.

Dimitre.

On Thu, Apr 26, 2012 at 3:28 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>
> You could try taking a look at Gunther Rademacher's REX parser generator.
> I've found it hard to find information about it, other than mentions by
> people who have used it for some rather interesting projects. Basically, if
> I understand it correctly, given an EBNF grammar, it generates a parser for
> that grammar written in XQuery. Most of the examples seem to be parsers for
> textual languages (i.e. where the tokens of the language being parsed are
> made up of characters) but I don't see any reason in principle why it
> shouldn't also parse a language where the tokens of the language are
> element nodes.
>
> Michael Kay
> Saxonica
>
>
> On 26/04/2012 22:45, Tiago Freitas wrote:
>>
>> I need to match patterns on a set of XML documents (all with the same
>> schema), and when a pattern matches, I need to retrieve the content
>> and do some specific transformations on that content (no XML output
>> needed).
>>
>> Specifically, they are natural language syntactic trees (and
>> dependencies).
>>
>> I will have a list of those "patterns", that are similar to regular
>> expressions, but with elements and attributes.
>>
>> pseudo-pattern example:
>>
>> (//ELEMENTx) (node())* (//ELEMENTy[@ATTRIBUTEz]) (node())*
>> (//@ATTRIBUTEw)
>>
>> I used XPath syntax inside the parentheses only. Other quantifiers
>> could be used, and dependencies between nodes/attributes could also
>> be specified, but that is another problem.
>>
>> This example would match when the XML has ELEMENTx as the first
>> element, ends with one element that has ATTRIBUTEw, and in between
>> needs to have an ELEMENTy with ATTRIBUTEz.
>>
>> Note that I need to match the whole document for each pattern, not
>> just part of it.
>>
>> The nesting of elements does not matter in this case (ELEMENTy could
>> be a child of ELEMENTx, or not), but they need to have that specific
>> order (in document order).
>>
>> Example of tree that can appear:
>>      TOP
>>     /   \
>>    X     Y
>>    | \   | \
>>    1  2  3  4
>>
>> Matching patterns could be (node names, assuming no attributes):
>> X Y
>> 1 * Y
>> X 3 4
>> 1 * 4
>>
>> I could use XPath to get each individual node in the pattern, but then
>> I lose the order... if I do two XPath queries, I don't know the
>> positions of the results relative to each other.
>>
>> After matching, I will have rules for each pattern, that specify some
>> transformations on the content (change order, etc).
>>
>> Is there any way to do something like this using XSL, XQuery, or other
>> language? (preferably available in a Java implementation)
>>
>> Thanks for any pointers.
>> (Is it ok to cross-post this to an XQuery list? Recommend any?)
>
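
One possible reading of the four example patterns in the question (X Y,
1 * Y, X 3 4, 1 * 4) is that each pattern node covers the leaves below it,
consecutive pattern nodes must cover adjacent runs of leaves, and the whole
pattern must cover every leaf of the document. The sketch below illustrates
that reading only; it is not FXSL, not REx, and not anything proposed in this
thread. It uses Python's standard xml.etree.ElementTree for brevity. The
function names leaf_spans and matches are hypothetical, and the leaves are
named n1..n4 because 1..4 are not legal XML element names. Attribute tests and
other quantifiers are left out.

```python
import xml.etree.ElementTree as ET

# The example tree from the question; n1..n4 stand in for the leaves 1..4.
TREE = "<TOP><X><n1/><n2/></X><Y><n3/><n4/></Y></TOP>"


def leaf_spans(root):
    """Return ([(tag, start, end), ...], leaf_count).

    start/end is the half-open range of leaf positions (in document
    order) that lie at or below each element.
    """
    spans = []
    count = 0

    def visit(el):
        nonlocal count
        start = count
        children = list(el)
        if not children:
            count += 1          # a childless element is one leaf
        for child in children:
            visit(child)
        spans.append((el.tag, start, count))

    visit(root)
    return spans, count


def matches(pattern, xml_text):
    """True if the space-separated pattern covers the whole document.

    A name token matches an element with that name whose span starts
    where the previous token stopped. '*' skips any number of leaves,
    including none (an assumed meaning for the question's '*').
    """
    spans, n_leaves = leaf_spans(ET.fromstring(xml_text))
    tokens = pattern.split()

    def step(i, pos):
        if i == len(tokens):
            return pos == n_leaves      # must end exactly at the last leaf
        if tokens[i] == "*":
            return any(step(i + 1, p) for p in range(pos, n_leaves + 1))
        return any(step(i + 1, end)
                   for tag, start, end in spans
                   if tag == tokens[i] and start == pos)

    return step(0, 0)


if __name__ == "__main__":
    for p in ["X Y", "n1 * Y", "X n3 n4", "n1 * n4", "Y X", "X n3"]:
        print(p, "->", matches(p, TREE))
```

Under this reading the first four patterns match and the last two do not:
"Y X" has the nodes in the wrong order, and "X n3" stops before the last
leaf. Because every node carries its leaf range, the relative position of
any two matched nodes is known, which is the information that separate
XPath queries do not give directly. The backtracking is exponential in the
worst case, so this is only meant for small patterns.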



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

