
Re: [xsl] Pattern-Matching / Regular Expression Types


Subject: Re: [xsl] Pattern-Matching / Regular Expression Types
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 26 Apr 2012 17:59:32 -0400

Tiago,

I don't know if this will help much, but I have seen similar problems solved by writing the XML document structure as a string, and evaluating it with regular expressions (which might themselves be generated dynamically from configurations managed in another format).

This is a powerful method, but one disadvantage is that it isn't very sensitive: it is able to report that a document fails to match the constraints so expressed, but not where or why.
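To make the idea concrete, here is a minimal sketch in Java (chosen only because a Java implementation was preferred). It is an illustration under assumptions, not a finished design. The class name StructureRegex, the sample document, the element names n1 to n4, and the "name followed by a space" token format are all invented for the example. It flattens the element names into one string in document order and then tests that string against hand-written regular expressions.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: the class, its methods, and the token format are assumptions.
public class StructureRegex {

    // Append each element name, in document order, followed by a single space.
    static void flatten(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            flatten(c, out);
        }
    }

    // Parse the XML text and return the flattened sequence of element names.
    static String flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        flatten(doc.getDocumentElement(), out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document shaped like the small tree in the question.
        String xml = "<TOP><X><n1/><n2/></X><Y><n3/><n4/></Y></TOP>";
        String flat = flatten(xml);

        // "(?:\\S+ )*" stands for "any number of intervening elements".
        Pattern inOrder = Pattern.compile("TOP X (?:\\S+ )*Y (?:\\S+ )*n4 ");
        Pattern wrongOrder = Pattern.compile("TOP Y (?:\\S+ )*X (?:\\S+ )*");

        System.out.println(flat);
        // matches() tests the whole string, i.e. the whole document.
        System.out.println(inOrder.matcher(flat).matches());
        System.out.println(wrongOrder.matcher(flat).matches());
    }
}
```

The only thing matches() gives back is a boolean, which is the disadvantage described above. Attributes could presumably be folded into the tokens in a similar way, but that is left out of this sketch.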

It's an interesting problem so I look forward to seeing other responses.

A good XQuery list is this one:

talk@xxxxxxxxxxx
http://x-query.com/mailman/listinfo/talk

Cheers,
Wendell

On 4/26/2012 5:45 PM, Tiago Freitas wrote:
I need to match patterns on a set of XML documents (all with the same
schema), and when a pattern matches, I need to retrieve the content
and do some specific transformations on that content (no xml output
needed).

Specifically, they are natural language syntactic trees (and dependencies).

I will have a list of those "patterns", that are similar to regular
expressions, but with elements and attributes.

pseudo-pattern example:

(//ELEMENTx) (node())* (//ELEMENTy[@ATTRIBUTEz]) (node())* (//@ATTRIBUTEw)

I used XPath syntax only inside the parentheses. Other quantifiers
could be used, and the patterns could also specify dependencies between
nodes/attributes, but that is another problem.

This example would match when the xml has ELEMENTx as the first
element, ends with one element that has ATTRIBUTEw, and in between
needs to have an ELEMENTy with ATTRIBUTEz.

Note that I need to match the whole document for each pattern, not
just part of it.

The nesting of elements does not matter in this case (ELEMENTy could
be a child of ELEMENTx, or not), but they need to have that specific
order (in document order).

Example of tree that can appear:
      TOP
     /   \
    X     Y
   / \   / \
  1   2 3   4

Matching patterns could be (node names, assuming no attributes):
X Y
1 * Y
X 3 4
1 * 4

I could use XPath to get each individual node in the pattern, but then
I lose the order...if I do two XPath queries, I don't know the
positions of the results relative to each other.

After matching, I will have rules for each pattern, that specify some
transformations on the content (change order, etc).

Is there any way to do something like this using XSL, XQuery, or other
language? (preferably available in a Java implementation)

Thanks for any pointers.
(Is it ok to cross-post this to an XQuery list? Recommend any?)



--
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

