[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] Function for determining one XPath as subset of another

Subject: Re: [xsl] Function for determining one XPath as subset of another
From: "Adam Retter adam.retter@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 21 Feb 2016 19:59:59 -0000

Thanks to everyone for their comments and suggestions :-) I am
cross-posting from xquery-talk my results for anyone who is

Okay so for those that are interested, here is the solution that I
ended up with:

1) I wrote my own XPath 2 parser as I wanted a simplified AST to start
operating from. The project I am working on needs to be under GPL2 and
I could not find a decent Java library that was compatible with that
license - https://github.com/exquery/xpath2-parser/

2) I then wrote some utility functions for descending through the Path
expressions and making subset comparisons -
Yes, this has many limitations and is most likely not complete yet,
but it serves me well for the small subset of XPath path expressions
that I want to support.

3) You can see in my test-cases how I consider one path expression as
a subset of another path expression:

On 26 January 2016 at 21:35, Liam R. E. Quin liam@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 2016-01-26 at 16:15 +0000, Adam Retter
> adam.retter@xxxxxxxxxxxxxx wrote:
>> Given two simple XPaths, say:
>> 1. //w
>> 2. /x/y/z/w[@a = 'v']
>> [...]
>> I wondered if there might be an algorithm or library that someone
>> already had or has written which might be able to give me the answer?
> There's quite a bit of literature on optimizing XPath in the context of
> databases (e.g. in proc. VLDB, in "XML-Based Data Management and
> Multimedia Engineering" and elsewhere) but I don't know of a single
> coherent summary.
> This may be at least in part because what's useful and practical
> depends on the implementation and the situation.
> For example, a system with an element occurrence index might solve //w
> in O(n_w) time, where n_w is the number of occurrences of w elements in
> the corpus being searched; one using a tree structure might do better
> with the explicit path.
> A path like //a//b//c//d is much harder using tree navigation, but
> rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an
> element index will likely run in time close to O(n_d log D) where D is
> the average depth of the tree. A system smart enough to notice that
> there are only three "c" elements in the corpus could do better than
> O(n_d) though (my text retrieval system did/does that to speed up
> multi-word phrase searches, for example).
> We encountered each of these implementation strategies during the
> development of XQuery.  The cost of building an element index for a
> non-database-backed XPath implementation is in some systems amortized
> by doing it during parsing, while the input XML is being read; systems
> that run XPath against a stored or in-memory tree might not have that
> luxury, or might have the even greater luxury of a precomputed index.
> I don't know of anything off the shelf, and although open source XQuery
> implementations might be a place to look, it's likely that
> optimizations are done on an internal representation of the XPath
> expression.
>> I realise that I can only probably cover a subset of XPath itself,
>> but
>> it is only the path steps with predicates which I am interested in.
>> Ideally I am looking for something in Java.

Adam Retter

skype: adam.retter
tweet: adamretter

Current Thread