[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] Function for determining one XPath as subset of another

Subject: Re: [xsl] Function for determining one XPath as subset of another
From: "Liam R. E. Quin liam@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 26 Jan 2016 21:35:25 -0000

On Tue, 2016-01-26 at 16:15 +0000, Adam Retter
adam.retter@xxxxxxxxxxxxxx wrote:
> Given two simple XPaths, say:
> 1. //w
> 2. /x/y/z/w[@a = 'v']
>B [...]
> I wondered if there might be an algorithm or library that someone
> already had or has written which might be able to give me the answer?

There's quite a bit of literature on optimizing XPath in the context of
databases (e.g. in proc. VLDB, in "XML-Based Data Management and
Multimedia Engineering" and elsewhere) but I don't know of a single
coherent summary.

This may be at least in part because what's useful and practical
depends on the implementation and the situation.

For example, a system with an element occurrence index might solve //w
in O(n_w) time, where n_w is the number of occurrences of w elements in
the corpus being searched; one using a tree structure might do better
with the explicit path.

A path like //a//b//c//d is much harder using tree navigation, but
rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an
element index will likely run in time close to O(n_d log D) where D is
the average depth of the tree. A system smart enough to notice that
there are only three "c" elements in the corpus could do better than
O(n_d) though (my text retrieval system did/does that to speed up
multi-word phrase searches, for example).

We encountered each of these implementation strategies during the
development of XQuery. B The cost of building an element index for a
non-database-backed XPath implementation is in some systems amortized
by doing it during parsing, while the input XML is being read; systems
that run XPath against a stored or in-memory tree might not have that
luxury, or might have the even greater luxury of a precomputed index.

I don't know of anything off the shelf, and although open source XQuery
implementations might be a place to look, it's likely that
optimizations are done on an internal representation of the XPath

> I realise that I can only probably cover a subset of XPath itself,
> but
> it is only the path steps with predicates which I am interested in.
> Ideally I am looking for something in Java.

Current Thread