Subject: [xsl] Comparing documents: what of P is a subset of D?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 27 Feb 2014 09:05:06 +0100

The data model for a set of similarly (but not identically) structured
XML documents is a collection of arrays of records, which may in turn
contain (recursively) arrays, records and scalars. (The terms "array"
and "record" are used in their classic meaning, as in Pascal.)
Document structures are fairly stable, but they do change over time.
Array elements are identified (indexed) by @_ix, not by position.
Record fields can be elements, or attributes when they are scalar.
Document order is irrelevant, since a node's XPath together with the
@_ix values along that path pinpoints each node.
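
The order-independent addressing can be made concrete by flattening a document into (path, value) pairs. The following is a minimal sketch in Python rather than XSLT, and the function name flatten is hypothetical. It works as follows:
each step gets an [@_ix='N'] predicate when the element carries @_ix, other attributes become /@name fields, and leaf elements contribute their text. It assumes no namespaces, no mixed content, and no apostrophes inside @_ix values.

```python
import xml.etree.ElementTree as ET

IX = "_ix"  # the index attribute named in the post


def flatten(elem, prefix=""):
    """Return sorted (path, value) pairs for all scalars below elem.

    Assumptions of this sketch: no namespaces, no mixed content, and
    @_ix values never contain an apostrophe.
    """
    step = elem.tag
    if IX in elem.attrib:
        # Array element: address it by @_ix, never by position.
        step += f"[@{IX}='{elem.attrib[IX]}']"
    path = f"{prefix}/{step}"
    pairs = []
    # Scalar record fields stored as attributes (@_ix itself is skipped).
    fields = {k: v for k, v in elem.attrib.items() if k != IX}
    for name, value in fields.items():
        pairs.append((f"{path}/@{name}", value))
    children = list(elem)
    for child in children:
        pairs.extend(flatten(child, path))
    text = (elem.text or "").strip()
    if not children and (text or not fields):
        # Leaf element: a scalar whose value is its text content.
        pairs.append((path, text))
    # Sorting makes the result independent of document order.
    return sorted(pairs)
```

Because array elements are keyed by @_ix, two documents that differ only in the order of their array elements flatten to the same list.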

One XML document D contains the full population of such a data set
(on the order of 1 MB). A second XML document P contains "patches",
i.e., every node that appears in P is expected to be in D as well.

If S(P) is the sequence of nodes in P (each annotated with its XPath)
and S(D) the corresponding sequence for D, how can I determine
S(P) intersect S(D), leaving out all @_ix attributes, whose values are
bound to be identical? Of course, I don't want the common set of *data
items*; I want the XML paths of those common data items.
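
Suppose both documents have been reduced to (path, value) pairs, with @_ix folded into the path predicates and not reported as a field of its own. The requested intersection then becomes a set intersection on those pairs, keeping only the paths. A sketch under that assumption (common_paths is a hypothetical name):

```python
def common_paths(p_pairs, d_pairs):
    """Sorted paths whose (path, value) pair occurs in both P and D.

    Both arguments are iterables of (path, value) tuples; the paths are
    assumed to carry @_ix predicates and to exclude @_ix itself.
    """
    common = set(p_pairs) & set(d_pairs)
    # Drop the values: only the XML paths of the common items are wanted.
    return sorted(path for path, _ in common)
```

A pair with the same path but a different value (an actual patch) is not reported.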

A solution (in XSLT 2.0) should not need individual adaptation for
each kind of data set.

I'm confident that I can create text files for D and P containing one
line of the form <path> <value> per node, sort both, and run diff.
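
The line-file idea can be sketched in Python as follows. Note that diff reports differences. The intersection instead corresponds to a single merge pass over the two sorted files, which is what comm -12 prints. The helper names are hypothetical. So is the escaping of backslashes and newlines that keeps each node on one line, and the assumption that paths contain no spaces.

```python
def to_line(path, value):
    """Encode one node as '<path> <value>' on a single line.

    Backslashes and newlines in the value are escaped so that every node
    occupies exactly one line (an assumption of this sketch).
    """
    value = value.replace("\\", "\\\\").replace("\n", "\\n")
    return f"{path} {value}"


def common_lines(a_lines, b_lines):
    """Lines present in both sorted lists, found by one merge pass.

    This is the intersection that comm -12 prints for two sorted files.
    """
    i = j = 0
    out = []
    while i < len(a_lines) and j < len(b_lines):
        if a_lines[i] == b_lines[j]:
            out.append(a_lines[i])
            i += 1
            j += 1
        elif a_lines[i] < b_lines[j]:
            i += 1
        else:
            j += 1
    return out


def paths_of(lines):
    """Strip the values: the path ends at the first space (assumed space-free)."""
    return [line.split(" ", 1)[0] for line in lines]
```

If the shell tools are used instead, sort both files with LC_ALL=C so that sort and comm agree on collation. Python's sorted() compares code points, which matches byte order for UTF-8 text.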

Any better ideas?

