[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] Question on duplicate node elimination


Subject: Re: [xsl] Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Mon, 23 Aug 2010 14:36:38 +0200

Michael,

thanks for the ideas.

> The other - considerably easier - approach would be to write the
> interpreter in Javascript.

This sounds interesting. I once used Javascript to provide the
capability to process embedded stylesheets to IE browsers [1].
It is not that difficult to write Javascript that does XSLT processing
in the same way for the major browser this way.

But I did a quick seach on "Javascript XPath parser" and despite many
hits it seems that nothing complete is available yet.
Even [2] seems far away from being complete (//b/parent::* is fine
for the Sample Application, but //b/.. results in a error ...).


[1] http://stamm-wilbrandt.de/en/xsl-list/ApplyEmbeddedStylesheet.xsl
[2]
http://xmljs.sourceforge.net/website/contributedAddOns-xpath-sampleApplications.html



Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/23/2010 11:58 AM
Subject:    Re: [xsl] Question on duplicate node elimination



Looking at your code, I can see why you are stuck. But I'm afraid my
answer is that to get to your desired destination "I wouldn't start from
here". The assumption that you can parse the XPath expression from left
to right and evaluate it as you parse it is just fundamentally flawed,
and your problem with duplicate elimination is just the first point
where you have discovered this.

That leaves the question, how would I advise tackling this problem?
Well, I'd start by allocating rather more time for it than I have
available. Using the interpreter design pattern the normal thing would
be to start by parsing to generate an abstract syntax tree (Jeni's code
seems to do much of this for you - she's a she, by the way). Then
conventionally you write an interpreter that walks the nodes of this
tree treating each node as an operator to be evaluated by evaluating its
sub-expressions and returning a result. The problem here is that you if
you want to write this in XSLT 1.0, XSLT 1.0 templates don't give you
the opportunity to return node-sets (even with the exslt:node-set()
function, you can only return newly constructed nodes, not nodes from
the input document). To get round that problem, you can probably use the
FXSL approach: pass the template a parameter representing a function to
be applied to the selected nodes. But it's going to be tough, and I
don't have the head for it this morning.

The other - considerably easier - approach would be to write the
interpreter in Javascript.

Michael Kay
Saxonica

On 23/08/2010 09:23, Hermann Stamm-Wilbrandt wrote:
>> What makes you think it would be difficult?
>>
> It looks at least not obvious to me and I thought I better ask here
> before reinventing the wheel.
>
>
>> Of course, a processor needs some way to decide whether two nodes are
>> identical/distinct. Given such a mechanism, it's not difficult to come
>> up with algorithms that eliminate duplicate nodes.
>>
> The question is how this mechanism might look like in XPath 1.0 plus
> exslt:node-set function, see below.
>
>
>> In practice, when XPath 1.0 is used as part of XSLT 1.0, the XPath
>> requirement to eliminate duplicates can always be combined with the XSLT
>> requirement to deliver the node-set sorted in document order.
>>
> OK, in thread [1] it was asked how to dynamically evaluate XPath
> expressions.
> Dimitre's posting [2] of a nametest only single node result XSLT 1.0
> solution and the enhancement of that to handle result sets in [3] made me
> start my own XPath evaluator 10 days ago [4]. While it is complete on the
> Location step side it has near zero support for predicates and cannot
> handle duplicate node elimination (last sample).
>
> In posting [5] I learned that XSLTForm project has its own built in (not
> complete) XPath evaluator. Editing file xpath.xml allows to test some
> XPath expressions and it turned out that this XPath evaluator is not able
> to do duplicate node elimination as well.
>
> Searching Google for "XPath Parser" did not help. But searching the list
> archives of xsl-list I found the very interesting posting [6] of Jeni
> Tenison on his XPath parser xpath.xsl generating a parse tree and
> xpath-doc.xsl generating an explanation of a XPath statement.
>
> His xpath.xsl seems to be pretty complete (as an indicator the call graph
> for the named templates [7], generated by stylesheet [10] with
dot/graphwiz
> [9] seems to match the spec quite good).
>
> So my idea was to just start with xpath-doc.xsl and instead of generating
> an explanation make it do the actual XPath evaluation instead.
> And this is the point where duplicate node elimination is needed.
>
>
>> So the
>> natural way to eliminate duplicates is as part of the sorting process.
>>
> Which sorting process -- a global one or the one for the incremental
steps
> in working on the XPath parse tree?
>
> ANY help is really appreciated!
>
>
> Take this XPath as a more complex example (the duplicate nodes may exist
> on any level in the input ducument):
> //c/ancestor::*
>
> Jeni's xpath-doc.xsl generates this explanation:
>    the root node's descendant<span class="name">c</span>  elements's
> ancestor
>    elements
>
> And his xpath.xsl generates this parse tree:
> <path>
>    <root/>
>    <step axis="descendant-or-self">
>      <node/>
>    </step>
>    <step axis="child">
>      <element name="c" namespace="default"/>
>    </step>
>    <step axis="ancestor">
>      <element namespace="any"/>
>    </step>
> </path>
>
>
> Having a solution would enable the major browsers (with the technique of
> David Carlisle [8]) to dynamically evaluate Xpath expressions without
> having a dyn:evaluate() implementation.
>
>
> [1]
>
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node

> [2]
>
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node/3017752#3017752

> [3]
>
http://stackoverflow.com/questions/3015942/retrieving-xml-node-from-a-path-specified-in-an-attribute-value-of-another-node/3485079#3485079

> [4] http://stamm-wilbrandt.de/en/xsl-list/xpath1/v0.3/
> [5]
>
http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00203.html

> [6]
>
http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200212/msg00315.html

> [7] http://stamm-wilbrandt.de/en/xsl-list/xpath.dot.pdf
> [8] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
> [9] http://www.graphviz.org/
> [10]
> $ cat dotnc.xsl
> <xsl:stylesheet version="1.0"
>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
>>
>    <xsl:output method="text"/>
>
>    <xsl:template match="/">
> digraph G { size="7.5,10"; rankdir=TB; concentrate=true;
>
>      <xsl:for-each select="//xsl:template[@name]">
>        "<xsl:value-of select="@name"/>"[shape=box];
>        <xsl:if test=".//xsl:apply-templates">
>          "<xsl:value-of select="@name"/>" ->  "xsl:apply-templates";
>        </xsl:if>
>      </xsl:for-each>
>
>      <xsl:for-each select="//xsl:call-template">
>        "<xsl:value-of select="string(ancestor::xsl:template/@name)"/>"
>        ->  "<xsl:value-of select="@name"/>";
>      </xsl:for-each>
> }
>    </xsl:template>
>
> </xsl:stylesheet>
> $
> $ xsltproc dotnc.xsl xpath.xsl>xpath.dot
> $ dot -Tps xpath.dot>xpath.dot.ps
> $ ps2pdf xpath.dot.ps
> $
>
>
> Mit besten Gruessen / Best wishes,
>
> Hermann Stamm-Wilbrandt
> Developer, XML Compiler, L3
> WebSphere DataPower SOA Appliances
> ----------------------------------------------------------------------
> IBM Deutschland Research&  Development GmbH
> Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschaeftsfuehrung: Dirk Wittkopp
> Sitz der Gesellschaft: Boeblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
>
>
>
> From:       Michael Kay<mike@xxxxxxxxxxxx>
> To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Date:       08/23/2010 12:36 AM
> Subject:    Re: [xsl] Question on duplicate node elimination
>
>
>
>
>
>> But how could the algorithm step of "duplicate elimination" be done?
>> How can the duplicates be determined and removed, correctly?
>>
>>
>>
> What makes you think it would be difficult?
>
> Of course, a processor needs some way to decide whether two nodes are
> identical/distinct. Given such a mechanism, it's not difficult to come
> up with algorithms that eliminate duplicate nodes.
>
> In practice, when XPath 1.0 is used as part of XSLT 1.0, the XPath
> requirement to eliminate duplicates can always be combined with the XSLT
> requirement to deliver the node-set sorted in document order. So the
> natural way to eliminate duplicates is as part of the sorting process.
>
> For performance, the most important technique is static analysis to
> identify those path expressions where the sort (and duplicate
> elimination) are unnecessary. For example, this is the case for the
> expression /a/b/c if it is evaluated either (a) using nested loops, or
> (b) by scanning the entire source document looking for nodes that match
> this pattern. For the expression //x//y, a sort is necessary if the
> evaluation uses nested loops, but not if it uses a whole-document scan
> and pattern matching. Remember that the evaluation techniques used
> internally may be very different from the descriptions you find in
> explanations of the semantics of the language.
>
> The way you have phrased the question suggests that you might be
> worrying about how exslt:node-set() affects the process. Simple answer -
> it doesn't.
>
> Michael Kay
> Saxonica


Current Thread
Keywords