
Re: [xsl] How to do this tricky elimination on XML using XSLT 2.0?


Subject: Re: [xsl] How to do this tricky elimination on XML using XSLT 2.0?
From: Jo Na <jkoe888@xxxxxxxxx>
Date: Tue, 19 Jun 2012 21:57:42 +0700

Dear Dr. Kay,
Thank you for your guidance.
I modified the solution into:

    <xsl:variable name="removed-nodes" as="element(*)*">
        <xsl:for-each-group select="//blockA/*"
                group-by="concat(@id, '~', @method, '~', otherchild)">
            <xsl:sequence select="subsequence(current-group(), 2)"/>
        </xsl:for-each-group>
    </xsl:variable>

    <xsl:template match="@* | node()">
        <xsl:if test="empty(. intersect $removed-nodes)">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

It's almost correct; I just need to address two things:

1. Every time a successive node with the `same id` has a `different method`,
   the `boundary` for the next removal for that `id` is reset.
2. The removal must not combine nodes under two different ancestors
   (<gridA id="1"> and <gridA id="2">).

**for example:**

    <elem id="1" method="a" />
    <elem id="1" method="a" /> <!-- this is repetitive for elem id=1 and will be removed -->
    <elem id="1" method="b" />
    <elem id="1" method="a" /> <!-- this is the new boundary for removal elem id=1 and will not be removed -->
    <elem id="2" method="a" />
    <elem id="1" method="a" /> <!-- this is repetitive for elem id=1 and will be removed -->
    <elem id="2" method="a" /> <!-- this is repetitive for elem id=2 and will be removed -->

**will be simplified into:**

    <elem id="1" method="a" />
    <elem id="1" method="b" />
    <elem id="1" method="a" /> <!-- this is the new boundary for removal elem id=1 and will not be removed -->
    <elem id="2" method="a" />


Please let me know how I can achieve such things. Thanks very much once
again.
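
One way both requirements might be handled, offered as an untested sketch rather than a confirmed answer: run the grouping once per grid so that groups never span <gridA id="1"> and <gridA id="2">, group the candidates by element name and `id`, and then apply `group-adjacent` on `method` and `otherchild` inside each group so that a change of method starts a new run. The sketch assumes that the grids are the children of `region`, that the candidate nodes are the grandchildren of each grid, and that each candidate has at most one `otherchild`, as in the sample input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Pass 1: collect the nodes to drop. -->
    <xsl:variable name="removed-nodes" as="element(*)*">
        <!-- Assumption: every child of region is a grid (gridA, gridB, ...).
             Looping per grid keeps duplicates from spanning two grids. -->
        <xsl:for-each select="//region/*">
            <!-- Same element name and id within this grid, in document order. -->
            <xsl:for-each-group select="*/*"
                    group-by="concat(name(), '~', @id)">
                <!-- Within that group, a run of equal method/otherchild values
                     is one set of repeats; a different value starts a new run. -->
                <xsl:for-each-group select="current-group()"
                        group-adjacent="concat(@method, '~', otherchild)">
                    <xsl:sequence select="subsequence(current-group(), 2)"/>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:for-each>
    </xsl:variable>

    <!-- Pass 2: identity transform that skips the collected nodes. -->
    <xsl:template match="@* | node()">
        <xsl:if test="empty(. intersect $removed-nodes)">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

Applied to the elem example above, the id=1 group is a, a, b, a, a, which splits into the runs (a, a), (b), (a, a); dropping everything after the first node of each run leaves a, b, a, and the id=2 group leaves a single a.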



On Tue, Jun 19, 2012 at 9:20 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> I think I would tackle this in two passes. First use xsl:for-each-group to
> identify the nodes to be removed; then do a modified identity transform that
> retains only the nodes not in this list.
>
> The first pass is something like this:
>
> <!--  **Two nodes that have the same `name` and `id` are considered
> *repetitive* if they appear one after another and have the same `method`
> and `children`.** -->
> <xsl:variable name="removed-nodes" as="element(*)*">
> <xsl:for-each-group select="//blockA/*"
>     group-by="concat(@id, '~', @method, '~', otherchild)">
> <xsl:sequence select="subsequence(current-group(), 2)"/>
> </xsl:for-each-group>
> </xsl:variable>
>
> The second pass is:
>
> <xsl:template match="*">
> <xsl:if test="empty(. intersect $removed-nodes)">
> <xsl:copy>
> <xsl:copy-of select="@*"/>
> <xsl:apply-templates/>
> </xsl:copy>
> </xsl:if>
> </xsl:template>
>
> Michael Kay
> Saxonica
>
>
> On 19/06/2012 10:14, Jo Na wrote:
>>
>> Hi,
>> I have this input xml:
>>     <map>
>>         <region>
>>             <gridA id="1">
>>                 <blockA id="01" method="build">
>>                     <building1 id="x" method="build">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be removed -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="01">
>>                     <building1 id="x" method="modify">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be kept (previous node has the same id but a different method, so it is not considered successive) -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="02">
>>                     <building3 id="y" method="modify">
>>                         <otherchild>b</otherchild>
>>                     </building3>
>>                     <building2 id="x" method="demolish"/>
>>                 </blockA>
>>
>>                 <blockA id="01">
>>                     <building1 id="y" method="build">  <!-- this one will be kept (diff id) -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be removed -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="02">
>>                     <building3 id="y" method="modify">  <!-- this one will be removed -->
>>                         <otherchild>b</otherchild>
>>                     </building3>
>>                     <building2 id="x" method="demolish"/>  <!-- this one will be removed -->
>>                 </blockA>
>>             </gridA>
>>
>>             <gridA id="2">
>>                 <blockA id="01" method="build">
>>                     <building1 id="x" method="build">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be removed -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be kept (diff children) -->
>>                         <otherchild>b</otherchild>
>>                     </building1>
>>                 </blockA>
>>                 <blockA id="01">
>>                     <building1 id="x" method="build">  <!-- this one will be removed -->
>>                         <otherchild>b</otherchild>
>>                     </building1>
>>                 </blockA>
>>             </gridA>
>>             <gridB id="1">
>>                 ...and so on..
>>             </gridB>
>>         </region>
>>     </map>
>>
>> Expected Output:
>>
>>     <map>
>>         <region>
>>             <gridA id="1">
>>                 <blockA id="01" method="build">
>>                     <building1 id="x" method="build">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="01">
>>                     <building1 id="x" method="modify">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                     <building1 id="x" method="build">  <!-- this one will be kept (previous node has the same id but a different method, so it is not considered successive) -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="02">
>>                     <building3 id="y" method="modify">
>>                         <otherchild>b</otherchild>
>>                     </building3>
>>                     <building2 id="x" method="demolish"/>
>>                 </blockA>
>>
>>                 <blockA id="01">
>>                     <building1 id="y" method="build">  <!-- this one will be kept (diff id) -->
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>                 </blockA>
>>
>>                 <blockA id="02"/>
>>             </gridA>
>>
>>             <gridA id="2">
>>                 <blockA id="01" method="build">
>>                     <building1 id="x" method="build">
>>                         <otherchild>a</otherchild>
>>                     </building1>
>>
>>                     <building1 id="x" method="build">  <!-- this one will be kept (diff children) -->
>>                         <otherchild>b</otherchild>
>>                     </building1>
>>                 </blockA>
>>                 <blockA id="01"/>
>>             </gridA>
>>             <gridB id="1">
>>                 ...and so on..
>>             </gridB>
>>         </region>
>>     </map>
>> The XSLT so far:
>>
>>     <xsl:stylesheet version="2.0"
>>         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>>         <xsl:output indent="yes"/>
>>         <xsl:strip-space elements="*"/>
>>
>>         <xsl:template match="node()|@*">
>>             <xsl:copy>
>>                 <xsl:apply-templates select="node()|@*"/>
>>             </xsl:copy>
>>         </xsl:template>
>>
>>         <xsl:template match="region/*/*/*
>>              [deep-equal(.,preceding::*[name()=current()/name()]
>>                            [@id = current()/@id]
>>                            [../../@id = current()/../../@id][1])]" />
>>     </xsl:stylesheet>
>>
>> The problem with the XSLT right now is that it cannot differentiate
>> duplicates that occur across siblings (i.e. blockA elements with the same id).
>>
>> I need to remove nodes that are considered *repetitive*.
>>
>> **Two nodes that have the same `name` and `id` are considered
>> *repetitive* if they appear one after another and have the same
>> `method` and `children`.**
>>
>> **for example:**
>>
>>     <elem id="1" method="a" />
>>     <elem id="1" method="a" />  <!-- this is repetitive for id=1-->
>>     <elem id="1" method="b" />
>>     <elem id="1" method="a" />  <!-- this is the new boundary for removal id=1 -->
>>     <elem id="2" method="a" />
>>     <elem id="1" method="a" />  <!-- this is repetitive for id=1 -->
>>     <elem id="2" method="a" />  <!-- this is repetitive for id=2 -->
>>
>> **will be simplified into:**
>>
>>     <elem id="1" method="a" />
>>     <elem id="1" method="b" />
>>     <elem id="1" method="a" />  <!-- this is the new boundary for removal id=1 -->
>>     <elem id="2" method="a" />
>>
>>  **- Every time a successive node with the `same id` has a `different
>> method`, the `boundary` for the next removal for that `id` is reset.**
>>
>>  - We need to take into account duplicates that are under one parent
>> or under sibling parents (two or more parent nodes that have the same
>> element name and id), e.g. `blockX` in the example.
>>  - If the two nodes being compared are not under the same `gridX`
>> element, they should not be considered duplicates to be removed.
>>
>> Please let me know how to achieve such transformation using XSLT 2.0.
>> Thanks very much for the help.

