XSL-LIST Mailing List Archive

Re: [xsl] grouping duplicate links with xslt 1.0


Subject: Re: [xsl] grouping duplicate links with xslt 1.0
From: Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 21 Oct 2011 11:30:30 -0400

I'd approach this as an identity transform with special handling for
internalLink elements.  That way, you're less likely to lose other
content, since the default is to copy everything unless you say
otherwise.

The other issue I was concerned about with your current approach is
that it tries to combine every internalLink with the same target, no
matter where in the document they occur, whereas it seems your
requirement is more like "combine any series of internalLink elements
all sharing the same target, separated only by whitespace".  That's
the approach I went with, below.  Let me know if I misinterpreted your
goal.

Also, I made an assumption that the real content of each internalLink
is anything after the target child element, since it appeared that
anything before that is just whitespace for markup formatting
purposes.  If actual content may occur before the target, then the
mode="include" template will need to be adjusted accordingly.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

<!-- duplicate link processor -->

<xsl:output method="xml"/>

<!-- identity template to copy most content to result as-is -->
<xsl:template match="@* | node()">
    <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
</xsl:template>

<!-- specially handle each internalLink that is not preceded by another
     internalLink with the same target (ignoring any whitespace-only text
     nodes in between) -->
<xsl:template match="internalLink[not(target =
        preceding-sibling::node()[not(self::text()[normalize-space(.) = ''])][1]/self::internalLink/target)]">
    <xsl:variable name="target" select="target"/>
    <!-- sibs = number of following siblings that are not (a) whitespace-only
         text nodes or (b) internalLink elements with the same target -->
    <xsl:variable name="sibs"
        select="count(following-sibling::node()[not(self::text()[normalize-space(.) = '']
                                                     | self::internalLink[target = $target])])"/>
    <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
        <xsl:apply-templates
            select="following-sibling::*[count(following-sibling::node()[not(self::text()[normalize-space(.) = '']
                                                                           | self::internalLink[target = $target])]) = $sibs]"
            mode="include"/>
    </xsl:copy>
</xsl:template>

<!-- for an internalLink included in another, output just the content
     following the "target" child element -->
<xsl:template match="internalLink" mode="include">
    <xsl:apply-templates select="target/following-sibling::node()"/>
</xsl:template>

<!-- suppress all other internalLink elements and whitespace between two
     internalLink elements sharing the same target -->
<xsl:template match="internalLink"/>
<xsl:template match="text()[normalize-space(.) = ''][following-sibling::node()[1]/self::internalLink/target =
                     preceding-sibling::node()[1]/self::internalLink/target]"/>

</xsl:stylesheet>

-Brandon :)
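The rule described above (merge any run of internalLink elements that share the same target and are separated only by whitespace) can be cross-checked outside XSLT with a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It is not the stylesheet above and not an XSLT technique; `merge_adjacent_links` and its helpers are hypothetical names. Like the stylesheet, it assumes the real content of each absorbed link is whatever follows its `target` child, and it compares targets by string value.

```python
import xml.etree.ElementTree as ET


def _is_blank(text):
    """True if `text` is None or whitespace-only (ElementTree uses None for no text)."""
    return text is None or text.strip() == ""


def _target_value(link):
    """String value of the link's <target> child, or None if there is no target."""
    target = link.find("target")
    return None if target is None else "".join(target.itertext())


def _append_text(elem, text):
    """Append `text` at the end of `elem`'s content.

    ElementTree keeps text that follows a child in that child's .tail,
    so trailing text goes on the last child when there is one.
    """
    if not text:
        return
    if len(elem):
        elem[-1].tail = (elem[-1].tail or "") + text
    else:
        elem.text = (elem.text or "") + text


def merge_adjacent_links(parent):
    """Hypothetical helper: merge each run of <internalLink> children of
    `parent` that share the same <target> value and are separated only by
    whitespace, then recurse into the remaining children.

    Assumes an absorbed link's real content is everything after its
    <target> child; anything before the target is discarded.
    """
    kept = None  # first link of the current run, or None if no run is open
    for child in list(parent):  # iterate over a copy: children are removed below
        value = _target_value(child) if child.tag == "internalLink" else None
        if (value is not None and kept is not None
                and _is_blank(kept.tail) and value == _target_value(kept)):
            target = child.find("target")
            # Move the text right after <target>, then any elements after it.
            _append_text(kept, target.tail)
            for node in list(child)[list(child).index(target) + 1:]:
                kept.append(node)
            # Drop the whitespace between the two links, but keep whatever
            # follows the absorbed link (e.g. the final period).
            kept.tail = child.tail
            parent.remove(child)
        else:
            # Start a new run at a targeted link; anything else closes the run.
            kept = child if value is not None else None
    for child in parent:
        merge_adjacent_links(child)


if __name__ == "__main__":
    sample = """<p>Have students practice
   <internalLink>
   <target>Update_Link [7] [act_1]</target>the activity</internalLink>
   <internalLink>
   <target>Update_Link [7] [act_1]</target> in 9.01</internalLink>.</p>"""
    root = ET.fromstring(sample)
    merge_adjacent_links(root)
    print(ET.tostring(root, encoding="unicode"))
```

One design difference: ElementTree stores the text that follows an element in that element's `.tail`, so the whitespace between two links is the first link's tail. The sketch discards it by overwriting that tail with the absorbed link's tail, which plays the role of the stylesheet's whitespace-suppressing template.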


On Fri, Oct 21, 2011 at 10:11 AM, Terry Ofner <tdofner@xxxxxxxxx> wrote:
> I couldn't find any reference to this issue in the archive. If it has been
> addressed before, please forgive.
>
> I have an issue with MS Word outputting duplicate links in XML, which breaks
> up the text. I need to group identical links and output one link while
> leaving all other nodes/text the same. Here is an example of the input XML:
>
> <paragraphs>
>
>    <!-- Have students practice [the activity in 9.01].-->
> <p>Have students practice
>    <internalLink>
>    <target>Update_Link [7] [act_1]</target>the activity</internalLink>
>    <internalLink>
>    <target>Update_Link [7] [act_1]</target> in 9.01</internalLink>.</p>
>
>    <!-- Have students practice [the activity in 9.02]. -->
>
> <p>Have students practice <internalLink>
>    <target>Update_Link [7] [act_2]</target>the activity</internalLink>
> <internalLink>
>    <target>Update_Link [7] [act_2]</target> in 9.02</internalLink>.</p>
> </paragraphs>
>
> I am limited to XSLT 1.0. The following 1.0 stylesheet does everything I
> need it to, except that it drops the final period.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     version="1.0">
>
>    <!-- duplicate link processor -->
>
>    <xsl:output method="xml" indent="yes"/>
>
> <xsl:key name="link_target" match="internalLink" use="target" />
>
>
> <xsl:template match="paragraphs">
>    <xsl:for-each select="p">
>    <p><xsl:apply-templates select="./text()[1]"/><internalLink>
>        <target><xsl:apply-templates select="internalLink[count(. |
>            key('link_target', target)[1]) = 1]/target"/></target>
>        <xsl:apply-templates
>            select="./internalLink/text()"/></internalLink></p></xsl:for-each>
> </xsl:template>
>
> </xsl:stylesheet>
>
> Here is the output using Oxygen/Saxon 6.5.5. Everything is good except for
> the final period.
>
> <?xml version="1.0" encoding="utf-8"?>
> <p>Have students practice
>    <internalLink>
>      <target>Update_Link [7] [act_1]</target>
>    the activity
>     in 9.01</internalLink>
> </p>
> <p>Have students practice <internalLink>
>      <target>Update_Link [7] [act_2]</target>
>    the activity
>     in 9.02</internalLink>
> </p>
>
> Any pointers would be most appreciated.
>
> Terry

