Re: [xsl] grouping duplicate links with xslt 1.0
Subject: Re: [xsl] grouping duplicate links with xslt 1.0
From: Brandon Ibach <brandon.ibach@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 21 Oct 2011 11:30:30 -0400
I'd approach this as an identity transform with special handling for internalLink elements. That way, you're less likely to lose other content, since the default is to copy everything unless you say otherwise.

The other issue I was concerned about with your current approach is that it tries to combine every internalLink with the same target, no matter where in the document they occur, whereas it seems your requirement is more like "combine any series of internalLink elements all sharing the same target, separated only by whitespace". That's the approach I went with, below. Let me know if I misinterpreted your goal.

Also, I made an assumption that the real content of each internalLink is anything after the target child element, since it appeared that anything before that is just whitespace for markup formatting purposes. If actual content may occur before the target, then the mode="include" template will need to be adjusted accordingly.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- duplicate link processor -->

  <xsl:output method="xml"/>

  <!-- identity template to copy most content to result as-is -->
  <xsl:template match="@* | node()">
    <xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
  </xsl:template>

  <!-- handle specially internalLinks that are not preceded by another
       internalLink with the same target (possibly separated by text
       nodes consisting only of whitespace) -->
  <xsl:template match="internalLink[not(target = preceding-sibling::node()[not(self::text()[normalize-space(.) = ''])][1]/self::internalLink/target)]">
    <xsl:variable name="target" select="target"/>
    <!-- sibs = number of following siblings that are not
         (a) whitespace-only text nodes or
         (b) internalLink elements with the same target -->
    <xsl:variable name="sibs" select="count(following-sibling::node()[not(self::text()[normalize-space(.) = ''] | self::internalLink[target = $target])])"/>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <xsl:apply-templates select="following-sibling::*[count(following-sibling::node()[not(self::text()[normalize-space(.) = ''] | self::internalLink[target = $target])]) = $sibs]" mode="include"/>
    </xsl:copy>
  </xsl:template>

  <!-- for an internalLink included in another, output just the content
       following the "target" child element -->
  <xsl:template match="internalLink" mode="include">
    <xsl:apply-templates select="target/following-sibling::node()"/>
  </xsl:template>

  <!-- suppress all other internalLink elements and whitespace between
       two internalLink elements sharing the same target -->
  <xsl:template match="internalLink"/>
  <xsl:template match="text()[normalize-space(.) = ''][following-sibling::node()[1]/self::internalLink/target = preceding-sibling::node()[1]/self::internalLink/target]"/>

</xsl:stylesheet>

-Brandon :)

On Fri, Oct 21, 2011 at 10:11 AM, Terry Ofner <tdofner@xxxxxxxxx> wrote:
> I couldn't find any reference to this issue in the archive. If it has been addressed before, please forgive.
>
> I have an issue with MS Word outputting duplicate links in xml, breaking up the text. I need to group identical links and output one link while leaving all other nodes/text the same. Here is an example of the input xml:
>
> <paragraphs>
>
> <!-- Have students practice [the activity in 9.01].-->
> <p>Have students practice
> <internalLink>
> <target>Update_Link [7] [act_1]</target>the activity</internalLink>
> <internalLink>
> <target>Update_Link [7] [act_1]</target> in 9.01</internalLink>.</p>
>
> <!-- Have students practice [the activity in 9.02]. -->
>
> <p>Have students practice <internalLink>
> <target>Update_Link [7] [act_2]</target>the activity</internalLink>
> <internalLink>
> <target>Update_Link [7] [act_2]</target> in 9.02</internalLink>.</p>
> </paragraphs>
>
> I am limited to xslt 1.0. The following 1.0 sheet does everything I need it to except it drops the final period.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>
>   <!-- duplicate link processor -->
>
>   <xsl:output method="xml" indent="yes"/>
>
>   <xsl:key name="link_target" match="internalLink" use="target" />
>
>
>   <xsl:template match="paragraphs">
>     <xsl:for-each select="p">
>       <p><xsl:apply-templates select="./text()[1]"/><internalLink>
>         <target><xsl:apply-templates select="internalLink[count(. | key('link_target', target)[1]) = 1]/target"/></target>
>         <xsl:apply-templates select="./internalLink/text()"/></internalLink></p></xsl:for-each>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> Here is the output using Oxygen/Saxon 6.5.5. Everything is good except for the final period.
>
> <?xml version="1.0" encoding="utf-8"?>
> <p>Have students practice
> <internalLink>
> <target>Update_Link [7] [act_1]</target>
> the activity
> in 9.01</internalLink>
> </p>
> <p>Have students practice <internalLink>
> <target>Update_Link [7] [act_2]</target>
> the activity
> in 9.02</internalLink>
> </p>
>
> Any pointers would be most appreciated.
>
> Terry
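
On the caveat in the reply above about content that may occur before the target child: one possible adjustment, offered only as an untested sketch, is to have the mode="include" template pass along every child of the included internalLink except target itself, instead of only the nodes that follow it. The fragment below is a hypothetical drop-in replacement for the template with the same match and mode in the reply's stylesheet; it assumes each internalLink has exactly one target child.

```xml
<!-- Hypothetical variant (not from the original reply): for an
     internalLink merged into an earlier one, output all of its children
     except the "target" element, so content placed before the target
     is kept as well. -->
<xsl:template match="internalLink" mode="include">
  <xsl:apply-templates select="node()[not(self::target)]"/>
</xsl:template>
```

The trade-off is that whitespace-only text nodes sitting before the target (the line breaks used for markup formatting in the sample input) would now be copied into the merged link too. If that matters, the select expression could filter those out, at the risk of also dropping whitespace that is meaningful in your real data.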