
[xsl] word search a document and eliminate duplicate results


Subject: [xsl] word search a document and eliminate duplicate results
From: "Susan" <laborde@xxxxxxxxxxxx>
Date: Sat, 8 Jan 2005 16:02:16 -0600

Hello,
Maybe I'm missing the obvious, but I've Googled forever and am still lost. I want to search a document for every occurrence of a particular word and output each paragraph that includes the search term. My code works fine, except that if the search term appears three times in one paragraph, I get that paragraph three times in the output. I do not want the same paragraph repeated.
I'm not sure what I should be looking for. I know a little about how to de-duplicate using keys, but I don't see how that can work in this case. And I know almost nothing about recursion. Is that what is required here? Any pointers would be greatly appreciated.
Thanks,
Susan


The xml looks like this (I do hope this makes sense):

<p>
...may contain text only or a mixture of text and other elements...
</p>
<p>...may contain...
<note>a note with text only</note>
</p>
<p>...may also contain...
<note>no paragraph tags here, just text<p>2nd paragraph of note</p><p>3rd paragraph of note</p></note>
...more text...
</p>
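
To make the cases concrete, here is a small made-up input. The root element name <doc> and the placement of the search term are assumptions for illustration only, not my real data. The output I am after would contain each of the first three outer paragraphs exactly once and skip the fourth.

```xml
<doc>
  <!-- term appears twice in the same paragraph: wanted once -->
  <p>A paragraph with find-this-word in it, and find-this-word again.</p>
  <!-- term appears only inside the note: the whole outer paragraph is wanted -->
  <p>A paragraph whose only match is in its note.
    <note>a note containing find-this-word</note>
  </p>
  <!-- term appears in the paragraph text and in a paragraph nested in its note: wanted once -->
  <p>find-this-word here, and also
    <note>some text<p>2nd paragraph of note with find-this-word</p><p>3rd paragraph of note</p></note>
    in the nested note.
  </p>
  <!-- no match: not wanted -->
  <p>No match here.</p>
</doc>
```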


The search term ($target1) may appear within a <p>, within a <note>, or within a <p> that is within a <note>. If it occurs anywhere within a note, then I need not only the note but the whole paragraph in which the note resides. This further muddies the water: if a paragraph contains the search term and that same paragraph also contains a <note> that contains the search term, I still need to output that paragraph only once. My stylesheet (stripped of all the formatting stuff) looks like:

*****
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">


<xsl:variable name="target1">
 <item>find-this-word</item>
</xsl:variable>

<xsl:variable name="select-elements">
<item>p</item>
</xsl:variable>

<xsl:template match="/">
<xsl:apply-templates select="/" mode="search">
<xsl:with-param name="term" select="$target1"/>
<xsl:with-param name="element" select="document('')/*/xsl:variable[@name='select-elements']/item"/>
</xsl:apply-templates>
</xsl:template>


<xsl:template match="/" mode="search">
   <xsl:param name="term" select="''"/>
   <xsl:param name="element" select="''"/>
       <div>
           <xsl:apply-templates select="@*|node()" mode="search">
               <xsl:with-param name="term" select="$term"/>
               <xsl:with-param name="element" select="$element"/>
                   </xsl:apply-templates>
       </div>
</xsl:template>

<xsl:template match="@*|node()" mode="search">
   <xsl:param name="term" select="''"/>
   <xsl:param name="element" select="''"/>
       <xsl:copy>
           <xsl:apply-templates select="@*|node()" mode="search">
               <xsl:with-param name="term" select="$term"/>
               <xsl:with-param name="element" select="$element"/>
           </xsl:apply-templates>
       </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="search">
<xsl:param name="term" select="''"/>
<xsl:param name="element" select="''"/>
<xsl:choose>
<xsl:when test="contains(.,$term)">
<xsl:for-each select="ancestor::p[not(parent::note)][1]">
<xsl:apply-templates/>
<hr/><br/>
</xsl:for-each>
</xsl:when>
<xsl:otherwise/>
</xsl:choose>
</xsl:template>

</xsl:stylesheet>
*****
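
For comparison, here is a rough sketch of the behaviour I want, written the other way around: instead of starting from each matching text node and climbing to its paragraph, it selects each outer paragraph once and tests whether any text node inside it contains the term. This is only a guess at a direction, not something I have tested against my real documents. The parameter name "term" is made up here and stands in for $target1, and xsl:copy-of is used just to show the selected paragraph.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical parameter; stands in for $target1 in the stylesheet above. -->
  <xsl:param name="term" select="'find-this-word'"/>

  <xsl:template match="/">
    <div>
      <!-- Each p that is not a child of a note is visited at most once.
           It is kept if any descendant text node (including text inside
           notes and their nested paragraphs) contains the term. -->
      <xsl:for-each select="//p[not(parent::note)][.//text()[contains(., $term)]]">
        <!-- Copies the paragraph with its markup; xsl:apply-templates
             could be used here instead to produce formatted output. -->
        <xsl:copy-of select="."/>
        <hr/><br/>
      </xsl:for-each>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Because the paragraph itself is the node being selected, the number of matching text nodes inside it should no longer matter, so no key-based de-duplication or recursion would be needed if this approach holds up.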


