Re: Passing element nodes through string functions (WAS RE: [xsl] Preserving inline elements when using string functions)


Subject: Re: Passing element nodes through string functions (WAS RE: [xsl] Preserving inline elements when using string functions)
From: Bill Keese <billk@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 28 Aug 2003 14:27:33 +0900

This can be considered a grouping problem (as David Carlisle wrote),
where each row corresponds to a group (and all text nodes and <a> nodes
within a row belong to the same group), but it's tricky because a single
text node with newlines splits across two (or more!) groups.

So my approach has two phases:
1) split text nodes with newlines into multiple text nodes
2) do the grouping (splitting groups where the line breaks are)

For example, consider the input file:

This is <a href="foo">hello</a> the first line
and this <a href="foo">hello</a> is the second line.

The nodes are:
1. This is
2. <a>
3. the first line \n and this
4. <a>
5. is the second line.

#3 is the tricky one, because it belongs to both rows. The first
transform should convert the above node-set into the following node-set:

1. This is
2. <a ...>
3. the first line
4. <line-break/>
5. and this
6. <a ...>
7. is the second line.

Then, you just group by the <line-break/> node. In XSLT 2.0 you do:

<xsl:for-each-group select="bodytext/node()"
                    group-ending-with="line-break">
  <div>
    <xsl:apply-templates select="current-group()[not(self::line-break)]"/>
  </div>
</xsl:for-each-group>
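
To show where that fragment would sit, here is a minimal sketch of a
complete XSLT 2.0 stylesheet. It assumes the input has already been
through the first phase, so <line-break/> elements are real children of
<bodytext>. The templates for bodytext and a are my own assumptions for
illustration, not something from the original question.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumed input (already split by phase 1):
     <bodytext>This is <a href="foo">hello</a> the first line<line-break/>and
     this <a href="foo">hello</a> is the second line.</bodytext> -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="bodytext">
    <body>
      <!-- Each group runs up to and including the next line-break;
           the nodes after the final line-break form the last group. -->
      <xsl:for-each-group select="node()"
                          group-ending-with="line-break">
        <div>
          <!-- Drop the marker element itself from the output row. -->
          <xsl:apply-templates
              select="current-group()[not(self::line-break)]"/>
        </div>
      </xsl:for-each-group>
    </body>
  </xsl:template>

  <!-- Inline links are passed through unchanged (assumption). -->
  <xsl:template match="a">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes fall through to the built-in template, so each row should
come out as one <div> holding its text and links in document order.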

For those of us forced to use MSXML (which doesn't support XSLT 2.0),
in place of the for-each-group you would have to do the Muenchian
grouping as described on Jeni's site
(http://www.jenitennison.com/xslt/grouping/muenchian.html) and in
David's mail. This is pretty confusing stuff, but the basic idea is:

1) the nodes which "start" each group (a.k.a. row) have an id number
(generated by generate-id). <bodytext> starts the first row, and
<line-break/> starts each subsequent row.

2) Every node that belongs to a group gets the same "key". Specifically,
the key of every node within a certain group is equal to the id of the
node that starts that group.

So the keys would look like this:

<bodytext> (**id=1)
   This is (key=1)
   <a> (key=1)
   the first line (key=1)
   <line-break/> (**id=2)
   and this (key=2)
   <a> (key=2)
   is the second line. (key=2)

The tricky XSL to generate these keys is something like this:

<xsl:key name="x" match="bodytext/node()"
use="generate-id((..|preceding-sibling::line-break)[last()])"/>

In other words, make a list like this: (my-parent-node,
line-break-nodes-before-me), and then take the last element in that
list. A union is always in document order, so the parent comes first and
the nearest preceding line-break comes last. This gives the previous
line-break node, or the bodytext node if there is no previous line-break
node.
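
The key on its own doesn't produce any output, so here is a rough sketch
of how it might be used in XSLT 1.0. It assumes the input already
contains <line-break/> elements as children of <bodytext>. The templates
for bodytext and a are assumptions of mine for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Assumed input (already split by phase 1):
     <bodytext>This is <a href="foo">hello</a> the first line<line-break/>and
     this <a href="foo">hello</a> is the second line.</bodytext> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- File every child of bodytext under the id of the node that
       starts its row: the nearest preceding line-break, or bodytext
       itself for the first row. -->
  <xsl:key name="x" match="bodytext/node()"
           use="generate-id((..|preceding-sibling::line-break)[last()])"/>

  <xsl:template match="bodytext">
    <body>
      <!-- One iteration per row starter: bodytext, then each line-break. -->
      <xsl:for-each select=".|line-break">
        <div>
          <!-- The next line-break is keyed into this row too,
               so filter it out before applying templates. -->
          <xsl:apply-templates
              select="key('x', generate-id())[not(self::line-break)]"/>
        </div>
      </xsl:for-each>
    </body>
  </xsl:template>

  <!-- Inline links are passed through unchanged (assumption). -->
  <xsl:template match="a">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Note that each <line-break/> is itself keyed under the row before it,
which is why the predicate [not(self::line-break)] is needed.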

-----------------------------------------------------------------------

Here is the code to split text with newlines into multiple nodes. It's
recursive in case a single text node has two (or more) embedded
newlines. I ended up encoding each text segment within <myText> tags,
because if you use an intermediate file to save the results this helps
keep track of the divisions between text nodes. A <line-break/> element
is written at each newline so that the grouping step has something to
group on.

<xsl:template match="text()">
  <xsl:call-template name="split-text">
    <xsl:with-param name="arg1" select="."/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="split-text">
  <xsl:param name="arg1"/>
  <xsl:choose>
    <xsl:when test="contains($arg1,'&#10;')">
      <!-- text up to the first newline -->
      <myText><xsl:value-of select="substring-before($arg1,'&#10;')"/></myText>
      <!-- marker for the newline itself -->
      <line-break/>
      <!-- recurse on whatever follows the newline -->
      <xsl:call-template name="split-text">
        <xsl:with-param name="arg1" select="substring-after($arg1,'&#10;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- no newline left: emit the remainder as one segment -->
      <myText><xsl:value-of select="$arg1"/></myText>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
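
If you would rather not write an intermediate file, both phases can
probably be chained in one XSLT 1.0 stylesheet by holding the phase-one
result in a variable and converting it with a node-set extension. The
sketch below is untested and rests on several assumptions: it uses
exsl:node-set() (EXSLT; under MSXML you would use msxsl:node-set() from
the urn:schemas-microsoft-com:xslt namespace instead), the mode names
"split" and "group" are made up for this example, and the split template
is assumed to write a <line-break/> element at each newline.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl">

  <xsl:output method="html"/>

  <!-- Phase 2 key: nearest preceding line-break, else bodytext. -->
  <xsl:key name="x" match="bodytext/node()"
           use="generate-id((..|preceding-sibling::line-break)[last()])"/>

  <xsl:template match="/">
    <!-- Phase 1: build the split tree as a result tree fragment. -->
    <xsl:variable name="split">
      <xsl:apply-templates mode="split"/>
    </xsl:variable>
    <!-- Phase 2: turn the fragment into a node-set and group it. -->
    <xsl:apply-templates select="exsl:node-set($split)//bodytext"
                         mode="group"/>
  </xsl:template>

  <!-- Phase 1: copy elements and attributes, split text on newlines. -->
  <xsl:template match="*" mode="split">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="split"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" mode="split">
    <xsl:call-template name="split-text">
      <xsl:with-param name="arg1" select="."/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="split-text">
    <xsl:param name="arg1"/>
    <xsl:choose>
      <xsl:when test="contains($arg1,'&#10;')">
        <myText><xsl:value-of select="substring-before($arg1,'&#10;')"/></myText>
        <line-break/>
        <xsl:call-template name="split-text">
          <xsl:with-param name="arg1" select="substring-after($arg1,'&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <myText><xsl:value-of select="$arg1"/></myText>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Phase 2: one div per row starter (bodytext, then each line-break). -->
  <xsl:template match="bodytext" mode="group">
    <body>
      <xsl:for-each select=".|line-break">
        <div>
          <xsl:apply-templates mode="group"
              select="key('x', generate-id())[not(self::line-break)]"/>
        </div>
      </xsl:for-each>
    </body>
  </xsl:template>

  <!-- Unwrap the myText helper elements again. -->
  <xsl:template match="myText" mode="group">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Rebuild links without the myText wrapper inside them. -->
  <xsl:template match="a" mode="group">
    <a>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="group"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

The key is evaluated against whichever document holds the context node,
so inside the "group" templates it should look at the temporary tree
rather than the original input.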

------------------------------------------------

Bill


David Carlisle wrote:

>I wrote:
>
>>  it's a grouping problem (positional grouping in Mike's terminology)
>>  you want to group all child nodes before or after text nodes containing
>>  cr ie text()[contains(.,'&#10;')] you need to work at the level of nodes
>>  not of the entire content of your bodytext element.
>>
>>  See Jeni's site on grouping techniques.
>>
>>  David
>
>I suppose this is probably more helpful...
>
>div.xml
>========
>
><page>
>    <bodytext>This is the <link url="zzz">link</link>
>    This is another line</bodytext>
></page>
>
>
>
>div.xsl
>========
>
><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 version="1.0">
>
><xsl:key name="x" match="bodytext/node()"
>use="generate-id((..|preceding-sibling::text()[contains(.,'&#10;')][1])[last()])"/>
>
><xsl:template match="page">
><html>
><head>
><title>testing...</title>
></head>
><xsl:apply-templates/>
></html>
></xsl:template>
>
><xsl:template match="link">
><a href="{@url}">
><xsl:apply-templates/>
></a>
></xsl:template>
>
>
><xsl:template match="bodytext">
><body>
><xsl:for-each select=".|text()[contains(.,'&#10;')]">
><div>
><xsl:value-of 
>  select="substring-after(self::text(),'&#10;')"/>
><xsl:apply-templates
>select="key('x',generate-id(.))[position()&lt;last()]"/>
></div>
><xsl:value-of
>select="substring-before(key('x',generate-id(.))[last()],'&#10;')"/>
></xsl:for-each>
></body>
></xsl:template>
>
></xsl:stylesheet>
>
>$ saxon div.xml div.xsl
>
>
><html>
>   <head>
>      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>
>      <title>testing...</title>
>   </head>
>
>   <body>
>      <div>This is the <a href="zzz">link</a></div>
>      <div>    This is another line</div>
>   </body>
>
></html>
>
>
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


