RE: [xsl] Split element with mixed content


Subject: RE: [xsl] Split element with mixed content
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 2 Aug 2007 17:13:34 +0100

> What is the best method for splitting an element that has 
> mixed content?
> 
> INPUT
> -----
> <root>
>     <p>The quick <b>brown</b> fox jumped over the lazy dog.</p>
> </root>
> 
> In the above example I want to split on the word 'over'.
> 
> OUTPUT
> -----
> <root>
>     <p>The quick <b>brown</b> fox jumped </p>
>     <p>over the lazy dog.</p>
> </root>
> 

It's quite tricky, especially if the split might happen at an arbitrary depth
(like splitting a table over two pages). For the general case (I did the
code once, and I'm not doing it again because I know it's tricky!) I think
the rule is:

0: for simplicity, assume an empty element <split/> at the split point.

1: generate the first element, by copying any node that precedes <split/> or
that is an ancestor of <split/>. (Easier in 2.0 with the << operator)

2: generate the second element, by copying any node that follows <split/> or
that is an ancestor of <split/>. (So the nodes that are ancestors of
<split/> are copied into both halves.)

"Copy" here is in the sense of doing an <xsl:copy> and a recursive
apply-templates. The copy operations can be done by a suitably modified
identity template that takes the <split/> element as a parameter.
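
The rules above might be sketched roughly as follows in XSLT 2.0. This is an
illustration under stated assumptions, not a definitive implementation: it
assumes the element to be split is a <p> containing exactly one <split/>
marker, and the mode names "before" and "after" are made up for the example.
The marker is passed down as a tunnel parameter rather than an ordinary one,
purely to save repeating xsl:with-param on every recursive call.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ordinary identity template for everything outside the split element. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the element to split is a <p> with a <split/> inside it. -->
  <xsl:template match="p[.//split]">
    <xsl:variable name="split" select="(.//split)[1]"/>
    <xsl:apply-templates select="." mode="before">
      <xsl:with-param name="split" select="$split" tunnel="yes"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="." mode="after">
      <xsl:with-param name="split" select="$split" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Rule 1: keep nodes that precede the marker. Ancestors of the marker
       also precede it in document order, so a single << test covers both. -->
  <xsl:template match="node()" mode="before">
    <xsl:param name="split" tunnel="yes"/>
    <xsl:if test=". &lt;&lt; $split">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()" mode="before"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- Rule 2: keep nodes that follow the marker, plus its ancestors. -->
  <xsl:template match="node()" mode="after">
    <xsl:param name="split" tunnel="yes"/>
    <xsl:if test=". &gt;&gt; $split
                  or exists(. intersect $split/ancestor::*)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()" mode="after"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Note that attributes of the ancestor elements end up in both halves, so any
ID-valued attributes would be duplicated and would need separate handling.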

If you know the split is at the top level you can simplify this by cutting
out the recursion.
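
For that simpler case, a minimal sketch (which should also work in XSLT 1.0)
could look like the following. It assumes <split/> is a direct child of the
<p> being split and that there is only one marker; both are assumptions of
the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for everything else. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: <split/> is a child of <p>, so no recursion is needed. -->
  <xsl:template match="p[split]">
    <!-- First half: the siblings before the marker. -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="split[1]/preceding-sibling::node()"/>
    </xsl:copy>
    <!-- Second half: the siblings after the marker. -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="split[1]/following-sibling::node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```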

Without an explicit marker at the split point, it becomes a little more
difficult - I'm not sure how you want to identify where the split is to
occur.
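
If the split point is simply the first occurrence of a word such as 'over',
as in the example, one option is a preliminary pass that inserts the marker.
The sketch below is only an illustration and makes several assumptions: the
word is hard-coded, it is matched as a plain substring (so it would also
match inside "moreover"), and every text node under <p> that contains it gets
a marker, not just the first.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template for everything else. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Insert <split/> immediately before the first 'over' in the text node. -->
  <xsl:template match="p//text()[contains(., 'over')]">
    <xsl:value-of select="substring-before(., 'over')"/>
    <split/>
    <xsl:value-of select="concat('over', substring-after(., 'over'))"/>
  </xsl:template>

</xsl:stylesheet>
```

The output of this pass would then be fed to a second transformation that
does the actual splitting at <split/>.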

If there are multiple split points, do the above to split into two, then
apply the process recursively to the second half.
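
One way this recursion might look in XSLT 2.0 is sketched below. It is an
illustration only: the mode name "half" and the parameter names are invented
for the example, and it again assumes the element being split is a <p>. The
second half is built as a temporary tree, and applying templates to that tree
re-triggers the same rule for as long as a <split/> remains in it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template; it also copies the final piece, which has no marker. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Peel off the piece before the first marker, then recurse on the rest. -->
  <xsl:template match="p[.//split]">
    <xsl:variable name="split" select="(.//split)[1]"/>
    <xsl:apply-templates select="." mode="half">
      <xsl:with-param name="split" select="$split" tunnel="yes"/>
      <xsl:with-param name="first" select="true()" tunnel="yes"/>
    </xsl:apply-templates>
    <!-- The remainder, held in a temporary tree so it can be split again. -->
    <xsl:variable name="rest">
      <xsl:apply-templates select="." mode="half">
        <xsl:with-param name="split" select="$split" tunnel="yes"/>
        <xsl:with-param name="first" select="false()" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:variable>
    <xsl:apply-templates select="$rest/*"/>
  </xsl:template>

  <!-- Copy ancestors of the marker into either half; copy other nodes only
       if they lie on the requested side of the marker. -->
  <xsl:template match="node()" mode="half">
    <xsl:param name="split" tunnel="yes"/>
    <xsl:param name="first" tunnel="yes"/>
    <xsl:if test="exists(. intersect $split/ancestor::*)
                  or (if ($first) then . &lt;&lt; $split
                                  else . &gt;&gt; $split)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates select="node()" mode="half"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Each recursive step consumes one marker, so the process terminates once the
remainder contains no <split/>.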

Michael Kay
http://www.saxonica.com/

