
Re: [xsl] breaking up XML on page break element


Subject: Re: [xsl] breaking up XML on page break element
From: "Geert Bormans geert@xxxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 4 Jul 2014 16:30:57 -0000

Thanks Gerrit,
(I admit I need to read this twice to get it, but
that might be caused by the 0-1 and me trying
not to miss all of the fun in Rio)
I will look into it after the match.


At 17:18 4/07/2014, you wrote:
I tackle it by what I call "upward projection":

When processing the top-level element, do a
for-each-group of all descendants that are
terminal nodes (those without children), with a
group-starting-with at the splitting points.

For each group, process the book (or the HTML
body, or whatever common ancestor there is) once
in another mode, with a tunneled parameter
'restricted-to' that contains, for each group,
the terminal nodes and their ancestors.

When processing each group, for each node that
you encounter, test whether the node is
contained in the tunneled parameter (using
intersect). If it is, reproduce the node and
continue in this mode; if it isn't, do nothing.

There may be an option to discard or to reproduce the splitting elements.
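
The three steps above can be sketched as a small XSLT 2.0 stylesheet. This is only a minimal illustration under some assumptions, not the code from evolve-hub.xsl: the splitting element is assumed to be an unnamespaced pb, the mode name "split" is made up for this sketch, and the splitting elements are reproduced as children of the top-level element rather than discarded.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Step 1: at the top-level element, group all terminal nodes
       (nodes without children), starting a new group at each pb. -->
  <xsl:template match="/*">
    <xsl:variable name="root" select="."/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="descendant::node()[not(node())]"
                          group-starting-with="pb">
        <!-- Assumption: reproduce the splitting element at the top level.
             The first group may not start with a pb; then nothing is copied. -->
        <xsl:copy-of select="self::pb"/>
        <!-- Step 2: process the children of the common ancestor once per
             group, passing the group's terminal nodes plus all of their
             ancestors as a tunneled parameter. -->
        <xsl:apply-templates select="$root/node()" mode="split">
          <xsl:with-param name="restricted-to" tunnel="yes"
              select="(current-group() except self::pb)/ancestor-or-self::node()"/>
        </xsl:apply-templates>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Step 3: reproduce a node only if it belongs to the current group's
       projection; otherwise do nothing. -->
  <xsl:template match="node()" mode="split">
    <xsl:param name="restricted-to" as="node()*" tunnel="yes"/>
    <xsl:if test="exists(. intersect $restricted-to)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="split"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For an input like <book><section><para>aaa<pb/>bbb</para></section></book>, the first group yields a section/para containing "aaa", then the pb is copied, and the second group yields a section/para containing "bbb". Whitespace-only text nodes are terminal nodes as well, so they end up in whichever group they fall into.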

Examples for this technique are in
https://subversion.le-tex.de/common/evolve-hub/evolve-hub.xsl,
modes hub:split-at-tab and hub:split-at-br

They are a bit more complex than your case
because they split paragraphs that may contain
tables or footnotes that in turn can contain
other paragraphs. I introduced the function
hub:same-scope($splitting-element,
$containing-element) to split only at splitting
elements that are contained within the paragraph
that should be split, rather than in a paragraph
that is contained in a footnote or table cell
that is somehow contained in the given paragraph.
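
To illustrate the idea behind such a scope test, here is a hypothetical sketch. It is not the actual hub:same-scope from evolve-hub.xsl: the my: namespace, the function body, and the list of scope-opening element names (footnote, table) are all assumptions made for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:split">

  <!-- Assumption: elements with these local names open a new scope. -->
  <xsl:variable name="my:scope-names" as="xs:string+"
                select="('footnote', 'table')"/>

  <!-- True if $splitting-element lies inside $containing-element and no
       scope-opening element sits between the two. -->
  <xsl:function name="my:same-scope" as="xs:boolean">
    <xsl:param name="splitting-element" as="node()"/>
    <xsl:param name="containing-element" as="element()"/>
    <xsl:sequence select="
      exists($splitting-element/ancestor::* intersect $containing-element)
      and empty(
        $splitting-element/ancestor::*
          [local-name() = $my:scope-names]
          [exists(ancestor::* intersect $containing-element)]
      )"/>
  </xsl:function>

</xsl:stylesheet>
```

With a function like this, a paragraph would only be split at those splitting elements for which my:same-scope(., $para) is true, so a break inside a footnote nested in the paragraph would not split the outer paragraph.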

I might prepare a synthetic standalone example
if anyone is interested, and furthermore on the
condition that interested parties root for Germany instead of France today.

Gerrit

On 04.07.2014 16:43, Geert Bormans geert@xxxxxxxxxxxxxxxxxxx wrote:
Hi all,

Here is a fun one I thought I could share

I have a nicely nested XML document (a bit TEI-like),
and markers for page breaks can occur anywhere in the document (as
empty elements).

Now I want to break the document per page, reconstructing the structure.
So in a first step, I want to lift each page break to the highest level:

<book>
<title>...</title>
<section>
<para>aaa<pb/>bbb</para>
</section>
</book>

to become

<book>
<title>...</title>
<section>
<para>aaa</para>
</section>
<pb/>
<section>
<para>bbb</para>
</section>
</book>

Bear in mind that I need a generic solution
and that page breaks can occur at every level.

Any thoughts?
I am not looking for code, just curious about how people would attack this.

Thanks

Geert

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler

