
RE: RE: RE: [xsl] split OpenOffice 1.1 documents (flat xml)


Subject: RE: RE: RE: [xsl] split OpenOffice 1.1 documents (flat xml)
From: cknell@xxxxxxxxxx
Date: Mon, 04 Aug 2003 12:26:09 -0400

Since standard XSLT 1.0 cannot produce more than one output document, I have taken the liberty of producing multiple "dokument" elements as children of a new root element, "uberdokument". This stylesheet processes every "kapitel" element, produces a "dokument" element for each one, and copies into that "dokument" every following sibling of the "kapitel" up to, but not including, the next "kapitel" element.

Let me know if that is enough for you to carry on from here.
=============================================================
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />
  <xsl:strip-space elements="*" />
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>

  <xsl:template match="dokument">
    <uberdokument>
    <xsl:apply-templates />
    </uberdokument>
  </xsl:template>

  <!-- Each "kapitel" opens a new "dokument"; $kap-num is its 1-based position among the "kapitel" siblings. -->
  <xsl:template match="kapitel">
    <xsl:variable name="kap-num" select="count(preceding-sibling::kapitel) + 1" />
    <dokument>
      <kapitel><xsl:value-of select="." /></kapitel>
      <!-- Offer every later non-"kapitel" sibling; the node() template keeps only those belonging to this chapter. -->
      <xsl:apply-templates select="following-sibling::node()[not(self::kapitel)]">
        <xsl:with-param name="num" select="$kap-num" />
      </xsl:apply-templates>
    </dokument>
  </xsl:template>

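  <!-- Copy a node only when the number of "kapitel" siblings before it equals the chapter number passed in.
       When called without $num (from the "dokument" template) the test is false, so nothing is copied twice. -->
  <xsl:template match="node()">
    <xsl:param name="num" />
    <xsl:variable name="parent-kap" select="count(preceding-sibling::kapitel)" />
    <xsl:if test="$num = $parent-kap and not(self::kapitel)">
      <xsl:copy-of select="." />
    </xsl:if>
  </xsl:template>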
</xsl:stylesheet>
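
For the OpenOffice file itself, the same counting idea might be retargeted roughly as follows. This is only a sketch under stated assumptions: the namespace URIs are copied from the content.xml quoted below, a chapter is assumed to start at every text:h with text:level="1", and the wrapper names "uberdokument" and "dokument" are simply carried over from the stylesheet above. Everything outside office:body (font declarations, styles) is dropped.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the counting technique from above, applied to content.xml. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" />
  <!-- Strip whitespace only between the children of office:body,
       so that spacing inside paragraphs is left alone. -->
  <xsl:strip-space elements="office:body" />

  <!-- Wrap all chapters in one root, since XSLT 1.0 emits a single document. -->
  <xsl:template match="/">
    <uberdokument>
      <xsl:apply-templates
          select="office:document-content/office:body/text:h[@text:level='1']" />
    </uberdokument>
  </xsl:template>

  <!-- A level-1 heading opens a chapter, as "kapitel" does in flat.xml. -->
  <xsl:template match="text:h[@text:level='1']">
    <xsl:variable name="num"
        select="count(preceding-sibling::text:h[@text:level='1']) + 1" />
    <dokument>
      <xsl:copy-of select="." />
      <!-- Copy later siblings that are not level-1 headings themselves and
           that have exactly $num level-1 headings before them. -->
      <xsl:copy-of select="following-sibling::node()
          [not(self::text:h[@text:level='1'])]
          [count(preceding-sibling::text:h[@text:level='1']) = $num]" />
    </dokument>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample content.xml, this should give two "dokument" elements, each holding one text:h and the one text:p that follows it. Two caveats: the copied elements carry the namespace declarations of the source document with them, and the DOCTYPE in content.xml points at office.dtd, which some parsers will try to load.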

-- 
Charles Knell
cknell@xxxxxxxxxx - email



-----Original Message-----
From:     "Linnemann, Victor" <Linnemann@xxxxxxxxxxxxx>
Sent:     Thu, 31 Jul 2003 11:57:29 +0200
To:       cknell@xxxxxxxxxx
Subject:  RE: RE: [xsl] split OpenOffice 1.1 documents (flat xml)

Hello Charles,
sorry for mailing outside of XSL-List, but I had to unsubscribe from the
list (not the digest), because my admin wanted me to use a special e-mail
address for this purpose. So I will have to subscribe again.

>It appears that the beginning and end of a chapter is not signified by an
>element, that is to say, there is no element that contains a chapter. Is
>that correct?

Yes.

>If so, how can you determine where a chapter begins and ends? If you can
>answer that question, you have moved a long way
>toward solving the problem.

Indeed.

>It appears that you can identify the beginning of a chapter with an XPath
>expression along these lines:
>office:document-content/office:body/text:h[@text:level="1"]. It also seems
>that all following sibling nodes of a particular <text:h> element, up to but
>not including the next <text:h> sibling node, are part of the chapter, is that
>correct?

You're right.
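
The boundary rule agreed on here can be checked with a small sketch before attempting the full split. It assumes the namespace URIs of the content.xml shown further down; the key name "by-chapter" is made up for illustration. Each non-heading child element of office:body is indexed under the generated id of its nearest preceding level-1 heading, and the stylesheet prints one line per chapter with the number of elements that belong to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">
  <xsl:output method="text" encoding="UTF-8" />

  <!-- Index each non-heading child element of office:body under the id of
       the nearest preceding level-1 heading. -->
  <xsl:key name="by-chapter"
      match="office:body/*[not(self::text:h[@text:level='1'])]"
      use="generate-id(preceding-sibling::text:h[@text:level='1'][1])" />

  <!-- Print one line per chapter: its title and its number of member elements. -->
  <xsl:template match="/">
    <xsl:for-each
        select="office:document-content/office:body/text:h[@text:level='1']">
      <xsl:value-of select="normalize-space(.)" />
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count(key('by-chapter', generate-id()))" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should print "Kapitel 1: 1" and "Kapitel 2: 1". Elements that precede the first heading, such as text:sequence-decls, land under an empty key and are never looked up.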

I have already adapted an XSL stylesheet from the FAQ that works well on very
simple, flat XML (see below).
My actual problem is getting this stylesheet to work with the XML of
OpenOffice documents.
The differences from the existing stylesheet are:
a) namespaces are used (e.g. text:h)
b) any element (not only "abschnitt") may occur between two chapter
elements

Can you give me a hint on how to get this stylesheet to work?
Thanks in advance,
Victor

*******************************************************************
XML (flat.xml)
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<dokument>
	<kapitel>kapitel 1</kapitel>
	<abschnitt>a</abschnitt>
	<abschnitt>b</abschnitt>
	<kapitel>kapitel 2</kapitel>
	<abschnitt>c</abschnitt>
	<abschnitt>d</abschnitt>
	<abschnitt>e</abschnitt>
	<kapitel>kapitel 3</kapitel>
	<abschnitt>f</abschnitt>
</dokument>
*******************************************************************
XSL (flat2hier.xsl)
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
extension-element-prefixes="xalan">

<xsl:output indent="yes" />

<xsl:template match="dokument">
     <xsl:apply-templates select="kapitel" />
</xsl:template>

<xsl:template match="kapitel">
      <xalan:write select="concat('kapitel-',position(),'.xml')">
         <dokument>
            <kapitel><xsl:value-of select="." /></kapitel>
            <xsl:apply-templates select="following-sibling::abschnitt
             [generate-id(preceding-sibling::kapitel[1])
             = generate-id(current())]" />
         </dokument>
     </xalan:write>
     <xalan:close select="concat('kapitel-',position(),'.xml')"/>
</xsl:template>

<xsl:template match="abschnitt">
     <xsl:copy-of select="." />
</xsl:template>

</xsl:stylesheet>
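
Difference b) could perhaps be handled on flat.xml alone by widening the sibling selection, as in the sketch below. It reuses the Redirect extension calls exactly as they appear in flat2hier.xsl above, so it assumes the same Xalan setup; the only intended change is that any element sibling is copied, provided it is not itself a "kapitel" and its nearest preceding "kapitel" is the current one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="xalan">

  <xsl:output indent="yes" />

  <xsl:template match="dokument">
    <xsl:apply-templates select="kapitel" />
  </xsl:template>

  <xsl:template match="kapitel">
    <!-- Same Redirect calls as in flat2hier.xsl above. -->
    <xalan:write select="concat('kapitel-', position(), '.xml')">
      <dokument>
        <xsl:copy-of select="." />
        <!-- Any element sibling, not just "abschnitt", as long as it is not a
             "kapitel" and its nearest preceding "kapitel" is the current one. -->
        <xsl:copy-of select="following-sibling::*[not(self::kapitel)]
            [generate-id(preceding-sibling::kapitel[1])
             = generate-id(current())]" />
      </dokument>
    </xalan:write>
    <xalan:close select="concat('kapitel-', position(), '.xml')" />
  </xsl:template>
</xsl:stylesheet>
```

The not(self::kapitel) test matters once the selection is widened: without it, the next "kapitel" would be copied into the current chapter, because its nearest preceding "kapitel" is also the current one.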
*******************************************************************
XML (resulting files: "kapitel-1.xml ... kapitel-3.xml")
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<dokument>
	<kapitel>kapitel 1</kapitel>
	<abschnitt>a</abschnitt>
	<abschnitt>b</abschnitt>
</dokument>
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<dokument>
	<kapitel>kapitel 2</kapitel>
	<abschnitt>c</abschnitt>
	<abschnitt>d</abschnitt>
	<abschnitt>e</abschnitt>
</dokument>
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<dokument>
	<kapitel>kapitel 3</kapitel>
	<abschnitt>f</abschnitt>
</dokument>
*******************************************************************
OpenOffice XML (content.xml)
*******************************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
<office:document-content 
	xmlns:office="http://openoffice.org/2000/office" 
	xmlns:style="http://openoffice.org/2000/style" 
	xmlns:text="http://openoffice.org/2000/text" 
	xmlns:table="http://openoffice.org/2000/table" 
	xmlns:draw="http://openoffice.org/2000/drawing" 
	xmlns:fo="http://www.w3.org/1999/XSL/Format" 
	xmlns:xlink="http://www.w3.org/1999/xlink" 
	xmlns:number="http://openoffice.org/2000/datastyle" 
	xmlns:svg="http://www.w3.org/2000/svg" 
	xmlns:chart="http://openoffice.org/2000/chart" 
	xmlns:dr3d="http://openoffice.org/2000/dr3d" 
	xmlns:math="http://www.w3.org/1998/Math/MathML" 
	xmlns:form="http://openoffice.org/2000/form" 
	xmlns:script="http://openoffice.org/2000/script" office:class="text" office:version="1.0">
	<office:script/>
	<office:font-decls>
		<style:font-decl style:name="MS Mincho" fo:font-family="&apos;MS Mincho&apos;" style:font-pitch="variable"/>
		<style:font-decl style:name="Tahoma" fo:font-family="Tahoma" style:font-pitch="variable"/>
		<style:font-decl style:name="Times New Roman" fo:font-family="&apos;Times New Roman&apos;" style:font-family-generic="roman" style:font-pitch="variable"/>
		<style:font-decl style:name="Arial" fo:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
	</office:font-decls>
	<office:automatic-styles/>
	<office:body>
		<text:sequence-decls>
			<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
			<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
			<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
			<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
		</text:sequence-decls>
		<text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
		<text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
		<text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
		<text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
	</office:body>
</office:document-content>



Victor Linnemann
--------------------------
Consulting / CMS

euroscript Switzerland AG

Hafenstrasse 50 d
CH-8280 Kreuzlingen

Tel:	(+41) 071 / 677 95 25
Fax:	(+41) 071 / 677 99 30
E-Mail:    Linnemann@xxxxxxxxxxxxx
Internet:  www.euroscript.ch





 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


