Re: [xsl] Turning escaped mixed content back to XML


Subject: Re: [xsl] Turning escaped mixed content back to XML
From: Martin Holmes <mholmes@xxxxxxx>
Date: Fri, 28 Mar 2014 12:09:24 -0700

That's what I needed: parse-xml-fragment(). This seems to work:

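<xsl:template match="text:p" exclude-result-prefixes="#all">

<!-- <xsl:variable name="unparsed" select="concat('&lt;p&gt;', string-join(//text(), ''), '&lt;/p&gt;')"/>
<xsl:variable name="parsed" select="saxon:parse($unparsed)"/>
<xsl:copy-of select="$parsed" exclude-result-prefixes="#all"/>-->
<xsl:if test="string-length(.) gt 0">
<tei:p>
<!-- .//text() keeps the join inside the current text:p; a bare //text() would
     pull in every text node in the document. xsl:copy-of copies the parsed
     nodes, whereas xsl:value-of would flatten them to their string value. -->
<xsl:copy-of select="parse-xml-fragment(string-join(.//text(), ''))"/>
</tei:p></xsl:if>
</xsl:template>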


for most cases. I do have some horrible edge-cases though:

<text:p>a start-tag, with delimiters &lt; and &gt; is intended</text:p>

I should be able to pre-process the input text, though: find the angle brackets that are surrounded by spaces and temporarily swap them out for something else before parsing.
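
A minimal sketch of what that pre-processing might look like, assuming the only stray brackets are '<' characters followed by whitespace (or ending the string). Instead of swapping in a temporary placeholder, it re-escapes those brackets with replace() before parsing, so nothing has to be swapped back afterwards. A bare '>' is legal in XML character data, so it is left alone. The variable name $safe is made up for illustration, and the namespace URIs are the placeholders from the example quoted further down, not the real ones.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="http://example.com"
   xmlns:tei="http://example.com/tei"
   exclude-result-prefixes="#all"
   version="3.0">

<!-- Placeholder namespaces (assumption): replace with the real ones. -->
<xsl:template match="text:p[string-length(.) gt 0]">
   <!-- Re-escape any '<' that is followed by whitespace or ends the string,
        because it cannot be the start of a tag. -->
   <xsl:variable name="safe"
      select="replace(string(.), '&lt;(\s|$)', '&amp;lt;$1')"/>
   <tei:p>
      <xsl:copy-of select="parse-xml-fragment($safe)"/>
   </tei:p>
</xsl:template>

</xsl:stylesheet>
```

With the edge case above, the string handed to the parser would contain "&lt; and > is intended", which parses as plain text. Other things that are not well-formed (a stray ampersand, say) would still need their own handling.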

Thanks,
Martin

On 14-03-28 11:35 AM, Martin Honnen wrote:
Martin Holmes wrote:

I'm trying to process an ODS spreadsheet which has <text:p> nodes which
contain embedded mixed-content markup in escaped form:

<text:p>indicates the amount by which this zone has been rotated
clockwise, with respect to the normal orientation of the parent
&lt;gi&gt;surface&lt;/gi&gt; element as implied by the dimensions given
in the &lt;gi&gt;msDesc&lt;/gi&gt; element or by the coordinates of the
&lt;gi&gt;surface&lt;/gi&gt; itself. The orientation is expressed in arc
degrees.</text:p>

I need to turn this back into parsed XML for insertion into XML
documents. I'm using Saxon 9.4 with XSLT 2 (and I can use 3 if
necessary).

I tried


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="http://example.com"
   xmlns:tei="http://example.com/tei"
   version="3.0">

<xsl:template match="text:p">
   <tei:p>
     <xsl:copy-of select="parse-xml-fragment(.)"/>
   </tei:p>
</xsl:template>

</xsl:stylesheet>

with Saxon 9.5 PE and got


<?xml version="1.0" encoding="UTF-8"?><tei:p xmlns:text="http://example.com" xmlns:tei="http://example.com/tei">indicates the amount by which this zone has been rotated clockwise, with respect to the normal orientation of the parent <gi>surface</gi> element as implied by the dimensions given in the <gi>msDesc</gi> element or by the coordinates of the <gi>surface</gi> itself. The orientation is expressed in arc degrees.</tei:p>

That has XML elements rather than escaped markup, so it should do. You will
need to change the namespaces and maybe use exclude-result-prefixes.
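
As a rough illustration of those two adjustments, here is the same stylesheet with the placeholder URIs swapped for the OpenDocument text namespace and the TEI namespace. That these are the namespaces actually in play is an assumption based on the ODS input and the tei prefix. exclude-result-prefixes="text" keeps the unused xmlns:text declaration out of the result.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
   xmlns:tei="http://www.tei-c.org/ns/1.0"
   exclude-result-prefixes="text"
   version="3.0">

<xsl:template match="text:p">
   <tei:p>
      <!-- string(.) is the unescaped text of the paragraph -->
      <xsl:copy-of select="parse-xml-fragment(string(.))"/>
   </tei:p>
</xsl:template>

</xsl:stylesheet>
```

Note that the parsed gi elements come out in no namespace, because the escaped markup carries no namespace declaration. If they need to be in the TEI namespace, that would take a further step.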

