Re: [xsl] (text processing) lexical context
Subject: Re: [xsl] (text processing) lexical context
From: Joerg Heinicke <joerg.heinicke@xxxxxx>
Date: Wed, 24 Apr 2002 09:13:57 +0200
Hello Nicolas,
<root> This is the <w>first</w> <i>sentence</i>. This is the <w>second</w> <i>sentence</i>. This is the <w>third</w> <i>sentence</i>. </root>
Your XML is really badly structured. How should the processor know where one sentence ends and the next one starts? Can you always use '.' as the marker?
I tried a key-based solution (every node is collected under the id of the next sibling text node that contains a '.'):
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every node by the id of the next sibling text node that contains a '.' -->
  <xsl:key name="sentences" match="node()" use="generate-id(following-sibling::text()[contains(., '.')][1])"/>
  <xsl:template match="root">
    <html>
      <ol>
        <!-- One list item per text node that ends a sentence -->
        <xsl:apply-templates select="text()[contains(., '.')]" mode="end-of-sentence"/>
      </ol>
    </html>
  </xsl:template>
  <xsl:template match="text()" mode="end-of-sentence">
    <li>
      <!-- First the nodes that precede this text node, then its own part before the '.' -->
      <xsl:apply-templates select="key('sentences', generate-id(.))" mode="rest-of-sentence"/>
      <xsl:value-of select="substring-before(., '.')"/>
      <xsl:text>.</xsl:text>
    </li>
  </xsl:template>
  <!-- Elements and text nodes without a '.' are copied unchanged -->
  <xsl:template match="node()" mode="rest-of-sentence">
    <xsl:copy-of select="."/>
  </xsl:template>
  <!-- A text node that closed the previous sentence contributes only the part after its '.' -->
  <xsl:template match="text()[contains(., '.')]" mode="rest-of-sentence">
    <xsl:value-of select="substring-after(., '.')"/>
  </xsl:template>
</xsl:stylesheet>
The output with Xalan:
<html>
<ol>
<li> This is the <w>first</w> <i>sentence</i> without a comma.</li>
<li> This is the <w>second</w> <i>sentence</i>.</li>
<li> This is the <w>third</w> <i>sentence</i>.</li>
</ol>
</html>
I don't know whether the solution is perfect; it is a bit difficult to spot any errors in it. But I would start by changing the terrible XML.
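As a minimal sketch of what better-structured input could look like, assuming the source can be changed at all: wrap each sentence in its own element. The element name <s> is made up for this illustration and does not occur in the original document.

<root>
  <s>This is the <w>first</w> <i>sentence</i>.</s>
  <s>This is the <w>second</w> <i>sentence</i>.</s>
  <s>This is the <w>third</w> <i>sentence</i>.</s>
</root>

With such input, the stylesheet no longer has to guess sentence boundaries from '.' characters, and neither the key nor the substring functions are needed:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumption: <s> is a hypothetical sentence wrapper, not part of the original input -->
  <xsl:template match="root">
    <html>
      <ol>
        <!-- Select only the <s> children, so whitespace between them is ignored -->
        <xsl:apply-templates select="s"/>
      </ol>
    </html>
  </xsl:template>
  <!-- One list item per sentence; copy-of keeps <w> and <i> intact -->
  <xsl:template match="s">
    <li>
      <xsl:copy-of select="node()"/>
    </li>
  </xsl:template>
</xsl:stylesheet>

Copying the children with xsl:copy-of rather than xsl:value-of is what preserves the inline markup in each list item.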
Regards,
Joerg
would be formatted so that the list would look like:
<html>
<ol>
<li>First: This is the <b>first</b> <i>sentence</i>.
<li>Second: This is the <b>second</b> <i>sentence</i>.
<li>Third: This is the <b>third</b> <i>sentence</i>.
</ol>
</html>
But I can't figure out how I can select the text surrounding the <w> element without using <xsl:value-of.../>, which does not allow me to process the following <i> element...
i.e., I get
<html>
<ol>
<li>First: This is the <b>first</b> sentence.
<li>Second: This is the <b>second</b> sentence.
<li>Third: This is the <b>third</b> sentence.
</ol>
</html>
and the <i> element is lost...
And I can't do <xsl:template match="substring(...)"> because a substring is not a DOM node.
Help: is there a way to process substrings or something?
N. Mazziotta
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list