How to remove unwanted whitespace from an XML element?

Post by mlcook »

I'm using Oxygen 9.1 with Saxon 9B.

I'm creating a Microsoft Word document from the contents of an XML file, using XSLT to process the XML elements and output WordprocessingML tags.

Most of the transformation is going well, but I'm having trouble with mixed text in an element. An example XML element to transform is this one:

    <Description>This is some text for the paragraph. It has some <bold>formatting
    tags</bold> in it which I will <bold><italic>process myself</italic> as I
    output</bold> the text of this message. This paragraph has more text
    here.</Description>

I invoke my template with:

    <xsl:apply-templates select="Description"/>

When my Description template has:

    <w:r>
      <w:t>
        <xsl:apply-templates/>
      </w:t>
    </w:r>

to process the text, the whitespace before each line of the <Description> gets copied to the output and Word treats it as fixed whitespace, so I get big gaps in the text.

When the Description template has:

    <w:r>
      <w:t>
        <xsl:value-of select="normalize-space()"/>
      </w:t>
    </w:r>

then the whitespace is removed, but the formatting tags are also removed so that I don't have a chance to process them.

The XML file with the <Description> elements (and others, of course) is hand-edited and formatted. The Oxygen "Format and Indent" operation formats the <Description> element as above, so I don't really have control over the line breaks in the text.

Question: How do I get rid of the extra whitespace while processing the <Description> tag, and yet be able to process any formatting commands in it?

Thanks, Mike Cook

Post by sorin_ristache »

Hello,

You have to add a copy template that preserves the element nodes in the output. Otherwise the built-in XSLT templates take effect: for an element node they emit no markup of their own and only apply templates to its children, so the tags are dropped and just the text comes through. That means you have to add the following template to your stylesheet:

    <xsl:template match="node() | @*">
      <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
      </xsl:copy>
    </xsl:template>

You should study how built-in templates work in XSLT.
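
For reference, the built-in rules behave roughly as if the stylesheet contained the templates below. This is only a sketch based on the built-in template rules described in the XSLT Recommendation, written out for illustration; the processor applies them implicitly and you do not need to declare them.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the document root: emit nothing, just recurse into the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the string value through unchanged -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: produce no output -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Under these rules the <bold> and <italic> tags disappear from the output, while their text, including the line breaks and indentation inside it, is still copied through.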


Regards,
Sorin

Post by mlcook »

Sorin,

Thanks for the link about built-in templates.

I still have my original problem even after adding the copy template you gave. The output looks like this:

This is some text for the paragraph. It has some formatting____________tags in it which I will process myself as I____________output the text of this message. This paragraph has more text____________here.

(Please note that the underscores above represent spaces. The actual extra spaces would have been removed when posted to the forum.)

I still get the large gaps in the output when using <xsl:apply-templates/>, or no formatting when using normalize-space().

Thanks again,
Mike

Post by jkmyoung »

Why not add a text handling template?

    <xsl:template match="text()">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
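
One caveat worth checking with this template: normalize-space() also trims the leading and trailing whitespace of each text node, so the single space between "some" and the following <bold> element can disappear at element boundaries. A minimal alternative sketch, assuming an XSLT 2.0 processor (Saxon 9B is one) and a stylesheet declared with version="2.0", collapses every whitespace run to one space instead of trimming it:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Collapse each run of spaces, tabs and newlines into a single space,
       without trimming the ends of the text node -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '\s+', ' ')"/>
  </xsl:template>

</xsl:stylesheet>
```

With this variant the text node ending in "It has some " keeps its trailing space, and the line break plus indentation inside <bold> becomes one space. A text node that starts or ends the whole element would keep a leading or trailing space if it had one, which the sample <Description> does not.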

Post by mlcook »

Dear jkmyoung,

Voilà!

Why not use a text handling template, indeed! As you may guess, I'm new to XSL. I've read about text() nodes, but not until now was I able to see more.

Your text() template did just what I needed! I knew there must be a simple way.

Thanks, Mike
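
Putting the thread together, one possible arrangement is sketched below. It is not the stylesheet used in this thread, which was never posted in full. The namespace URI bound to the w prefix (Office Open XML) and the mapping of <bold> and <italic> to w:b and w:i run properties are assumptions made for illustration, as is the choice of one w:r run per text node.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <!-- Assumption: the w namespace above; a Word 2003 WordML target would use a different URI -->

  <!-- One paragraph per Description element -->
  <xsl:template match="Description">
    <w:p>
      <xsl:apply-templates/>
    </w:p>
  </xsl:template>

  <!-- Each text node becomes its own run. The built-in element rule
       recurses through bold and italic, so their text nodes arrive here too. -->
  <xsl:template match="Description//text()">
    <w:r>
      <!-- Assumed mapping: bold/italic ancestors become run properties -->
      <xsl:if test="ancestor::bold or ancestor::italic">
        <w:rPr>
          <xsl:if test="ancestor::bold"><w:b/></xsl:if>
          <xsl:if test="ancestor::italic"><w:i/></xsl:if>
        </w:rPr>
      </xsl:if>
      <!-- xml:space="preserve" asks Word to keep the single boundary spaces;
           the value-of stays on the same line so no stylesheet whitespace leaks in -->
      <w:t xml:space="preserve"><xsl:value-of select="replace(., '\s+', ' ')"/></w:t>
    </w:r>
  </xsl:template>

</xsl:stylesheet>
```

The sketch only produces the w:p paragraph; a complete Word document would still need the surrounding document and body elements that the rest of the stylesheet presumably generates.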