Re: [xsl] normalize-space() except ...

Subject: Re: [xsl] normalize-space() except ...
From: "Flynn, Peter pflynn@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 Mar 2015 21:35:45 -0000

On 02/10/2015 12:24 AM, Liam R E Quin liam@xxxxxx wrote:
> On Mon, 9 Feb 2015 20:21:50 -0000
> "dvint@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> <dd><p>
>>          This is my text
>> with <i>italics content</i> with other text.
>> </p></dd>
>> My output is coming out like this:
>> <ss:Data>This is my text with<ss:font italics="yes">italics
>> content</ss:font>.</ss:Data>
> I'd probably do this in two steps -
> (1) match text() and turn one or more whitespace characters into a space,
>     probably using replace()
> (2) strip leading space from the first text() in p, and trailing space from
>     the last;

I do almost exactly this in several applications. I think it's fairly

>     watch for
>     <p>The man wore<i> black </i>socks</p>
>     which is not unlikely in XML made from word processing software.

Slightly more common would be <p>The man wore <i>black </i>socks</p>
where a double-click highlight in the WP software included the trailing
space on the word (someone just told me Word has just stopped doing
this: can anyone confirm?).
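
For anyone who wants to try the two steps outside a stylesheet, here is a rough sketch in Python (standard library only), not the XSLT that I or anyone else in this thread actually uses. The names text_slots, normalize_mixed and flat are made up for illustration, and treating only the first and last text nodes under p as the "edges" is my assumption about what step (2) means. In XSLT 2.0, step (1) would be the replace() call Liam mentions.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace: space, tab, carriage return, line feed
WS_RUN = re.compile(r"[ \t\r\n]+")

def text_slots(elem):
    """Yield (element, attribute) for each non-empty text node under elem,
    in document order. ElementTree keeps text in .text and .tail."""
    if elem.text:
        yield elem, "text"
    for child in elem:
        yield from text_slots(child)
        if child.tail:
            yield child, "tail"

def normalize_mixed(p):
    """Hypothetical helper: collapse whitespace runs, trim only the outer edges."""
    slots = list(text_slots(p))
    # Step 1: turn every run of whitespace into a single space
    for node, attr in slots:
        setattr(node, attr, WS_RUN.sub(" ", getattr(node, attr)))
    # Step 2: strip leading space from the first text node and trailing
    # space from the last; spaces next to inline elements are left alone.
    # (Simplification: if the first or last node was whitespace-only, its
    # neighbour is not trimmed in turn.)
    if slots:
        node, attr = slots[0]
        setattr(node, attr, getattr(node, attr).lstrip(" "))
        node, attr = slots[-1]
        setattr(node, attr, getattr(node, attr).rstrip(" "))
    return p

def flat(p):
    """Concatenate all text under p, ignoring markup."""
    return "".join(p.itertext())

sample = ET.fromstring(
    "<p>\n         This is my text\nwith <i>italics content</i> with other text.\n</p>"
)
print(ET.tostring(normalize_mixed(sample), encoding="unicode"))
# <p>This is my text with <i>italics content</i> with other text.</p>
```

Both of the "socks" examples above come through with their word spacing intact, because the space inside or beside the i element is never touched by step (2).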

More pernicious is the erroneous elision of white-space-only nodes in
mixed content:

<p>The man wore <b>black socks</b> <i>only</i> on Tuesdays.</p>

resulting in "The man wore black socksonly on Tuesdays." due to a faulty
xsl:strip-space (white-space-only nodes between subelements in mixed
content should probably never be removed, which is sometimes hard to
explain to people unaccustomed to document-class XML).
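
A small sketch of the effect, again in Python rather than XSLT so it can be run anywhere. The function drop_whitespace_only_nodes is a made-up name; it only imitates what I would expect from something like xsl:strip-space with elements="*", by deleting every text node that consists solely of whitespace.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"

def drop_whitespace_only_nodes(elem):
    """Hypothetical imitation of over-eager whitespace stripping:
    remove every text node that contains nothing but whitespace."""
    if elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None
    for child in elem:
        drop_whitespace_only_nodes(child)
        if child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None
    return elem

src = "<p>The man wore <b>black socks</b> <i>only</i> on Tuesdays.</p>"

kept = "".join(ET.fromstring(src).itertext())
stripped = "".join(drop_whitespace_only_nodes(ET.fromstring(src)).itertext())

print(kept)      # The man wore black socks only on Tuesdays.
print(stripped)  # The man wore black socksonly on Tuesdays.
```

The single space between the b and i elements is a text node of its own, so a rule that discards white-space-only nodes takes the word boundary with it.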
