RE: [xsl] WordML Question and normalize-space() question


Subject: RE: [xsl] WordML Question and normalize-space() question
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 19 May 2006 11:21:18 +0100

> First my normalize-space() question... I'm wondering if it's 
> possible to normalize all the space without removing the 
> leading and trailing whitespace?

Slightly tricky one that, even under XSLT 2.0. I came up with

replace($in, '(\S)\s+(\S)', '$1 $2')

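but I don't think this works if there are single-character words between the
spaces: each match consumes the non-space character that follows the
whitespace, so that character can't also begin the next match. Schemes for
extracting the leading/trailing whitespace and adding it back afterwards also
have to cater for the possibility that the string is all whitespace.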
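
The single-character case can be checked with a throwaway stylesheet along
these lines. It is only a sketch: the sample string and the "main" entry
template are made up for illustration, and it assumes an XSLT 2.0 processor
that can start from a named template.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point: invoke with "main" as the initial template. -->
  <xsl:template name="main">
    <!-- Assumed sample input: the one-character word "b" sits between
         two runs of three spaces. -->
    <xsl:variable name="in" select="' a   b   c '"/>
    <!-- Brackets make the leading and trailing space visible.
         Output: [ a b   c ]
         The run before "b" is collapsed, the run after it is not. -->
    <xsl:value-of
        select="concat('[', replace($in, '(\S)\s+(\S)', '$1 $2'), ']')"/>
  </xsl:template>

</xsl:stylesheet>
```

The leading and trailing spaces survive, which is the part that was wanted,
but the second run of whitespace is left untouched.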

Perhaps:

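<!-- \S at the start of group 2 stops the regex matching a zero-length
     string (an error in XSLT 2.0); flags="s" lets . match newlines. -->
<xsl:analyze-string select="$in" regex="^(\s*)(\S.*?)(\s*)$" flags="s">
  <xsl:matching-substring>
    <xsl:value-of select="regex-group(1)"/>
    <xsl:value-of select="normalize-space(regex-group(2))"/>
    <xsl:value-of select="regex-group(3)"/>
  </xsl:matching-substring>
  <!-- all-whitespace input does not match, so copy it through unchanged -->
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>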

Not tested.
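
If the result is needed inside an XPath expression rather than as template
output, the same split-and-reassemble idea could probably be wrapped in a
stylesheet function. Again only a sketch: the "my" prefix, its
urn:example:local namespace, the name my:normalize-inner and the "main" entry
template are all invented for illustration. Both patterns require at least
one non-space character, because replace() in XPath 2.0 rejects a pattern
that can match a zero-length string, and the "s" flag lets "." match
newlines.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:local"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: collapses interior whitespace like
       normalize-space(), but keeps leading and trailing whitespace. -->
  <xsl:function name="my:normalize-inner" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="
      if (normalize-space($in) = '')
      then $in  (: empty or all whitespace: return unchanged :)
      else concat(
        replace($in, '\S.*$', '', 's'),  (: what is left is the leading whitespace :)
        normalize-space($in),            (: collapsed content :)
        replace($in, '^.*\S', '', 's')   (: what is left is the trailing whitespace :)
      )"/>
  </xsl:function>

  <!-- Hypothetical entry point: invoke with "main" as the initial template.
       Output: [  a b c ] -->
  <xsl:template name="main">
    <xsl:value-of
        select="concat('[', my:normalize-inner('  a   b  c '), ']')"/>
  </xsl:template>

</xsl:stylesheet>
```

The all-whitespace test comes first because neither replace() call matches in
that case, and without it the input would be emitted twice.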

Michael Kay
http://www.saxonica.com/

> 
> Next, I'm trying to perform conversion from an xml format 
> which contains XHTML to WordML and I'm having trouble 
> figuring out how to deal with formatting such as bold, 
> italics, etc. First, though, I'm also having some specific 
> WordML problems, and I was wondering if anyone knew of a list 
> where I could ask WordML questions. 
> 
> Anyway, in XHTML these kinds of formatting elements can obviously 
> be nested inside each other, such as:
> 
> <b> bold text <i> bold and italic </i> just bold again </b>
>  
> However, in WordML it's all a flat structure, so you need to 
> do something like:
>  
> <w:r><w:rPr><w:b/></w:rPr><w:t>bold text</w:t></w:r>
> <w:r><w:rPr><w:b/><w:i/></w:rPr><w:t>bold and italic</w:t></w:r>
> <w:r><w:rPr><w:b/></w:rPr><w:t>just bold again</w:t></w:r>
> 
> 
> I currently deal with it by calling a template from each 
> formatting element (b, i, etc.). The template checks all the 
> element's ancestors for other formatting elements, applies 
> the combined formatting to the text() node, and then calls 
> apply-templates on everything below it, as follows:
> 
> 	<w:r>
> 		<w:rPr>
> 			<xsl:if
> test="ancestor-or-self::xhtml:strong|ancestor-or-self::xhtml:b|ancestor-or-self::b">
> 				<w:b/>
> 			</xsl:if>
> 			<xsl:if
> test="ancestor-or-self::xhtml:em|ancestor-or-self::i">
> 				<w:i/>
> 			</xsl:if>
> 			<xsl:if test="ancestor-or-self::xhtml:u">
> 				<w:u w:val="single"/>
> 			</xsl:if>
> 			<xsl:if
> test="ancestor-or-self::xhtml:a|ancestor-or-self::link|ancestor-or-self::Link_Text">
> 				<w:color w:val="0000FF"/>
> 				<w:u w:val="single"/>
> 			</xsl:if>
> 		</w:rPr>
> 		<w:t><xsl:value-of select="text()"/></w:t>
> 	</w:r>
> 	
> 	<xsl:apply-templates/>
> 
> However, this only works if the text is immediately nested 
> inside the elements, like this:
> 
> <b><i>bold and italic text</i></b>
> 
> and not in cases like the earlier example, where I then end 
> up with something like the following:
> 
> <b> bold text just bold again</b><b><i>bold and italic</i></b>
> 
> And I'm not really sure how to fix it.
> 
> So, hopefully that made sense, and any help is appreciated,
> 
> Thanks,
> Jordan
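
On the WordML flattening question quoted above, one common way round this
kind of problem is to generate a run per text node rather than per formatting
element, and to work out the run properties from the text node's ancestors.
The following is a minimal sketch, not a tested conversion. It assumes that
the w prefix is bound to the Word 2003 WordML namespace and xhtml to the
XHTML namespace (neither URI is given in the question), that inline content
sits inside xhtml:p, and it leaves out the link and un-namespaced cases.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="xhtml">

  <!-- Assumption: each XHTML paragraph becomes one WordML paragraph. -->
  <xsl:template match="xhtml:p">
    <w:p>
      <xsl:apply-templates/>
    </w:p>
  </xsl:template>

  <!-- Formatting elements emit nothing themselves; they only pass control
       down to their children. (This is what the built-in rule does anyway;
       it is spelled out here for clarity.) -->
  <xsl:template match="xhtml:b | xhtml:strong | xhtml:i | xhtml:em | xhtml:u">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- One run per text node. The properties depend only on which
       formatting elements appear among the ancestors, so the depth and
       order of nesting no longer matter. -->
  <xsl:template match="xhtml:p//text()">
    <w:r>
      <w:rPr>
        <xsl:if test="ancestor::xhtml:b or ancestor::xhtml:strong">
          <w:b/>
        </xsl:if>
        <xsl:if test="ancestor::xhtml:i or ancestor::xhtml:em">
          <w:i/>
        </xsl:if>
        <xsl:if test="ancestor::xhtml:u">
          <w:u w:val="single"/>
        </xsl:if>
      </w:rPr>
      <w:t><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>

</xsl:stylesheet>
```

With the nested example from the question wrapped in an xhtml:p, this should
give three runs in document order: bold, then bold and italic, then bold
again, because the three text nodes are visited one after another.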

