Re: [xsl] Move elements to preceding parent


Subject: Re: [xsl] Move elements to preceding parent
From: Israel Viente <israel.viente@xxxxxxxxx>
Date: Mon, 15 Jun 2009 15:59:37 +0300

Hi Martin,
Thank you for this. It looks very elegant.
Can you please explain the idea of the line:
>  <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
> and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>

Does it remove any p whose immediately preceding sibling p has a last
span that does not end with one of the paragraph-ending characters?
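
Laid out piece by piece, the pattern seems to read as follows. This is
only a sketch of how the match expression parses, with the same template
reformatted and commented; the claim that the children were already
copied relies on the other p template in the stylesheet quoted below.

```xml
<!-- Matches a p whose nearest preceding sibling p
     (preceding-sibling::p[1]) satisfies both conditions:
       1. it contains at least one span whose class is not 'chapter';
       2. the last such span does NOT end with one of . ? " !
     In other words: the previous p is an unfinished paragraph. -->
<xsl:template match="p[
    preceding-sibling::p[1][
      span[@class ne 'chapter']
      and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))
    ]
  ]"/>
<!-- The template body is empty, so the matched p produces no output.
     Its child nodes are not lost: the template for the unfinished p
     already pulled them in via following-sibling::p[1]/node(). -->
```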


I tried it with a more complete example like the following:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
   <meta http-equiv="Content-Type" content="application/xhtml+xml;
charset=utf-8"/>
   <title/>
   <link href="test.css" rel="stylesheet" type="text/css"/>
</head>
<body>
   <p dir="rtl">
      <span class="chapter">line1</span>
   </p>
   <p dir="rtl">&nbsp;&nbsp;<br />
   <span class="regular">line3.</span>
   <span class="italic">line4</span>
   <span class="regular">line5."</span>
   </p>
   <p dir="rtl">&nbsp;&nbsp;<br />
   <span class="regular">line6.</span>
   <br />
   <span class="regular">line7</span>
 </p>
 <p dir="rtl">&nbsp;&nbsp;<br />
   <span class="regular">line8.</span>
   <span class="regular">line9.</span>
 </p>
</body>
</html>


The output was:

<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en"
version="-//W3C//DTD XHTML 1.1//EN">
   <head profile="">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <title></title>
      <link href="test.css" rel="stylesheet" type="text/css"
xml:space="preserve" />
   </head>
   <body xml:space="preserve">
      <p dir="rtl" xml:space="preserve">
         <span class="chapter" xml:space="preserve">line1</span>

      </p>
      <p dir="rtl" xml:space="preserve">  <br xml:space="preserve" />
         <span class="regular" xml:space="preserve">line3.</span>
         <span class="italic" xml:space="preserve">line4</span>
         <span class="regular" xml:space="preserve">line5."</span>

      </p>
      <p dir="rtl" xml:space="preserve">  <br xml:space="preserve" />
         <span class="regular" xml:space="preserve">line6.</span>
         <br xml:space="preserve" />
         <span class="regular" xml:space="preserve">line7</span>
           <br xml:space="preserve" />
         <span class="regular" xml:space="preserve">line8.</span>
         <span class="regular" xml:space="preserve">line9.</span>

      </p>
   </body>
</html>


How can I remove the following from the output?
1. The extra xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and
version="-//W3C//DTD XHTML 1.1//EN" on the html element.
2. The extra profile="" on the head element.
3. The extra xml:space="preserve" on the p, span and br elements.
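
One possible direction, sketched under an assumption that has not been
verified in this thread: the extra attributes look like defaults declared
in the XHTML 1.1 DTD, which the parser would add while reading the input,
so the stylesheet sees them as ordinary attributes (and xmlns:xsi as a
namespace node). If that is the cause, empty templates for those
attributes plus copy-namespaces="no" on xsl:copy (an XSLT 2.0 attribute)
might suppress them. The suppression template is my addition; the rest is
the stylesheet quoted below.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Identity copy. copy-namespaces="no" leaves out namespace nodes the
       copied element does not need, such as the unused xsi prefix. -->
  <xsl:template match="@* | node()">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: these attributes are DTD defaults, not authored
       content, so they can be dropped without losing information. -->
  <xsl:template match="@xml:space | html/@version | head/@profile[. = '']"/>

  <!-- Unfinished paragraph: also pull in the children of the next p. -->
  <xsl:template match="p[span[@class ne 'chapter'] and
      not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node() |
          following-sibling::p[1]/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The p that follows an unfinished paragraph: already merged. -->
  <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
      and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>

</xsl:stylesheet>
```

The attribute template wins over the identity template because a named
attribute pattern has a higher default priority than @* or node(). If the
attributes turn out to come from somewhere else, this will not help.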

Thanks, Viente

On Sun, Jun 14, 2009 at 6:50 PM, Martin Honnen <Martin.Honnen@xxxxxx> wrote:
> Israel Viente wrote:
>
>> My input is something like the following:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>>    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <body>
>>   <p dir="rtl">
>>      <span class="chapter">line1</span>
>>   </p>
>>   <p dir="rtl">&nbsp;&nbsp;<br />
>>   <span class="regular">line3.</span>
>>   <span class="italic">line4</span>
>>   <span class="regular">line5."</span>
>>   </p>
>>   <p dir="rtl">&nbsp;&nbsp;<br />
>>   <span class="regular">line6.</span>
>>   <br />
>>   <span class="regular">line7</span>
>>  </p>
>>  <p dir="rtl">&nbsp;&nbsp;<br />
>>   <span class="regular">line8.</span>
>>   <span class="regular">line9.</span>
>>  </p>
>> </body>
>> </html>
>>
>>
>> The resulting output should be:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>>    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <body>
>>   <p dir="rtl">
>>      <span class="chapter">line1</span>
>>   </p>
>>   <p dir="rtl">&nbsp;&nbsp;<br />
>>          <span class="regular">line3.</span>
>>          <span class="italic">line4</span>
>>          <span class="regular">line5."</span>
>>   </p>
>>   <p dir="rtl">&nbsp;&nbsp;<br />
>>          <span class="regular">line6.</span>
>>          <br />
>>          <span class="regular">line7</span>
>>          <span class="regular">line8.</span>
>>          <span class="regular">line9.</span>
>>   </p>
>> </body>
>> </html>
>>
>> For every p that contains span elements with a class other than
>> 'chapter', check whether the text of the last such span ends with one
>> of the characters . ? " ! (a paragraph-ending character).
>> If it does, copy the p to the output as is.
>> Otherwise, move the span elements from the next p into the current one
>> and remove the next p completely.
>
> Here is an attempt at solving that with XSLT 2.0:
>
> <xsl:stylesheet
>  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xpath-default-namespace="http://www.w3.org/1999/xhtml"
>  version="2.0">
>
>  <xsl:output method="xhtml"/>
>
>  <xsl:template match="@* | node()">
>    <xsl:copy>
>      <xsl:apply-templates select="@* | node()"/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="p[span[@class ne 'chapter'] and
> not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]">
>    <xsl:copy>
>      <xsl:apply-templates select="@* | node() |
> following-sibling::p[1]/node()"/>
>    </xsl:copy>
>  </xsl:template>
>
>  <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
> and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>
>
> </xsl:stylesheet>
>
> For the posted input using Saxon 9 it produces the described output but I
> have not tested with other inputs.
>
> --
>
>        Martin Honnen
>        http://msmvps.com/blogs/martin_honnen/

