
Re: [xsl] pass malformed HTML through the parser?


Subject: Re: [xsl] pass malformed HTML through the parser?
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 13 Aug 2003 10:59:54 -0400

At 2003-08-13 15:36 +0200, Jaques, Yves (FIDI) wrote:
> Our site is XML/XSL, however we are about to receive a thousand pages of old
> static html that we will never have the time to turn into XHTML. I would
> like to be able to spit it through the parser without parsing it so that I
> can wrap our site template around the static html as I do for our other
> content. Is this possible?

If you wanted to try to make well-formed XML out of the HTML, you might consider TagSoup:


http://mercury.ccil.org/~cowan/XML/tagsoup/

But just to wrap your old HTML into an XML element as a static text stream, you could write an application that:

  (1) converts - '&' to &amp;  (do this one first, so that the ampersands
                                introduced by the next two are left alone)
               - '<' to &lt;
               - '>' to &gt;

  (2) wraps the content in an element, for example:

               <oldhtml>
                   &lt;p&gt;Hello world&lt;/p&gt;
               </oldhtml>

      and it will be a well-formed element of text, properly escaped,
      with the sensitive markup characters protected
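
The two steps above can be sketched as a small program. This is only a minimal illustration in Python, not a finished tool: the function name wrap_html_as_text and the sample input are made up for the example, and it assumes the old HTML has already been read into a string and contains no characters that are illegal in XML (stray control characters, for instance, would still need to be dealt with).

```python
from xml.sax.saxutils import escape


def wrap_html_as_text(html: str, element: str = "oldhtml") -> str:
    """Escape the markup characters in `html` and wrap the result in one element.

    Hypothetical helper for illustration; `element` defaults to the
    <oldhtml> wrapper used in the example above.
    """
    # escape() replaces '&' before '<' and '>', so the ampersands it
    # introduces for &lt; and &gt; are not escaped a second time.
    return f"<{element}>{escape(html)}</{element}>"


if __name__ == "__main__":
    # Malformed on purpose: unclosed <p>, bare '&', unquoted attribute value.
    legacy = "<p>Fish & chips<br><a href=page.html>next</a>"
    print(wrap_html_as_text(legacy))
```

Note that entity references already present in the old HTML (&nbsp; and so on) are escaped along with everything else, so they survive the round trip untouched.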

> I have tried the following:
>
> -- using the document() function but the parser just ignored the file as it
> was not XML.

Yes, that is the required behaviour of document(): it can only read well-formed XML. If you follow my instructions above, the document() function will work with the created file.


Note that once you get a text node of the old HTML you will then need to use disable-output-escaping="yes" in order to ensure the escaped characters in the input go out without escaping.

  <xsl:value-of select="document('test.xml')/oldhtml"
                disable-output-escaping="yes"/>

You should be able to take the above and change it to wrap it into your template as required.
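
To see why the disable-output-escaping attribute matters, here is a rough sketch, again in Python rather than XSLT, of what happens to the wrapped content on its way through. It only mimics the two serialization behaviours with the standard library; the sample string is invented, and a real XSLT processor is not obliged to support disable-output-escaping at all (it is an optional feature in XSLT 1.0, and it only has an effect when the processor itself serializes the result).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Invented sample: the kind of file the escaping step would produce.
wrapped = "<oldhtml>&lt;p&gt;Fish &amp; chips&lt;br&gt;</oldhtml>"

# Parsing succeeds because the wrapper is well-formed; the old HTML comes
# back as the element's text, with the escapes resolved to real characters.
text_node = ET.fromstring(wrapped).text

# Normal serialization escapes the text again, so a browser would display
# the tags literally instead of interpreting them.
escaped_output = escape(text_node)

# With disable-output-escaping="yes" the text is written out unchanged.
raw_output = text_node

print(escaped_output)  # &lt;p&gt;Fish &amp; chips&lt;br&gt;
print(raw_output)      # <p>Fish & chips<br>
```

The second line of output is what you want inside your site template: the original, still malformed, HTML passed through as-is.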

I hope this helps.

............................ Ken

--
Instructor-led on-site corporate, government & user group training
for XSLT and XSL-FO world-wide; please contact us for the details;
Next public European delivery:  3-day XSLT/2-day XSL-FO 2003-09-22

G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
ISBN 0-13-065196-6                       Definitive XSLT and XPath
ISBN 0-13-140374-5                               Definitive XSL-FO
ISBN 1-894049-08-X   Practical Transformation Using XSLT and XPath
ISBN 1-894049-11-X               Practical Formatting Using XSL-FO
Member of the XML Guild of Practitioners:     http://XMLGuild.info
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/s/bc

