Converting poorly formed HTML into well-formed XML
Subject: Converting poorly formed HTML into well-formed XML
From: Joseph Fourness <josephf@xxxxxxxxxxx>
Date: Tue, 26 Sep 2000 15:56:20 -0700
I am currently developing a system that converts arbitrary, poorly formed
HTML into well-formed XML (or XHTML).
Example of HTML:
<TD valign=TOP width="100">
<A href="http://www.mulberrytech.com" target=_top>Link</a>
The HTML has been written by various web developers over a period of time,
so it is very inconsistent in formatting, use of quotation marks in
I need to convert these files (approx. 120,000) into XHTML so that they can
be processed by an XSLT processor.
Desired XHTML:
<td valign="top" width="100">
<a href="http://www.mulberrytech.com" target="_top">Link</a>
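For illustration, the attribute clean-up shown in the two examples above
could be sketched roughly as follows. This is only a minimal sketch in Python
using the standard html.parser module, not a complete converter: the names
TagNormalizer, normalize_tags, VOID_ELEMENTS and ENUMERATED_ATTRS are
hypothetical, the two sets are deliberately incomplete, and the choice to
lowercase values such as valign=TOP is an assumption based on the example.
Comments, doctypes and missing close tags are not handled here.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a partial list of elements that have no closing tag in HTML.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}

# Assumption: attributes whose values are keywords and can safely be lowercased.
ENUMERATED_ATTRS = {"align", "valign"}


class TagNormalizer(HTMLParser):
    """Re-emits each tag with a lowercase name and quoted attribute values."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _format_attrs(self, attrs):
        out = []
        for name, value in attrs:
            if value is None:
                # A bare attribute such as "disabled" becomes disabled="disabled".
                value = name
            if name in ENUMERATED_ATTRS:
                value = value.lower()
            out.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag in VOID_ELEMENTS:
            self.parts.append("<%s%s />" % (tag, self._format_attrs(attrs)))
        else:
            self.parts.append("<%s%s>" % (tag, self._format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.parts.append("<%s%s />" % (tag, self._format_attrs(attrs)))

    def handle_endtag(self, tag):
        # Void elements were already written as self-closing tags.
        if tag not in VOID_ELEMENTS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape the text.
        self.parts.append(escape(data, quote=False))


def normalize_tags(html_text):
    parser = TagNormalizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(normalize_tags('<TD valign=TOP width="100">'))
```

Because this works tag by tag, it fixes case and quoting but cannot repair
nesting; an unclosed <TD> stays unclosed.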
Does XSLT have the facilities to read in the poorly formed HTML directly?
If so, what needs to be done?
Or would the best solution be to design a custom parser that builds a DOM
from the poorly formed HTML, which can then be output to an XML file or
processed directly by an XSLT document?
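To make the custom-parser idea concrete, here is a rough sketch of building
a tree from tag soup and serializing it as well-formed XML. It is written in
Python with the standard html.parser and xml.etree.ElementTree modules and
is not the system described in this message. The names SoupTreeBuilder,
html_to_xml and the wrapper element "root" are hypothetical, and the list of
void elements is an incomplete assumption. Elements left open are closed
when an enclosing end tag arrives or at the end of input; HTML's
implied-end-tag rules (for example, a new <p> closing the previous one) are
not implemented.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: a partial list of elements that never contain children.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class SoupTreeBuilder(HTMLParser):
    """Builds an ElementTree from HTML, tolerating missing end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Bare attributes (value None) are given their own name as the value.
        attrib = {name: (name if value is None else value)
                  for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with any
        # unclosed elements inside it. A stray end tag is ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                return

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text after a child element belongs to that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_xml(html_text):
    builder = SoupTreeBuilder()
    builder.feed(html_text)
    builder.close()
    # Elements still on the stack are closed implicitly by serialization.
    return ET.tostring(builder.root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml('<TD valign=TOP width="100"><A href="x" target=_top>Link</a>'))
```

The resulting string can be written to a file or handed to an XSLT
processor. Attribute names that are not legal XML names would still need
separate handling.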
I've already begun developing the latter (custom) solution, but thought I'd
double check to see if there are any HTML -> XHTML converters available.
Thanks in advance for your help,
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list