
Re: html to xml


Subject: Re: html to xml
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 27 Oct 2000 11:02:23 GMT

> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't sure quite what your context was.
Surely grabbing floating PCDATA and sticking it in a paragraph element
is something easily done in the post-tidy XSL transformation to TEI.
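
To make the "floating PCDATA" step concrete, here is a minimal sketch of the logic in Python rather than XSL, so that it can be run on its own. It assumes the tidied page parses as well-formed XML. The function name wrap_floating_text and the sample markup are hypothetical. Any non-blank text sitting directly inside the body is wrapped in its own paragraph element. It does not try to merge loose text with neighbouring inline elements, which a real conversion would probably want.

```python
import xml.etree.ElementTree as ET


def wrap_floating_text(body):
    """Wrap text that sits directly inside `body` in <p> elements, in place."""
    new_children = []

    # Text before the first child element is stored on body.text.
    if body.text and body.text.strip():
        p = ET.Element("p")
        p.text = body.text.strip()
        new_children.append(p)
    body.text = None

    # Text after each child element is stored on that child's .tail.
    for child in list(body):
        new_children.append(child)
        if child.tail and child.tail.strip():
            p = ET.Element("p")
            p.text = child.tail.strip()
            new_children.append(p)
        child.tail = None

    # Rebuild the child list with the new paragraphs in document order.
    for child in list(body):
        body.remove(child)
    body.extend(new_children)
    return body


# Hypothetical sample input: two runs of loose text around a heading.
doc = ET.fromstring("<body>loose text<h1>Title</h1>more loose text<p>ok</p></body>")
wrap_floating_text(doc)
print(ET.tostring(doc, encoding="unicode"))
```

In a stylesheet the same effect would come from a template matching the non-blank text node children of the body.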

Grabbing html section heads into TEI/docbook style section containers is
always a pain but you can do it in XSL with the usual "grouping"
techniques. It's made a bit easier if you know that the H? elements all
appear in "correct" sequence, not jumping from h1 to h3. If you use the
ISO-HTML DTD then the SGML parser (e.g. sx) will add any missing section
levels automagically if you set the appropriate parameter entity.
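
The grouping itself can be sketched as follows, again in Python as a runnable stand-in for the XSL and not as the stylesheet. It assumes the headings and their content are flat siblings under the body, and that section containers are written as div elements with a head child. The function name nest_sections and the sample markup are hypothetical. A stack of open sections is kept, and each heading closes every open section at the same or a deeper level before opening a new one. If a level is skipped (h1 straight to h3), this sketch just nests the h3 section directly inside the h1 section without inventing the missing level.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"h([1-6])")


def nest_sections(body):
    """Return a new <body> in which flat h1..h6 runs become nested <div> containers."""
    out = ET.Element("body")
    # Stack of (heading level, container); level 0 is the body itself.
    stack = [(0, out)]
    for child in body:
        m = HEADING.fullmatch(child.tag)
        if m:
            level = int(m.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            head = ET.SubElement(div, "head")
            head.text = child.text       # heading text
            head.extend(list(child))     # plus any inline markup inside it
            stack.append((level, div))
        else:
            # Ordinary content goes into the innermost open section.
            stack[-1][1].append(child)
    return out


# Hypothetical sample input: headings in "correct" sequence.
flat = ET.fromstring(
    "<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p><h1>C</h1><p>c</p></body>"
)
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

The stack is what the "correct sequence" assumption buys you: with no skipped levels, every section container ends up exactly one level below its parent.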

David






