
Re: [xsl] Non-xml source documents


Subject: Re: [xsl] Non-xml source documents
From: David Carlisle <davidc@xxxxxxxxx>
Date: Wed, 5 Jan 2005 15:28:00 GMT

   XSL is designed expressly for transforming XML documents. You won't
   have any luck in using it to transform something that isn't XML. I
   usually find that Perl is a very handy programming language for
   working with text documents and I have often used it to reformat
   non-XML documents into XML for further work.
   -- 
   Charles Knell
   cknell@xxxxxxxxxx - email



The OP said he could use XSLT2, which means that you can use the
unparsed-text() function to get the input file as a string and then use
the fairly extensive Unicode-aware regexp handling of XSLT2 to transform
this to XML. The text string handling still isn't up to Perl's power,
although offset against that is the ease of integrating the XML
generation of the output that you get from XSLT2.
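
As a rough illustration of the unparsed-text() plus regexp approach
described above, here is a minimal sketch. It assumes a hypothetical
input file of "name = value" lines; the input-uri parameter, the main
template name and the records/record/unparsed element names are all
invented for the example, not taken from any real stylesheet.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of the plain-text input file.
       A relative URI is resolved against the stylesheet's base URI. -->
  <xsl:param name="input-uri" as="xs:string" select="'input.txt'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point, because there is no XML source document. -->
  <xsl:template name="main">
    <records>
      <!-- Read the whole file as one string, split it into lines,
           and drop lines that are empty or whitespace only. -->
      <xsl:for-each
          select="tokenize(unparsed-text($input-uri, 'UTF-8'), '\r?\n')
                  [normalize-space()]">
        <!-- The regex is anchored, so each line either matches as a
             whole or falls through to non-matching-substring. -->
        <xsl:analyze-string select="." regex="^([^=]+)=(.*)$">
          <xsl:matching-substring>
            <record name="{normalize-space(regex-group(1))}">
              <xsl:value-of select="normalize-space(regex-group(2))"/>
            </record>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <!-- Keep lines without '=' so nothing is silently lost. -->
            <unparsed><xsl:value-of select="."/></unparsed>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

Since there is no source document, the processor has to be told to
start at the named template (Saxon, for instance, has an -it option
for this). With that done, an input line such as "title = Hello"
should come out as <record name="title">Hello</record>.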

I use this technique here
http://www.dcarlisle.demon.co.uk/htmlparse.xsl
which will read in an HTML file (as plain text), parse it using regexps,
and produce an XHTML file (after applying some heuristics to fix up the
element hierarchy).

David


