
Re: [xsl] feasibility of HTML input


Subject: Re: [xsl] feasibility of HTML input
From: Mike Ferrando <mikeferrando@xxxxxxxxx>
Date: Fri, 17 Mar 2006 11:03:06 -0800 (PST)

Dianne,
This is going to be really hacky, so get ready.

This suggestion uses XSLT 1.0.

Your HTML document must be reachable on-line (at a URL) for this to
work.

OK, you have some messy HTML and you want to pull it into your XSLT.

You could go to one of the on-line Tidy services, for example:

http://www.1-hit.com/all-in-one/tool.html-cleaner.htm

Use the site to clean up the HTML, picking whatever options you want.

Run the tidy function on the site. When the result comes back, copy
the URL of the result page.

Keep only the part of that URL that comes before your HTML document's
URL. That prefix string can then be used as a variable in your XSLT.

In your XSLT you can simply concat() your document's URL onto that
string (inside the document() call or beforehand).

This will give you a well-formed XML document to work with in your
XSLT, provided the Tidy options you picked produce XML/XHTML output.
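
As a rough sketch of the steps above (not a tested recipe for any
particular site): the prefix and the page address below are made-up
placeholders, since the real prefix is whatever you cut out of the
Tidy result URL. The parameter names tidy-prefix and html-url are
also just names picked for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- ASSUMPTION: placeholder prefix. Replace it with the string you
       cut from the Tidy result URL (everything before your HTML URL). -->
  <xsl:param name="tidy-prefix"
             select="'http://tidy-service.example/tidy?url='"/>

  <!-- ASSUMPTION: hypothetical address of the on-line HTML page. -->
  <xsl:param name="html-url"
             select="'http://www.example.com/reports/index.html'"/>

  <xsl:template match="/">
    <!-- Join prefix and page URL, then load the tidied result
         as a node-set with document(). -->
    <xsl:variable name="clean"
                  select="document(concat($tidy-prefix, $html-url))"/>

    <!-- Quick check that the load worked: print the page title.
         local-name() is used so the match succeeds whether or not
         the tidied output is in the XHTML namespace. -->
    <xsl:value-of select="$clean//*[local-name() = 'title']"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run it against any small dummy XML input; the real data comes in
through document(). Two caveats: XSLT 1.0 has no built-in function to
URL-encode a string, so a page URL containing characters such as ? or
& may need to be encoded by hand before it is concatenated, and your
processor has to be allowed to fetch remote URLs.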

Anyway, if you want (or can stand) to hear more, I can give you more
detail.

But it did work when I used it, and it saved me a lot of time in the
process.

Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454



--- Jay Bryant <jay@xxxxxxxxxxxx> wrote:

> Hi, Dianne,
> 
> The only trick to using HTML as input to XSLT is that the HTML has to
> comply with the definition of well-formed XML. To do that, use one of
> the Tidy programs.
>
> From reading the other responses, I see that you might also be able to
> get XML as your input. That will be MUCH more straightforward and very
> likely save you a bunch of time. Were I in similar straits, I would
> definitely go that route.
> 
> FWIW
> 
> Jay Bryant
> Bryant Communication Services
> 
> ----- Original Message ----- 
> From: <didoss@xxxxxxxxxxx>
> To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Friday, March 17, 2006 11:04 AM
> Subject: [xsl] feasibility of HTML input
> 
> 
> > I'm new to the list and to xsl and xslt.
> >
> > The goal of this e-mail is to just confirm the feasibility of my
> > endeavor. It would be a bonus if someone pushed me in a helpful
> > direction - or I can keep wandering, which is ok too.
> >
> > I haven't found much about the feasibility of using an html file as
> > input. I didn't find anything useful through Google searches, though
> > being new to xsl and xslt, I might have not entered the right phrase.
> > The 2 O'Reilly books that I have also didn't clearly direct me towards
> > a solution - but also didn't say that it couldn't be done.
> >
> > Digging through the FAQ, here, I *did* finally find a couple references
> > to using HTML as input. That at least gave me confidence that this is
> > not a completely insane idea. I didn't get a clear idea of the
> > requirements, but definitely understood that I should TIDY my html
> > before trying to parse it. :)
> >
> > So, here I am thinking that it might be possible, but I have spent a
> > bit of time digging, and decided that I might want to check with the
> > experts before spinning wheels further.
> >
> > =========================================
> > Is this feasible,...worthwhile,...better done with another utility?
> > =========================================
> >
> > My team produces nightly JUnit reports and Emma coverage reports for
> > our code. I have added a task to copy off the top-level html pages for
> > these results for historical purposes. I would like to be able to run
> > a transform across the files in the respective directory (one
> > transform for JUnit and one for Emma) to create summary files
> > (probably comma delimited, to be able to pull into Excel). The summary
> > file could then be used to recognize and learn from trends in these
> > results.
> >
> > If this is feasible and worthwhile, and not better done with another
> > utility, I will send my current xsl and what I'm running into with it.
> >
> > Thanks for any advice and/or direction you can provide,
> > Dianne
> 
> 

