[oXygen-user] [oXygen XML Editor Blog] - Batch converting HTML to XHTML

oXygen XML Editor Blog noreply+feedproxy at google.com
Mon Jun 12 23:51:04 CDT 2017


oXygen XML Editor Blog

Batch converting HTML to XHTML

Posted: 12 Jun 2017 02:03 AM PDT
http://feedproxy.google.com/~r/AboutOxygenXmlEditor/~3/aF0B0Z1Zw1I/batch-converting-html-to-xhtml.html?utm_source=feedburner&utm_medium=email

Let's say you have a bunch of existing HTML documents, possibly not well-formed, and you want to process them using XSLT. For example, you may want to migrate the HTML to DITA using the predefined XHTML to DITA Topic transformation scenario available in Oxygen. To do that, you first need to create well-formed XHTML documents (which XML tools can parse) from the existing HTML documents, and you need to do this in a batch, automated fashion.

There are lots of open source projects that provide processors which can convert HTML to its well-formed XHTML equivalent. For this blog post we'll use HTML Tidy. Here are the steps to automate this process:

1. Create a new folder on your hard drive (for example, I created one on my Desktop: C:\Users\radu_coravu\Desktop\tidy) and download into it the HTML Tidy executable for your platform from http://binaries.html-tidy.org/.

2. In the same folder as the Tidy executable, create an ANT build file called build.xml with the following content:

<project basedir="." name="TidyUpHTMLtoXHTML" default="main">
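    <basename property="filename" file="${file}"/>
    <target name="main">
        <!-- -asxhtml makes Tidy write well-formed XHTML instead of HTML -->
        <exec executable="tidy.exe" resolveexecutable="true">
            <arg value="-asxhtml"/>
            <arg value="-o"/>
            <arg file="${output.dir}/${filename}"/>
            <arg file="${file}"/>
        </exec>
    </target>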
</project>

3. In the Oxygen Project view, link the entire folder where the original HTML documents are located.

4. Right-click the folder, choose Transform->Configure Transformation Scenarios... and create a new transformation scenario of type ANT Scenario. Modify the following properties in the transformation scenario:
   - Change the scenario name to something relevant, like HTML to XHTML.
   - Change the Working Directory to point to the folder where the ANT build file is located, in my case: C:\Users\radu_coravu\Desktop\tidy.
   - Change the Build file to point to your custom build.xml, in my case: C:\Users\radu_coravu\Desktop\tidy\build.xml.
   - In the Parameters tab, add a parameter called file with the value ${cf} (Oxygen expands it to the path of the file being transformed) and a parameter called output.dir whose value is the path to the output folder where the equivalent XHTML files will be stored. In my case I set it to: C:\Users\radu_coravu\Desktop\testOutputXHTML.

5. Apply the newly created transformation scenario to the entire folder containing the HTML documents.

When it finishes, the output folder will contain the XHTML equivalents of the original HTML files, XHTML documents which can then be processed using XML technologies like XSLT or XQuery.
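If you would rather drive the whole batch from Ant instead of an Oxygen scenario, Ant's apply task can run Tidy once for every file in a folder. The build file below is only a sketch under a few assumptions: input.dir and output.dir are hypothetical property names passed on the command line, tidy.exe sits next to the build file, and only the *.html files directly inside input.dir are converted (subfolders are not recreated in the output).

```xml
<!-- Sketch: converts every *.html file directly inside ${input.dir} to XHTML in ${output.dir}.
     input.dir and output.dir are hypothetical property names, passed with -D. -->
<project basedir="." name="TidyUpFolder" default="main">
    <target name="main">
        <!-- Create the output folder if it does not exist yet -->
        <mkdir dir="${output.dir}"/>
        <!-- apply runs tidy.exe once per file (parallel="false" is the default) -->
        <apply executable="tidy.exe" resolveexecutable="true" dest="${output.dir}">
            <arg value="-asxhtml"/>
            <arg value="-o"/>
            <!-- Replaced by the output path computed by the mapper below -->
            <targetfile/>
            <!-- Replaced by the path of the HTML file being converted -->
            <srcfile/>
            <fileset dir="${input.dir}" includes="*.html"/>
            <!-- Keep the same file name in the output folder -->
            <mapper type="identity"/>
        </apply>
    </target>
</project>
```

Saved under a separate, hypothetical name such as tidy-folder.xml, it could be run with something like: ant -f tidy-folder.xml -Dinput.dir=<folder with the HTML files> -Doutput.dir=C:\Users\radu_coravu\Desktop\testOutputXHTML. Because a mapper and dest are set, apply compares timestamps and only re-runs Tidy for HTML files that are newer than their XHTML output.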
