<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>oXygen XML Editor Blog</title>
</head>
<body>
<style type="text/css">

                        h1 a:hover {background-color:#888;color:#fff ! important;}

                        div#emailbody table#itemcontentlist tr td div ul {
                                        list-style-type:square;
                                        padding-left:1em;
                        }
        
                        div#emailbody table#itemcontentlist tr td div blockquote {
                                padding-left:6px;
                                border-left: 6px solid #dadada;
                                margin-left:1em;
                        }
        
                        div#emailbody table#itemcontentlist tr td div li {
                                margin-bottom:1em;
                                margin-left:1em;
                        }


                        table#itemcontentlist tr td a:link, table#itemcontentlist tr td a:visited, table#itemcontentlist tr td a:active, ul#summarylist li a {
                                color:#000099;
                                font-weight:bold;
                                text-decoration:none;
                        }       

                        img {border:none;}


                </style>
<div xmlns="http://www.w3.org/1999/xhtml" id="emailbody" style="margin:0 2em;font-family:Georgia,Helvetica,Arial,Sans-Serif;line-height:140%;font-size:13px;color:#000000;">
<table style="border:0;padding:0;margin:0;width:100%">
<tr>
<td style="vertical-align:top" width="99%">
<h1 style="margin:0;padding-bottom:6px;">
<a style="color:#888;font-size:22px;font-family:Arial, Helvetica, sans-serif;font-weight:normal;text-decoration:none;" href="http://blog.oxygenxml.com/" title="(http://blog.oxygenxml.com/)">[oXygen XML Editor Blog] - Batch converting HTML to XHTML</a>
</h1>
</td>
<td width="1%" />
</tr>
</table>
<hr style="border:1px solid #ccc;padding:0;margin:0" />
<table id="itemcontentlist">
<tr xmlns="">
<td style="margin-bottom:0;line-height:1.4em;">
<p style="margin:1em 0 3px 0;">
<a name="1" style="font-family:Arial, Helvetica, sans-serif;font-size:18px;" href="http://feedproxy.google.com/~r/AboutOxygenXmlEditor/~3/aF0B0Z1Zw1I/batch-converting-html-to-xhtml.html?utm_source=feedburner&amp;utm_medium=email">Batch converting HTML to XHTML</a>
</p>
<p style="font-size:13px;color:#555;margin:9px 0 3px 0;font-family:Georgia,Helvetica,Arial,Sans-Serif;line-height:140%;font-size:13px;">
<span>Posted:</span> 12 Jun 2017 02:03 AM PDT</p>
<div style="margin:0;font-family:Georgia,Helvetica,Arial,Sans-Serif;line-height:140%;font-size:13px;color:#000000;"><div class="body">        <p class="p">Suppose you have a set of existing <strong class="ph b">HTML</strong> documents, some of them possibly not well-formed, that you want to process using <strong class="ph b">XSLT</strong>. For example, you may want to migrate the <strong class="ph b">HTML</strong> to <strong class="ph b">DITA</strong> using the predefined <strong class="ph b">XHTML to DITA Topic</strong> transformation scenario available in Oxygen. To do that, you first need to create well-formed <strong class="ph b">XHTML</strong> documents from the existing <strong class="ph b">HTML</strong> documents, and you need to do it as an automated batch process.</p>         <div class="p">Many open source projects provide processors that can convert <strong class="ph b">HTML</strong> to its well-formed <strong class="ph b">XHTML</strong> equivalent. For this blog post we'll use <a class="xref" href="http://www.html-tidy.org/" target="_blank">HTML Tidy</a>. 
Here are the steps to automate this process:<ol class="ol" id="batch_converting_html_to_xhtml__ol_qmj_mmz_21b">
<li class="li">Create a new folder on your hard drive (for example, I created one on my <strong class="ph b">Desktop</strong>: <code class="ph codeph">C:\Users\radu_coravu\Desktop\tidy</code>) and download into it the HTML Tidy executable for your platform: <a class="xref" href="http://binaries.html-tidy.org/" target="_blank">http://binaries.html-tidy.org/</a>.</li>
<li class="li">In the same folder as the <strong class="ph b">Tidy</strong> executable, create an <strong class="ph b">ANT</strong> build file called <code class="ph codeph">build.xml</code> with the following content: <pre class="pre codeblock language-xml"><strong class="hl-tag" style="color:#000096">&lt;project</strong> <span class="hl-attribute" style="color: #ff7935">basedir</span>=<span class="hl-value" style="color: #993300">"."</span> <span class="hl-attribute" style="color: #ff7935">name</span>=<span class="hl-value" style="color: #993300">"TidyUpHTMLtoXHTML"</span> <span class="hl-attribute" style="color: #ff7935">default</span>=<span class="hl-value" style="color: #993300">"main"</span><strong class="hl-tag" style="color:#000096">&gt;</strong><br />  <strong class="hl-tag" style="color:#000096">&lt;basename</strong> <span class="hl-attribute" style="color: #ff7935">property</span>=<span class="hl-value" style="color: #993300">"filename"</span> <span class="hl-attribute" style="color: #ff7935">file</span>=<span class="hl-value" style="color: #993300">"${file}"</span><strong class="hl-tag" style="color:#000096">/&gt;</strong><br />  <strong class="hl-tag" style="color:#000096">&lt;target</strong> <span class="hl-attribute" style="color: #ff7935">name</span>=<span class="hl-value" style="color: #993300">"main"</span><strong class="hl-tag" style="color:#000096">&gt;</strong><br />    <strong class="hl-tag" style="color:#000096">&lt;exec</strong> <span class="hl-attribute" style="color: #ff7935">executable</span>=<span class="hl-value" style="color: #993300">"tidy.exe"</span><strong class="hl-tag" style="color:#000096">&gt;</strong><br />      <strong class="hl-tag" style="color:#000096">&lt;arg</strong> <span class="hl-attribute" style="color: #ff7935">value</span>=<span class="hl-value" style="color: #993300">"-asxhtml"</span><strong class="hl-tag" style="color:#000096">/&gt;</strong><br />      <strong class="hl-tag" style="color:#000096">&lt;arg</strong> <span class="hl-attribute" style="color: #ff7935">value</span>=<span class="hl-value" style="color: #993300">"-o"</span><strong class="hl-tag" style="color:#000096">/&gt;</strong><br />      <strong class="hl-tag" style="color:#000096">&lt;arg</strong> <span class="hl-attribute" style="color: #ff7935">value</span>=<span class="hl-value" style="color: #993300">"${output.dir}/${filename}"</span><strong class="hl-tag" style="color:#000096">/&gt;</strong><br />      <strong class="hl-tag" style="color:#000096">&lt;arg</strong> <span class="hl-attribute" style="color: #ff7935">value</span>=<span class="hl-value" style="color: #993300">"${file}"</span><strong class="hl-tag" style="color:#000096">/&gt;</strong><br />    <strong class="hl-tag" style="color:#000096">&lt;/exec&gt;</strong><br />  <strong class="hl-tag" style="color:#000096">&lt;/target&gt;</strong><br /><strong class="hl-tag" style="color:#000096">&lt;/project&gt;</strong></pre>The <code class="ph codeph">-asxhtml</code> option tells Tidy to write well-formed XHTML, and <code class="ph codeph">-o</code> names the output file.</li>
<li class="li">In the Oxygen <strong class="ph b">Project</strong> view, link the entire folder where the original <strong class="ph b">HTML</strong> documents are located.</li>
<li class="li">Right-click the folder, choose <strong class="ph b">Transform-&gt;Configure Transformation Scenarios...</strong> and create a new transformation scenario of type <strong class="ph b">ANT Scenario</strong>. Modify the following properties in the transformation scenario:<ol class="ol" type="a" id="batch_converting_html_to_xhtml__ol_nxx_dnz_21b">
<li class="li">Change the scenario name to something relevant, like <strong class="ph b">HTML to XHTML</strong>.</li>
<li class="li">Change the <strong class="ph b">Working Directory</strong> to point to the folder where the ANT build file is located, in my case: <code class="ph codeph">C:\Users\radu_coravu\Desktop\tidy</code>.</li>
<li class="li">Change the <strong class="ph b">Build file</strong> to point to your custom <strong class="ph b">build.xml</strong>, in my case: <code class="ph codeph">C:\Users\radu_coravu\Desktop\tidy\build.xml</code>.</li>
<li class="li">In the <strong class="ph b">Parameters</strong> tab, add a parameter called <strong class="ph b">file</strong> with the value <strong class="ph b">${cf}</strong> and a parameter called <strong class="ph b">output.dir</strong> whose value is the path to the output folder where the equivalent XHTML files will be stored. In my case I set it to: <code class="ph codeph">C:\Users\radu_coravu\Desktop\testOutputXHTML</code>.</li>
</ol></li>
<li class="li">Apply the newly created transformation scenario to the entire folder containing the HTML documents. When it finishes, the output folder contains the XHTML equivalents of the original HTML files, which can then be processed using XML technologies like <strong class="ph b">XSLT</strong> or <strong class="ph b">XQuery</strong>.</li>
</ol></div>     </div></div>
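<div class="body"><p class="p">As a variation on the steps above, the per-file loop can also live in the ANT build file itself, so the build can be run on a whole folder at once. The following is only a sketch under stated assumptions: the project name, the <code class="ph codeph">input.dir</code> property and the default values of both properties are hypothetical, and it assumes a <code class="ph codeph">tidy</code> executable on the system path and a flat input folder with no subfolders. It uses ANT's <code class="ph codeph">apply</code> task to run Tidy once for each HTML file:</p>
<pre class="pre codeblock language-xml">&lt;project basedir="." name="TidyFolderToXHTML" default="main"&gt;
  &lt;!-- Hypothetical defaults; override with -Dinput.dir=... -Doutput.dir=... --&gt;
  &lt;property name="input.dir" value="html"/&gt;
  &lt;property name="output.dir" value="xhtml"/&gt;
  &lt;target name="main"&gt;
    &lt;!-- Tidy does not create the output folder, so create it first --&gt;
    &lt;mkdir dir="${output.dir}"/&gt;
    &lt;!-- Runs: tidy -asxhtml -o TARGET SOURCE, once per matched file --&gt;
    &lt;apply executable="tidy" dest="${output.dir}"&gt;
      &lt;arg value="-asxhtml"/&gt;
      &lt;arg value="-o"/&gt;
      &lt;targetfile/&gt;
      &lt;srcfile/&gt;
      &lt;!-- Top-level files only; the identity mapper keeps the file names --&gt;
      &lt;fileset dir="${input.dir}" includes="*.html"/&gt;
      &lt;mapper type="identity"/&gt;
    &lt;/apply&gt;
  &lt;/target&gt;
&lt;/project&gt;</pre>
<p class="p">Because of the <code class="ph codeph">dest</code> attribute and the nested mapper, ANT compares timestamps and skips files whose output is already up to date, so rerunning the build converts only new or modified documents.</p></div>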
</td>
</tr>
</table>
</div>
</body>
</html>