WebHelp output does not handle encoded characters
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
WebHelp output does not handle encoded characters
I have English documentation that includes Greek letters and other symbols.
oXygenXML Author cannot display (by default) symbolic names like &alpha;.
For this reason I entered all symbols as Unicode, like &#945;.
When I convert the documentation to standard HTML (WebHelp scenario), these characters are correctly transcoded to the corresponding Unicode characters. But when I create a compiled help file (CHM), a question mark is written in place of these characters in the generated (intermediate) HTML.
I'm not sure how to prevent this, because when I compile the "correct" HTML together with the .hhp/.hhk/.hhc files, the generated help file shows the Unicode characters correctly as well...
So for some reason the HTML generated by the WebHelp scenario differs from the HTML generated by the CHM scenario.
Why is this?
-
- Posts: 4141
- Joined: Fri Mar 28, 2003 2:12 pm
Re: WebHelp output does not handle encoded characters
Post by sorin_ristache
Hello,
In this case, the difference between the WebHelp and Windows Help (CHM) transformations is how the HTML files are encoded before they are fed to the Microsoft HTML Help Compiler. Some entity references, like α, are not processed correctly by the HTML Help Compiler, so they need to be pre-processed and escaped before being passed to it. We fixed this problem with a patch that we contributed to the DITA-OT project some time ago, so it is already fixed in more recent versions of DITA-OT (and of Oxygen).
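The pre-processing step described above can be sketched roughly as follows. This is a minimal illustration, assuming the escaping amounts to replacing every non-ASCII character with a numeric character reference (so α becomes &#945;) before the HTML reaches the HTML Help Compiler. It is not the actual DITA-OT patch, and `escape_for_hhc` and `escape_html_files` are hypothetical names.

```python
from pathlib import Path


def escape_for_hhc(html: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference.

    Hypothetical helper: the result is pure ASCII, so the compiler no longer
    has to guess the file encoding (e.g. "α" becomes "&#945;").
    """
    return html.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def escape_html_files(folder: Path) -> None:
    """Rewrite every *.html file under `folder` in place (assumes UTF-8 input)."""
    for path in folder.rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        path.write_text(escape_for_hhc(text), encoding="ascii")


if __name__ == "__main__":
    print(escape_for_hhc('<td align="center">α</td>'))
```

Because the escaped output is pure ASCII, it reads the same in any ASCII-compatible code page the compiler might assume; the trade-off is larger, less readable intermediate files.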
Which Oxygen version do you use? Can you also try the CHM transformation in the latest Oxygen version?
Rob5624 wrote:
I have English documentation that includes Greek letters and other symbols.
oXygenXML Author cannot display (by default) symbolic names like &alpha;.
For this reason I entered all symbols as Unicode, like &#945;.
This depends on the default Author font set in the user preferences. Please set a default Author font (that is, the font applied in Author mode when the CSS of your framework sets no explicit font) that can render the range of Unicode code points that includes 945.
Regards,
Sorin
<oXygen/> XML Editor support
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
Thanks for your reply.
To your first question: I use "<oXygen/> XML Author 15.2, build 2014013017", fully licensed. I purchased this version earlier this year, so I expect it is not that old... It includes DITA-OT version 1.7...
As far as the fonts are concerned:
In oXygenXML Author the code is entered as:
"<entry rowsep="1" colsep="1" align="center">&#945;</entry>"
The WebHelp scenario generates HTML like this:
"<td class="cellrowborder" align="center" valign="top" headers="d1631e97 d1631e100 d1631e103 d1631e106 d1631e109 d1631e113 d1631e116 d1631e119 d1631e122 d1631e125 d1631e128 d1631e132 ">°</td>" (copied using WordPad; IE and Windows Notepad actually show it correctly as "α")
The CHM scenario generates HTML like this:
"<td class="cellrowborder" align="center" valign="top" headers="d1549e97 d1549e100 d1549e103 d1549e106 d1549e109 d1549e113 d1549e116 d1549e119 d1549e122 d1549e125 d1549e128 d1549e132 ">?</td>" (again copied using WordPad)
As you can see, the HTML itself no longer contains the character! So this does not look like a font issue but like a conversion issue...
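Why would a character turn into a question mark rather than into garbage? One common cause is writing the HTML in an 8-bit Windows code page that has no slot for the character, with a "replace" error policy. The thread does not say which encoding the CHM scenario actually used, so Windows-1252 below is an assumption and `write_as` is a hypothetical helper; the sketch only reproduces the symptom and shows the lossless alternative of emitting numeric character references.

```python
def write_as(text: str, codepage: str, errors: str) -> bytes:
    """Encode `text` the way a converter writing `codepage` might (illustrative)."""
    return text.encode(codepage, errors=errors)


cell = "<td>α</td>"

# Lossy: Windows-1252 has no Greek letters, so "replace" substitutes "?".
lossy = write_as(cell, "cp1252", "replace")

# Lossless: same code page, but unmappable characters become &#945;.
escaped = write_as(cell, "cp1252", "xmlcharrefreplace")

print(lossy.decode("cp1252"))    # <td>?</td>
print(escaped.decode("cp1252"))  # <td>&#945;</td>
```

In the lossy case the "?" is already in the bytes on disk, which fits the observation that no font setting can bring the character back.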
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
Based on your suggestion I tried the following:
- I copied the complete oXygenXML installation from "C:\Program Files" to a different folder: "C:\My Program Files"
- I then extracted DITA-OT version 1.8.5 and copied it over the contents of the folder "C:\My Program Files\Oxygen XML Author 15\frameworks\dita\DITA-OT"
- I started "oxygenAuthor15.2.exe" in folder "C:\My Program Files\Oxygen XML Author 15"
The results are the same... From a previous problem I understood that, when started, the tool loads everything relative to its "current location"... so I'm afraid DITA-OT version 1.8.5 did not solve it.
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
It just crossed my mind (not sure whether you caught this): previously I tried the following:
- created the WebHelp HTML (with Unicode characters)
- created the CHM (with question marks instead of the Unicode characters)
- copied the three files "AuthoringToolSuite.hhc", "AuthoringToolSuite.hhk" and "AuthoringToolSuite.hhp" from the CHM output to the WebHelp output
- generated the CHM from the files in the WebHelp output with the latest "HTML Help Workshop", version 1.3
The CHM generated this way displayed the Unicode characters correctly, so I'm not sure why the CHM scenario transcodes these entities differently.
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
Last but not least: I downloaded the latest version (<oXygen/> XML Author 16.0, build 2014070913) from your website and installed it in evaluation mode.
With this version I performed the same conversion once more, and now the CHM also shows the correct Unicode characters: the HTML now contains the characters transcoded into "α" and so on...
Which is acceptable...
So I'm not sure where it is actually solved (apparently not in the standard DITA-OT, but???)
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
To conclude my testing I performed yet another test:
Using the copy of my 15.2 installation I removed the folder "Oxygen XML Author 15\frameworks\dita\DITA-OT" and copied the folder "Oxygen XML Author 16\frameworks\dita\DITA-OT" to that installation.
As expected, the CHM is now also generated with the proper characters...
I guess I can do this with my official installation without infringing the license I have for version 15...
-
- Posts: 15
- Joined: Wed Jan 08, 2014 3:23 pm
Re: WebHelp output does not handle encoded characters
Hi Sorin,
I re-read your first post... and I checked the font set for Author (it was the default serif 16pt).
I changed this explicitly to "Arial 12pt" and re-ran the transformation... No help...
So as indicated the new "DITA-OT" embedded in version 16.0 solves the issue!
-
- Posts: 4141
- Joined: Fri Mar 28, 2003 2:12 pm
Re: WebHelp output does not handle encoded characters
Post by sorin_ristache
Hi,
Rob5624 wrote:
Based on your suggestion I tried the following:
- I copied the complete oXygenXML installation from "C:\Program Files" to a different folder: "C:\My Program Files"
- I then extracted DITA-OT version 1.8.5 and copied it over the contents of the folder "C:\My Program Files\Oxygen XML Author 15\frameworks\dita\DITA-OT"
- I started "oxygenAuthor15.2.exe" in folder "C:\My Program Files\Oxygen XML Author 15"
The results are the same... From a previous problem I understood that, when started, the tool loads everything relative to its "current location"... so I'm afraid DITA-OT version 1.8.5 did not solve it.
Rob5624 wrote:
I downloaded the latest version (<oXygen/> XML Author 16.0, build 2014070913) from your website and installed it in evaluation mode.
With this version I performed the same conversion once more, and now the CHM also shows the correct Unicode characters: the HTML now contains the characters transcoded into "α" and so on...
Which is acceptable...
So I'm not sure where it is actually solved (apparently not in the standard DITA-OT, but???)
I confirm that the α HTML entity is not processed correctly in Oxygen 15.2 but is processed correctly in Oxygen 16.0, even though version 16.0 comes with DITA-OT 1.8.4 and you tried DITA-OT 1.8.5 without success. The fix actually comes from one of our DITA-OT patches (which ship in the Oxygen release kit but not yet in the DITA-OT release kit). That patch was not included in the DITA-OT 1.8.x branch releases; apparently it went only into the DITA-OT 2.0.x branch releases.
Rob5624 wrote:
I guess I can do this with my official installation without infringing the license I have for version 15...
Actually, you are infringing the license terms as long as you use your Oxygen 15.2 installation for commercial purposes, since you are using version 15.2 with a commercial license together with part of version 16.0 under a trial license. Please contact us (through the online report form or by email) and we will give you a small patch file that fixes the problem in Oxygen 15.2 as well, so that you won't need to call or use any part of Oxygen 16.0.
Regards,
Sorin
<oXygen/> XML Editor support
-
- Posts: 4141
- Joined: Fri Mar 28, 2003 2:12 pm
Re: WebHelp output does not handle encoded characters
Post by sorin_ristache
Rob5624 wrote:
I re-read your first post... and I checked the font set for Author (it was the default serif 16pt).
I changed this explicitly to "Arial 12pt" and re-ran the transformation... No help...
So as indicated the new "DITA-OT" embedded in version 16.0 solves the issue!
There are two separate issues here. To fix the HTML entity processing for the DITA CHM output, please see above: we can give you a patch file that you drop into the Oxygen 15.2 install directory.
The issue of rendering Unicode code points in Author editing mode is fixed with an appropriate font, which you must set either in the CSS that drives Author document rendering or as a default font in the user preferences (used as a fallback when the CSS stylesheet specifies no font). This font setting takes effect only when editing in Author mode; it has no impact on the HTML file processing for the CHM output.
Regards,
Sorin
<oXygen> XML Editor Support
-
- Posts: 4141
- Joined: Fri Mar 28, 2003 2:12 pm
Re: WebHelp output does not handle encoded characters
Post by sorin_ristache
Hi,
I've just sent you the patch file by email.
Regards,
Sorin
<oXygen> XML Editor Support