WebHelp output does not handle encoded characters

Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

WebHelp output does not handle encoded characters

Post by Rob5624 »

I have English documentation that includes Greek letters and other symbols.
By default, oXygenXML Author cannot display symbolic entity names like &alpha;.
For this reason I entered all symbols as numeric Unicode references like &#945;.
When converting the documentation to standard HTML (WebHelp scenario), these characters are correctly transcoded to the corresponding Unicode characters. But when creating a compiled help file (CHM), a question mark is placed in the generated (intermediate) HTML instead of each of these characters.

I am not sure how to prevent this, because when I compile the "correct" HTML (from the WebHelp output) with the hhp/hhk/hhc files, the generated help file shows the Unicode characters correctly as well...

So for some reason the HTML generated by the WebHelp scenario is different from the HTML generated by the CHM scenario.

Why is this?
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: WebHelp output does not handle encoded characters

Post by sorin_ristache »

Hello,

The difference between the WebHelp and Windows Help (CHM) transformations in this case is how the encoding of the HTML files is processed before they are fed to the Microsoft HTML Help Compiler. Some entity references like &alpha; are not processed correctly by the HTML Help Compiler, so they need to be pre-processed and escaped first. We fixed this problem with a patch that we contributed to the DITA-OT project some time ago. In more recent versions of DITA-OT (and more recent versions of Oxygen) this problem is already fixed.
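To illustrate the kind of pre-processing meant here, below is a minimal Python sketch that rewrites every non-ASCII character as an HTML entity reference, so the file handed to the compiler is plain ASCII. This is not the actual DITA-OT patch; the function name `escape_non_ascii` is hypothetical and only shows the general idea.

```python
from html.entities import codepoint2name


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity reference.

    Illustration only: a named entity (e.g. &alpha;) is used when the
    standard HTML 4 table has one, otherwise a numeric reference (&#NNN;).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)
```

Called on a table cell that contains the Greek letter alpha (code point 945), this yields a cell containing `&alpha;`.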

What Oxygen version do you use? Can you try the CHM transformation also in the latest Oxygen version?
Rob5624 wrote:I have English documentation that includes Greek letters and other symbols.
By default, oXygenXML Author cannot display symbolic entity names like &alpha;.
For this reason I entered all symbols as numeric Unicode references like &#945;.
It depends on the default Author font set in the user preferences. Please set a default Author font (that is, the font applied in Author mode when the CSS of your framework sets no explicit font) that is able to render that range of Unicode code points (which includes 945).


Regards,
Sorin

<oXygen/> XML Editor support
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

Thanks for your reply.
Your first question: I use "<oXygen/> XML Author 15.2, build 2014013017" fully licensed. I did purchase this version earlier this year, so not that old I expect... Included is DITA-OT version 1.7...

As far as the fonts are concerned:
In oXygenXML the code is entered like:
"<entry rowsep="1" colsep="1" align="center">&#945;</entry>"

The WebHelp generates the HTML like:
"<td class="cellrowborder" align="center" valign="top" headers="d1631e97 d1631e100 d1631e103 d1631e106 d1631e109 d1631e113 d1631e116 d1631e119 d1631e122 d1631e125 d1631e128 d1631e132 ">°</td>" (copied using Wordpad, in fact IE and Windows Notepad shows correctly as "α")

The CHM scenario generates the HTML like:
"<td class="cellrowborder" align="center" valign="top" headers="d1549e97 d1549e100 d1549e103 d1549e106 d1549e109 d1549e113 d1549e116 d1549e119 d1549e122 d1549e125 d1549e128 d1549e132 ">?</td>" (again copied using Wordpad)

As you can see, the HTML itself already lacks the character! So this does not look like a font issue but like a conversion issue...
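Since different editors render the same file differently, one way to double-check is to look at the raw bytes of the cell content rather than at what WordPad or Notepad displays. A rough Python sketch follows; the helper name `classify_cell_bytes` is made up for this example, and it assumes the generated HTML is meant to be UTF-8.

```python
def classify_cell_bytes(raw: bytes) -> str:
    """Describe how a table cell's content was actually written to disk."""
    if raw == b"?":
        # The character is already gone in the file itself.
        return "literal question mark (character lost during conversion)"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the output should be UTF-8; anything else is reported as hex.
        return "not valid UTF-8: " + raw.hex(" ")
    if text.isascii():
        return "plain ASCII: " + text
    return "UTF-8 text, code points: " + " ".join(f"U+{ord(c):04X}" for c in text)
```

If the WebHelp cell reports `U+03B1` and the CHM cell reports a literal question mark, the character is lost before the compiler ever runs, which points at the conversion step and not at fonts.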
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

Based on your suggestion I tried the following:

- I copied the complete oXygenXML installation from "C:\Program Files" to a different folder: "C:\My Program Files"
- I then extracted DITA-OT version 1.8.5 and copied it over the content in folder "C:\My Program Files\Oxygen XML Author 15\frameworks\dita\DITA-OT"
- I started "oxygenAuthor15.2.exe" in folder "C:\My Program Files\Oxygen XML Author 15"

The results are the same... From a previous problem I understood that the tool takes the "current location" when started for getting all... so I'm afraid that the DITA-OT version 1.8.5 did not solve it.
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

Just crossed my mind: not sure whether you got it, but previously I tried the following:

- created the WebHelp HTML (with Unicode characters)
- created the CHM (with question marks instead of the Unicode characters)
- copied the three files "AuthoringToolSuite.hhc", "AuthoringToolSuite.hhk" and "AuthoringToolSuite.hhp" from the CHM output to the WebHelp output
- generated the CHM using the files in the WebHelp output using the latest "HTML Help Workshop" version 1.3

The CHM generated in this way displayed the Unicode characters correctly, so I am not sure why the CHM scenario transcodes these entities differently.
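For anyone who wants to repeat the workaround above, the copy step could be scripted roughly as follows. This is only a sketch: the function name `copy_help_project_files` and the directory arguments are hypothetical, and the compile step itself is left as a comment because the location of the HTML Help Workshop compiler depends on the local installation.

```python
import shutil
from pathlib import Path


def copy_help_project_files(chm_out: Path, webhelp_out: Path) -> list:
    """Copy the .hhc, .hhk and .hhp project files from the CHM output
    folder into the WebHelp output folder; return the copied file names."""
    copied = []
    for ext in (".hhc", ".hhk", ".hhp"):
        for src in sorted(chm_out.glob(f"*{ext}")):
            shutil.copy2(src, webhelp_out / src.name)
            copied.append(src.name)
    return copied


# Afterwards the project can be compiled from the WebHelp folder with the
# HTML Help Workshop command-line compiler, for example:
#   hhc.exe AuthoringToolSuite.hhp
# (the install path of hhc.exe is an assumption and differs per machine)
```

Only the three project files are copied; the HTML files already present in the WebHelp output are left untouched.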
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

Last but not least: I downloaded the latest version (<oXygen/> XML Author 16.0, build 2014070913) from your website and installed in evaluation mode.

With this version I performed the same conversion once more, and now the CHM also shows the correct Unicode characters. The HTML now contains the characters transcoded into "&alpha;" and so on...
Which is acceptable...

So not sure where in fact it is solved (apparently not in the standard DITA-OT but???)
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

To conclude my testing I performed yet another test:

Using the copy of my 15.2 installation I removed the folder "Oxygen XML Author 15\frameworks\dita\DITA-OT" and copied the folder "Oxygen XML Author 16\frameworks\dita\DITA-OT" to that installation.

As expected: in this way the CHM also is generated with proper characters...

I guess I can do this with my official installation without infringing the license I have for version 15...
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

I re-read your first post... and I checked the font set for Author (it was the default serif 16pt).
I changed this explicitly to "Arial 12pt" and re-ran the transformation... No help...

So as indicated the new "DITA-OT" embedded in version 16.0 solves the issue!
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: WebHelp output does not handle encoded characters

Post by sorin_ristache »

Hi,
Rob5624 wrote:Based on your suggestion I tried the following:

- I copied the complete oXygenXML installation from "C:\Program Files" to a different folder: "C:\My Program Files"
- I then extracted DITA-OT version 1.8.5 and copied it over the content in folder "C:\My Program Files\Oxygen XML Author 15\frameworks\dita\DITA-OT"
- I started "oxygenAuthor15.2.exe" in folder "C:\My Program Files\Oxygen XML Author 15"

The results are the same... From a previous problem I understood that the tool takes the "current location" when started for getting all... so I'm afraid that the DITA-OT version 1.8.5 did not solve it.
Rob5624 wrote:I downloaded the latest version (<oXygen/> XML Author 16.0, build 2014070913) from your website and installed in evaluation mode.

With this version I performed the same conversion once more, and now the CHM also shows the correct Unicode characters. The HTML now contains the characters transcoded into "&alpha;" and so on...
Which is acceptable...

So not sure where in fact it is solved (apparently not in the standard DITA-OT but???)
I confirm that the &alpha; HTML entity is not processed correctly in Oxygen 15.2 but is processed correctly in Oxygen 16.0, even though version 16.0 comes with DITA-OT 1.8.4 and you tried (without success) DITA-OT 1.8.5. The fix actually comes from one of our DITA-OT patches (which ship in the Oxygen release kit but not yet in the DITA-OT release kit). That patch was not included in the DITA-OT 1.8.x branch releases, but apparently only in the DITA-OT 2.0.x branch releases.
Rob5624 wrote:I guess I can do this with my official installation without infringing the license I have for version 15...
Actually you are infringing the license terms as long as you use your Oxygen 15.2 installation for commercial purposes, since you are using version 15.2 with a commercial license and part of version 16.0 with a trial license. Please contact us (through the online report form or by email) so that we can give you a small patch file that fixes the problem in Oxygen 15.2 too. Then you won't need to call or use any part of Oxygen 16.0.


Regards,
Sorin

<oXygen/> XML Editor support
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: WebHelp output does not handle encoded characters

Post by sorin_ristache »

Rob5624 wrote:I re-read your first post... and I checked the font set for Author (it was the default serif 16pt).
I changed this explicitly to "Arial 12pt" and re-ran the transformation... No help...

So as indicated the new "DITA-OT" embedded in version 16.0 solves the issue!
There are two separate issues here. For the issue with HTML entity processing in the DITA CHM output, please see above: we can give you a patch file that you drop into the Oxygen 15.2 install directory.

The issue of rendering Unicode code points in Author editing mode is fixed with an appropriate font, which you must set either in the CSS that drives Author document rendering or as a default font in the user preferences (to fall back on when no font is specified in the CSS stylesheet). This font setting takes effect only in Author mode editing and has no impact on the HTML file processing for the CHM output.


Regards,
Sorin

<oXygen> XML Editor Support
Rob5624
Posts: 15
Joined: Wed Jan 08, 2014 3:23 pm

Re: WebHelp output does not handle encoded characters

Post by Rob5624 »

Hi Sorin,

Thanks for your reply...
As suggested, I have requested the patch, referring to this post...

With best regards!
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: WebHelp output does not handle encoded characters

Post by sorin_ristache »

Hi,

I've just sent you the patch file by email.


Regards,
Sorin

<oXygen> XML Editor Support