
How to enforce the encoding in the output html

Posted: Fri Sep 27, 2024 3:31 pm
by Tarja Koski
Hello,
I'm using Oxygen XML Editor to create HTML Help (CHM) output from DITA source files. I use a customized htmlhelp plugin based on DITA-OT 2.5.4, which worked fine with Oxygen XML Editor 23.1. I have now installed Oxygen XML Editor 26.1 and modified my htmlhelp plugin so that it works with the DITA-OT version bundled with Oxygen XML Editor 26.1. I can successfully build the CHM file and everything looks fine.

There is only one problem. My source DITA files are in Finnish, so they contain a lot of Scandinavian characters. These are displayed correctly in the HTML Help viewer, in the table of contents, and on the Index tab. But when I type a word containing Scandinavian characters in the text field on the Search tab and press Enter, the result is "No topics found" even though such topics exist.

The generated HTML files now seem to have charset=utf-8, whereas with the older DITA-OT they had charset=iso-8859-1. Could this be the reason why the full-text search cannot find the Scandinavian characters? Is there a way I could force my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the HTML files accordingly?

Any help would be much appreciated, thank you.

Re: How to enforce the encoding in the output html

Posted: Mon Sep 30, 2024 7:47 am
by Radu
Hello Tarja,
Tarja Koski wrote:
"The generated HTML files now seem to have charset=utf-8, whereas with the older DITA-OT they had charset=iso-8859-1. Could this be the reason why the full-text search cannot find the Scandinavian characters? Is there a way I could force my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the HTML files accordingly?"
I think you are right about this. Oxygen applies some patches to the DITA Open Toolkit engine. One of them, created a long time ago, attempts to use UTF-8 for the generated HTML files; if I recall correctly, it was meant to fix a problem with generating CHM files containing Greek letters. I added an internal issue to remove this patch, as it also seems to have caused problems for some of our Chinese users.
A possible hackish workaround:
- Close Oxygen.
- If you install a tool like 7-Zip, you can use it to open the JAR library "OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/com.oxygenxml.dost.patches/lib/oxygen-dost-patches.jar". Inside the JAR there is a file at the path "org/dita/dost/util/codepages.xml". Remove the "codepages.xml" file and then save the JAR archive.
- Then start Oxygen and try to publish again.
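
For anyone who prefers to script the JAR edit instead of using 7-Zip, here is a minimal Python sketch of the same idea. The function name remove_jar_entry and the ".bak"/".tmp" suffixes are my own choices for illustration, not part of Oxygen or DITA-OT. It assumes Oxygen is closed and that you have write access to the installation folder.

```python
import shutil
import zipfile
from pathlib import Path


def remove_jar_entry(jar_path, entry_name):
    """Rewrite jar_path without entry_name; return True if the entry was found.

    Hypothetical helper: a JAR is a ZIP archive, so every entry except the
    unwanted one is copied into a new archive that then replaces the original.
    """
    jar_path = Path(jar_path)
    backup_path = jar_path.with_name(jar_path.name + ".bak")
    tmp_path = jar_path.with_name(jar_path.name + ".tmp")

    # Keep an untouched copy so the original JAR can be restored later.
    shutil.copy2(jar_path, backup_path)

    removed = False
    with zipfile.ZipFile(jar_path, "r") as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == entry_name:
                removed = True  # skip this entry, which drops it from the copy
                continue
            dst.writestr(info, src.read(info.filename))

    # Swap the rewritten archive into place.
    tmp_path.replace(jar_path)
    return removed


# Example call (replace OXYGEN_INSTALL_DIR with your actual install folder):
# remove_jar_entry(
#     "OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/"
#     "com.oxygenxml.dost.patches/lib/oxygen-dost-patches.jar",
#     "org/dita/dost/util/codepages.xml",
# )
```

The sketch leaves a ".bak" copy next to the JAR, so the change can be undone by copying that file back over the modified one.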

Regards,
Radu

Re: How to enforce the encoding in the output html

Posted: Mon Sep 30, 2024 9:43 am
by Tarja Koski
Hello Radu,
This worked! The html files now have charset=iso-8859-1, the Scandinavian characters are displayed correctly and the full-text search finds them.
Thank you very much!
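
In case it helps others check their own output, below is a rough Python sketch that counts which charset each generated HTML file declares. The function names and the 4096-byte read limit are my own assumptions; it only reads the declared charset near the top of each file and does not verify how the bytes are actually encoded.

```python
import re
from collections import Counter
from pathlib import Path

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def declared_charset(html_path):
    """Return the charset declared near the top of the file, or None."""
    head = Path(html_path).read_bytes()[:4096]
    match = CHARSET_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def summarize_charsets(output_dir):
    """Count the declared charsets of all .htm/.html files under output_dir."""
    counts = Counter()
    for path in Path(output_dir).rglob("*.htm*"):
        if path.is_file():
            counts[declared_charset(path)] += 1
    return counts


# Example call (the folder name is a placeholder for your output folder):
# print(summarize_charsets("out/htmlhelp"))
```

If every file reports iso-8859-1 after the workaround, the output matches what I described above.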

Re: How to enforce the encoding in the output html

Posted: Mon Sep 30, 2024 9:48 am
by Radu
Hi Tarja,
Great, thanks for the feedback. The official fix will be included in the DITA-OT bundled with Oxygen 27 (November this year).
Regards,
Radu

Re: How to enforce the encoding in the output html

Posted: Thu Nov 28, 2024 12:23 pm
by julien_lacour
Hello,

Oxygen 27.0 is now available. In this version, the encoding for Scandinavian characters has been fixed.

Regards,
Julien

Re: How to enforce the encoding in the output html

Posted: Thu Nov 28, 2024 12:31 pm
by Tarja Koski
Good to hear, thank you!