Oxygen XML Forum

Posted: **Fri Sep 27, 2024 3:31 pm**

Hello,
I'm using Oxygen XML Editor to create htmlhelp (CHM) output from DITA source files. I am using a customized htmlhelp plugin based on DITA-OT 2.5.4. It worked fine with Oxygen XML Editor 23.1. I have now installed Oxygen XML Editor 26.1 and modified my htmlhelp plugin so that it works with the latest DITA-OT included in Oxygen XML Editor 26.1. I can successfully build the CHM file and everything looks fine.

There is only one problem. My DITA source files are in Finnish, so there are a lot of Scandinavian characters. They are displayed correctly in the htmlhelp viewer, in the table of contents, and on the Index tab. But when I type a word containing Scandinavian characters in the text field on the Search tab and press Enter, the result is "No topics found" even though there are such topics.

The generated html files now seem to have charset=utf-8, whereas previously, with the older DITA-OT, they had charset=iso-8859-1. Could this be the reason why the full-text search cannot find the Scandinavian characters? Is there a way I could force my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the html files accordingly?

Any help would be much appreciated, thank you.

Posted: **Mon Sep 30, 2024 7:47 am**

Hello Tarja,

> The generated html files now seem to have charset=utf-8, when previously with the older DITA-OT it was charset=iso-8859-1. Could this be the reason why the full-text search cannot find the Scandinavian characters? Is there a way I could enforce my htmlhelp plugin to set the character encoding to iso-8859-1 and generate the html files according to that?

I think you are right about this. Oxygen applies some patches to the DITA Open Toolkit engine. One of them, created a long time ago, makes the generated HTML files use UTF-8; if I recall correctly, it was meant to fix a problem with generating CHM files containing Greek letters. I added an internal issue to remove this patch, as it also seems to cause problems for some of our Chinese users.
A possible hackish workaround:
- Close Oxygen.
- Using a tool like 7-Zip, open the JAR library "OXYGEN_INSTALL_DIR/frameworks/dita/DITA-OT/plugins/com.oxygenxml.dost.patches/lib/oxygen-dost-patches.jar". Inside the JAR there is a file at the path "org/dita/dost/util/codepages.xml". Remove the "codepages.xml" file and then save the JAR archive.
- Then start Oxygen and try to publish again.
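
If you prefer a script to 7-Zip, the steps above could be sketched roughly like this in Python. This is only an illustration, not something shipped with Oxygen: the function name `remove_jar_entry` is made up for this example, and it assumes the JAR is a plain ZIP archive that you have write access to. It keeps a `.bak` copy of the original JAR next to it so the change can be undone.

```python
import shutil
import zipfile
from pathlib import Path


def remove_jar_entry(jar_path, entry_name):
    """Rewrite a JAR (ZIP) archive without one entry.

    Hypothetical helper for illustration. Returns True if the entry
    was found and dropped, False if it was not present.
    """
    jar = Path(jar_path)

    # Keep a one-time backup of the untouched archive.
    backup = jar.with_name(jar.name + ".bak")
    if not backup.exists():
        shutil.copy2(jar, backup)

    removed = False
    tmp = jar.with_name(jar.name + ".tmp")
    with zipfile.ZipFile(jar) as src, \
            zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == entry_name:
                removed = True  # skip the entry we want to drop
                continue
            dst.writestr(item, src.read(item.filename))

    # Swap the rewritten archive into place once both files are closed.
    tmp.replace(jar)
    return removed
```

With Oxygen closed, you would call it as `remove_jar_entry(path_to_jar, "org/dita/dost/util/codepages.xml")`, where `path_to_jar` is the full path to "oxygen-dost-patches.jar" in your own installation.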

Regards,
Radu

Posted: **Mon Sep 30, 2024 9:43 am**

Hello Radu,
This worked! The html files now have charset=iso-8859-1, the Scandinavian characters are displayed correctly and the full-text search finds them.
Thank you very much!
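
For anyone who wants to check which charset the generated files declare before and after the workaround, a quick scan could look like the sketch below. The function name `declared_charsets` and the idea of pointing it at the transformation's output folder are assumptions made for this example; it only reads the first few kilobytes of each file and looks for a `charset=` declaration.

```python
import re
from pathlib import Path

# Matches charset=utf-8, charset="UTF-8", charset='iso-8859-1', etc.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charsets(output_dir):
    """Map each .htm/.html file under output_dir to its declared charset.

    Hypothetical helper for illustration. Files without a charset
    declaration in their first 4096 bytes map to None.
    """
    root = Path(output_dir)
    result = {}
    for path in sorted(root.rglob("*.htm*")):
        if not path.is_file():
            continue
        head = path.read_bytes()[:4096]
        match = CHARSET_RE.search(head)
        key = path.relative_to(root).as_posix()
        result[key] = match.group(1).decode("ascii").lower() if match else None
    return result
```

Calling `declared_charsets(...)` on the folder that holds the generated HTML topics returns a dictionary, so mixed or unexpected encodings are easy to spot.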

Posted: **Mon Sep 30, 2024 9:48 am**

Hi Tarja,
Great, thanks for the feedback. The official fix will be included in the DITA-OT bundled with Oxygen 27 (November this year).
Regards,
Radu

Posted: **Thu Nov 28, 2024 12:23 pm**

Hello,

Oxygen 27.0 is now available. In this version, the encoding for Scandinavian characters has been fixed.

Regards,
Julien

Posted: **Thu Nov 28, 2024 12:31 pm**

Good to hear, thank you!

How to enforce the encoding in the output html
