Spell check - Romanian diacritics (ș and ț)

llpick
Posts: 13
Joined: Sat Jun 23, 2018 12:52 am


Post by llpick »

I've imported dictionaries containing Romanian words, but when a word contains the Romanian-specific characters with a comma below (ș and ț), the spelling checker incorrectly suggests replacing them with the look-alike characters with cedillas (ş and ţ).
Here's an example of each case:

The word 'Afișarea' (meaning "display" in English)
[Screenshot: 2023-04-12_10-43-24.png]
The word 'Apăsați' (meaning "press" in English)
[Screenshot: 2023-04-12_10-43-24-2.png]
From Wikipedia I gleaned that it might be an issue with ISO/IEC 8859-2, which substituted the cedilla characters because it didn't contain the comma-below ones. But now that Unicode is widely supported, my understanding is that this shouldn't be an issue.
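
To illustrate the encoding point above, here is a minimal Python sketch. The helper name `can_encode` is made up for this example; the codec names are the ones Python's standard library uses. It only checks whether each pair of characters can be represented in a given encoding and says nothing about how Oxygen or Hunspell handle them.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


CEDILLA_FORMS = "\u015F\u0163"      # ş and ţ (cedilla)
COMMA_BELOW_FORMS = "\u0219\u021B"  # ș and ț (comma below)

# ISO 8859-2 has code points for the cedilla forms only.
print(can_encode(CEDILLA_FORMS, "iso8859_2"))      # True
print(can_encode(COMMA_BELOW_FORMS, "iso8859_2"))  # False
# UTF-8 can represent both pairs.
print(can_encode(COMMA_BELOW_FORMS, "utf-8"))      # True
```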

Thanks for your time and help.
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: Spell check - Romanian diacritics (ș and ț)

Post by Radu »

Hi,

We (the developers of Oxygen) are Romanian:
https://www.oxygenxml.com/about_us.html
But we have never tested or used the Romanian spell check dictionaries with Oxygen.
Did you follow these steps to install the new Hunspell dictionaries in Oxygen?
https://www.oxygenxml.com/doc/versions/ ... onary.html
Are the dictionaries you added freely available online so that we can try them on our side? If not, could you consider sending them to us (support@oxygenxml.com)?
If we can reproduce the problem on our side, we can look at how those suggestions are read from the dictionary files and interpreted.
There are indeed these two distinct characters which look similar when rendered:
U+0219: Latin Small Letter S With Comma Below
U+015F: Latin Small Letter S With Cedilla
The person who created the dictionary might have used one instead of the other in the dictionary file.
The dictionary is a plain text file, so you can open it in Oxygen, for example. When you place the caret in front of a character, the Oxygen status bar shows its Unicode code point, in our case U+0219 or U+015F.
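
For a whole-file check rather than a character-by-character one, a small Python sketch along these lines could count how often each variant occurs. The function names and the sample string are assumptions made for this example; the t-with-comma (U+021B) and t-with-cedilla (U+0163) code points and the uppercase forms are added alongside the two s code points listed above.

```python
import unicodedata
from collections import Counter

# Cedilla forms mapped to their comma-below counterparts.
CEDILLA_TO_COMMA = {
    "\u015F": "\u0219",  # ş -> ș
    "\u015E": "\u0218",  # Ş -> Ș
    "\u0163": "\u021B",  # ţ -> ț
    "\u0162": "\u021A",  # Ţ -> Ț
}
COMMA_FORMS = set(CEDILLA_TO_COMMA.values())


def describe(ch: str) -> str:
    """Format a character the way a status bar might: code point plus name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def count_variants(text: str) -> Counter:
    """Count cedilla and comma-below s/t characters in text."""
    counts = Counter()
    for ch in text:
        if ch in CEDILLA_TO_COMMA:
            counts["cedilla"] += 1
        elif ch in COMMA_FORMS:
            counts["comma_below"] += 1
    return counts


# Hypothetical sample: one word with a cedilla s, one with a comma-below t.
sample = "Afi\u015Farea\nAp\u0103sa\u021Bi\n"
print(count_variants(sample))
print(describe("\u0219"))
print(describe("\u015F"))
```

To scan a real dictionary, read the file with the encoding you expect (for example `open(path, encoding="utf-8").read()`, where `path` is your own file) and pass the text to `count_variants`. A non-zero cedilla count in a Romanian word list would be worth a closer look.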

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
llpick
Posts: 13
Joined: Sat Jun 23, 2018 12:52 am

Re: Spell check - Romanian diacritics (ș and ț)

Post by llpick »

Dear Radu, thank you for your swift and helpful answer. The issue is resolved.
The encoding of the dictionary file was at fault (Windows-1252 instead of UTF-8), which became apparent when I opened it in oXygen.
I could have figured it out on my own and saved us all some time, much like I could have remembered where you hail from.
Have a great day.
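
For anyone who runs into the same thing, here is a rough Python sketch of the kind of encoding check that would have caught it. The function names are invented for this example, and the source encoding is something you have to know or guess; the thread does not say how the file ended up in the wrong encoding, so treat this as an illustration only.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from an assumed source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical bytes standing in for a dictionary file's contents.
good = "Afi\u0219area".encode("utf-8")
legacy = "caf\u00e9".encode("cp1252")  # single-byte text, not valid UTF-8

print(is_valid_utf8(good))
print(is_valid_utf8(legacy))
print(is_valid_utf8(to_utf8(legacy, "cp1252")))
```

Pure ASCII bytes pass the UTF-8 check whatever the original encoding was, so a positive result only means the file is consistent with UTF-8.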