
How to find special characters?

Posted: Mon Jul 22, 2019 8:12 pm
by fschmitt
One of the documents I'm editing in Oxygen 21.1 triggers the "Special characters detected" warning message. I would like to search for those characters to check whether they are "legitimate" document content or the result of a conversion problem (it's a converted DOCX, so it can contain a variety of ugly content... :roll: ).

So, is there a way (for example, a regex search) to detect all characters or Unicode control codes that may trigger the "Special characters detected" message?

The only "foreign" characters I was able to identify in the document were some Greek letters, but since Greek doesn't require bidirectional text layout, I doubt they are responsible for triggering the message.

Re: How to find special characters?

Posted: Tue Jul 23, 2019 10:02 am
by Radu
Hi,

I'm afraid the application does not yet have a way to show which complex characters it detected. Usually the warning is triggered by characters that combine, meaning the font may render a single symbol for several characters. In that case, for example, moving the cursor with the arrow keys needs special handling so that it jumps over the combining characters as if they were one symbol.
Enabling support for complex characters is usually associated with a slowdown when opening and editing the document.
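
To make the "several characters, one symbol" point concrete, here is a minimal Java sketch (the class and method names are hypothetical, not part of Oxygen). It compares the number of code points in a string with the number of user-perceived characters reported by the standard java.text.BreakIterator character instance, which is the kind of boundary the cursor should jump between.

```java
import java.text.BreakIterator;

public class CombiningDemo {

    // Counts user-perceived characters (grapheme clusters) using the JDK's
    // character BreakIterator, which keeps a base letter and its combining marks together.
    static int countGraphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT is displayed as a single symbol.
        String decomposed = "e\u0301";
        System.out.println("code points: " + decomposed.codePointCount(0, decomposed.length()));
        System.out.println("graphemes:   " + countGraphemes(decomposed));
    }
}
```

For the decomposed accented "e", this should report two code points but only one grapheme, which is why an editor has to treat such sequences specially when moving the caret.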

There is an Oxygen GitHub project containing lots of sample plugins which you can download as a zip:

https://github.com/oxygenxml/wsaccess-j ... le-plugins

I just uploaded a plugin folder called "determineComplexLayoutChars" to that project; copy it to the "OXYGEN_INSTALL_DIR\plugins" folder. After you restart Oxygen, the plugin adds a new contextual menu action when an XML document is opened in the Text editing mode. This new "Determine Complex Layout Chars" action should run a detection and then report all detected characters in the results view.
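
For readers who cannot run the plugin, a rough idea of what such a detection can look like is sketched below in Java. This is only an approximation under stated assumptions: the class name, the "suspicious" criteria (combining marks, right-to-left directionality, and a hand-picked set of complex scripts) and the script list are hypothetical and are not the criteria Oxygen or the determineComplexLayoutChars plugin actually use. It relies only on public java.lang.Character APIs.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ComplexCharScanner {

    // ASSUMPTION: an illustrative set of scripts that usually need complex text layout.
    // This is NOT the list used by Oxygen or by the determineComplexLayoutChars plugin.
    static final Set<Character.UnicodeScript> COMPLEX_SCRIPTS = EnumSet.of(
            Character.UnicodeScript.ARABIC, Character.UnicodeScript.HEBREW,
            Character.UnicodeScript.DEVANAGARI, Character.UnicodeScript.BENGALI,
            Character.UnicodeScript.TAMIL, Character.UnicodeScript.THAI,
            Character.UnicodeScript.LAO, Character.UnicodeScript.KHMER,
            Character.UnicodeScript.MYANMAR, Character.UnicodeScript.TIBETAN);

    // Flags combining marks, right-to-left characters and characters from the scripts above.
    static boolean isSuspicious(int cp) {
        int type = Character.getType(cp);
        boolean combining = type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
        byte dir = Character.getDirectionality(cp);
        boolean rtl = dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
        return combining || rtl || COMPLEX_SCRIPTS.contains(Character.UnicodeScript.of(cp));
    }

    // Returns one report line per flagged code point: UTF-16 offset, U+XXXX and Unicode name.
    static List<String> scan(String text) {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isSuspicious(cp)) {
                report.add(String.format("offset %d: U+%04X %s", i, cp, Character.getName(cp)));
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        return report;
    }

    public static void main(String[] args) {
        String sample = "Greek \u03B1\u03B2\u03B3, combining e\u0301, Hebrew \u05D0";
        scan(sample).forEach(System.out::println);
    }
}
```

Note that Greek letters are left-to-right letters rather than combining marks, so this sketch does not flag them, which is consistent with the suspicion in the original question.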

Regards,
Radu

Re: How to find special characters?

Posted: Tue Jul 23, 2019 6:31 pm
by fschmitt
Thanks a lot @Radu, the plugin works great and I was able to solve the issue with your help :D

Re: How to find special characters?

Posted: Sun Aug 01, 2021 10:37 pm
by patjporter
Hello, can you please provide specific instructions on how to download these files and install them on a Mac?
Thank you,
Patrick

Re: How to find special characters?

Posted: Mon Aug 02, 2021 6:37 am
by Radu
Hi Patrick,

Download a zip containing the entire project contents:
https://github.com/oxygenxml/wsaccess-j ... master.zip

Inside the zip there are several folders; each folder is an Oxygen plugin.
Copy, for example, the "determineComplexLayoutChars" folder to the "OXYGEN_INSTALL_DIR/plugins" folder and then restart Oxygen.

Open an XML document in the Text editing mode and right-click inside it; the contextual menu now contains a new "Determine Complex Layout Chars" item.

Regards,
Radu

Re: How to find special characters?

Posted: Fri Oct 04, 2024 9:07 pm
by dcramer
The menu item appears, but selecting it has no effect for me in XML Editor 26.0, build 2023100905 on macOS Sequoia 15.0.
After installing the plugin and restarting Oxygen, I open a file that gives the warning about "The document ... contains bidirectional text (such as Arabic or Hebrew), South/South-Eastern Asian text, or special characters (such as combining characters) that require special handling to ensure proper editing."
When I right-click in Text mode and select "Determine Complex Layout Chars", nothing happens. The same is true if I select all the text or manually insert some bidi text.
Regards,
David

Re: How to find special characters?

Posted: Mon Oct 07, 2024 7:47 am
by Radu
Hi David,

This sample plugin, which I created about 5 years ago, uses non-API Java code to test for such bidi characters. After we upgraded from Java 8 to newer Java versions, that code stopped working, because newer Java versions block access to non-API (internal) Java code by default.
I just updated the JavaScript code of the plugin to avoid using the non-API Java code, so you can update your plugin with the new code:
https://github.com/oxygenxml/wsaccess-j ... sAccess.js
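
As an illustration of doing such a check with public API only, the JDK's java.text.Bidi class has a static requiresBidi(char[], int, int) method. The sketch below (hypothetical class and method names) is not necessarily what the updated plugin does, and it only covers bidirectional text, not combining characters or South/South-Eastern Asian scripts.

```java
import java.text.Bidi;

public class BidiCheck {

    // Uses only the public java.text.Bidi API, so no internal JDK classes are accessed.
    // Returns true if the text contains right-to-left characters or explicit
    // bidi embedding/override characters.
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        System.out.println(needsBidi("Greek \u03B1\u03B2\u03B3"));           // left-to-right only
        System.out.println(needsBidi("Hebrew \u05E9\u05DC\u05D5\u05DD"));    // contains Hebrew
    }
}
```

The first call should print false and the second true, since Greek is written left to right while Hebrew is right to left.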

Regards,
Radu

Re: How to find special characters?

Posted: Mon Oct 07, 2024 9:04 am
by dcramer
OK, thanks! I also found them using Find in Files and the following regular expression:


[\p{M}\p{IsArabic}\p{IsHebrew}\p{IsHan}\p{IsHiragana}\p{IsKatakana}\p{IsHangul}\p{IsThai}\p{IsLao}\p{IsKhmer}]
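
The same character class also works with java.util.regex, which supports \p{M} (the Unicode "Mark" category) and the \p{IsScript} form for scripts such as Arabic, Han or Thai. The minimal sketch below (hypothetical class and method names) applies it to a string and reports the offset and code point of each match, which can help when checking files outside the editor. Like the regex itself, it is a heuristic and does not list every script that may need complex layout (Devanagari, for example, is not included).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialCharFinder {

    // The character class from the post above, written as a Java string literal.
    static final Pattern SPECIAL = Pattern.compile(
            "[\\p{M}\\p{IsArabic}\\p{IsHebrew}\\p{IsHan}\\p{IsHiragana}"
            + "\\p{IsKatakana}\\p{IsHangul}\\p{IsThai}\\p{IsLao}\\p{IsKhmer}]");

    // Returns "offset: U+XXXX" for every match; offsets are UTF-16 indices into the string.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = SPECIAL.matcher(text);
        while (m.find()) {
            int cp = text.codePointAt(m.start());
            hits.add(String.format("%d: U+%04X", m.start(), cp));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Combining acute accent, a Greek alpha and a Hebrew alef.
        find("cafe\u0301 \u03B1 \u05D0").forEach(System.out::println);
    }
}
```

For that sample the combining accent (offset 4) and the Hebrew letter (offset 8) are reported, while the Greek alpha is not matched.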