
Can Special Characters Support decomposing or compose Unicode letters

Posted: Wed Dec 14, 2016 12:44 am
by michaelmh
Hi,

I would like to learn more about the Special Characters Support; the help says very little about it.

From another post I learned that COMBINING characters (U+0300 …) trigger this mode. Is that correct, or does it work differently? In my current situation I see some awkward behavior with Vietnamese characters. As an example I use the translation of CAUTION: THẬN TRỌNG

The 3rd letter of the 1st word is U+1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
The 3rd letter of the 2nd word is U+1ECC LATIN CAPITAL LETTER O WITH DOT BELOW

If an XML file containing this text is opened WITH special characters support, those characters appear correctly and can be selected as a single character, but the Unicode values shown in the footer are:
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+004F LATIN CAPITAL LETTER O.

If the same file is opened WITHOUT special characters support, those characters are separated into
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+004F LATIN CAPITAL LETTER O
both followed by
U+0323 COMBINING DOT BELOW.
With some other combining accents one can see collisions of diacritical marks in this mode.

I guess the source file contains the "decomposed" characters with the COMBINING diacritics.

I would like to better understand what happens to the original data in files when using Special Characters support, regarding the source file, the clipboard and transformations applied to such characters. It would also be useful to know the precise character ranges.

BTW, my current problem is connected with a font which features the combined characters but not the COMBINING characters, so I would need the "composed" version.

Thanks,

- Michael

Re: Can Special Characters Support decomposing or compose Unicode letters

Posted: Wed Dec 14, 2016 3:13 pm
by Radu
Hi Michael,

So, a glyph is a symbol perceived by the end user:

https://en.wikipedia.org/wiki/Glyph

The person who edits the XML can insert the glyph in the XML content in two ways:

1) Using a single character, U+1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW. This is a single Unicode character which is rendered as the glyph Ậ.
2) Using a sequence of two consecutive characters: U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX followed immediately by the combining character U+0323 COMBINING DOT BELOW.
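
To make the difference between (1) and (2) visible outside the editor, here is a minimal sketch using Python's standard `unicodedata` module. It has nothing to do with Oxygen itself; the variable names and the `describe` helper are only for illustration.

```python
import unicodedata

# Way (1): a single precomposed code point.
precomposed = "\u1EAC"
# Way (2): A WITH CIRCUMFLEX followed by COMBINING DOT BELOW.
combined = "\u00C2\u0323"


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(precomposed))
print(describe(combined))

# The two strings differ code point by code point ...
print(precomposed == combined)
# ... but they are canonically equivalent: NFC composes (2) into (1).
print(unicodedata.normalize("NFC", combined) == precomposed)
```

The two forms look the same on screen when the font handles both, but a plain string comparison treats them as different unless both sides are normalized first.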

If the character was inserted the first way (1), for example using Oxygen's character map, Oxygen treats it as a regular character and renders it with the specified font; if the font supports the character, it is rendered as the glyph.
If the character was inserted the second way (2), the Author visual editing mode should show it as a single glyph, but navigating over it with the LEFT or RIGHT arrow keys takes an extra step (because it is actually two characters which look like one glyph). When you open the file in the Text editing mode, or switch to it, Oxygen asks you to enable a special bidirectional mode; once enabled, it should show the character properly and allow left/right navigation across it in one step.

About your questions:

> I guess the source file contains the "decomposed" characters with the COMBINING diacritics.
It depends on how the glyph was inserted in the first place, either as (1) or as (2). For the end user looking at the published output this might not matter: they will see the same glyph if the font can render both cases.

> I would like to better understand what happens to the original data in files when using Special Characters support, regarding the source file, the clipboard and transformations applied to such characters. It would also be useful to know the precise character ranges.
In case (2), the font is responsible for rendering these two consecutive characters as one glyph. It might not be able to...

> BTW, my current problem is connected with a font which features the combined characters but not the COMBINING characters, so I would need the "composed" version.
You could test this by inserting both forms of the glyph in an XML document and checking whether the font can render both of them.
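
If the font turns out to support only the precomposed characters, one option outside Oxygen is to convert the file to Unicode Normalization Form C (NFC) before publishing. The following is a rough sketch with Python's standard library, not something Oxygen does for you; `compose`, `leftover_combining` and `compose_file` are hypothetical helper names, and the file paths are whatever you pass in.

```python
import unicodedata


def compose(text: str) -> str:
    """Return text in NFC, with combining sequences composed where possible."""
    return unicodedata.normalize("NFC", text)


def leftover_combining(text: str) -> list[str]:
    """List combining marks still present in text, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.combining(ch)]


def compose_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Read src, compose its content to NFC and write the result to dst."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=encoding, newline="") as f:
        content = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(compose(content))


# The example word pair, stored the way it was reported without
# Special Characters support: base letter followed by U+0323.
decomposed = "TH\u00C2\u0323N TR\u004F\u0323NG"
composed = compose(decomposed)

print(leftover_combining(decomposed))
print(leftover_combining(composed))
```

Not every base letter and mark pair has a precomposed code point, so `leftover_combining` may still report marks after composing; for those the font would need real support for combining characters.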

Regards,
Radu

Re: Can Special Characters Support decomposing or compose Unicode letters

Posted: Wed Dec 14, 2016 4:27 pm
by michaelmh
Hi Radu,

Thanks for the quick answer. I still need to do some investigation regarding the source data at hand.

But could you please provide more details (in the Online Help) explaining what OxygenXML does and does not do with and without Special Characters support? The current description is quite short and vague (it does not even say what "special characters" are), and in times of I18N we have to deal with all sorts of scripts in our favorite editor.

Thanks,

- Michael

Re: Can Special Characters Support decomposing or compose Unicode letters

Posted: Thu Dec 15, 2016 11:17 am
by Radu
Hi Michael,

We have a small topic here which may help:

https://www.oxygenxml.com/doc/versions/ ... pport.html

Basically I think there are two types of glyphs which need special support in Oxygen: glyphs which are formed using combining characters, and glyphs which have another orientation (right to left). For the second type, although the characters are saved in a certain sequence, the rendering part needs to show them in reversed order.
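
As a rough way to check whether a piece of text falls into either of these two categories, the sketch below uses the character properties exposed by Python's `unicodedata` module. It is only an approximation built on standard Unicode properties, not Oxygen's actual detection logic, and `needs_special_support` is a hypothetical name.

```python
import unicodedata

# Bidirectional classes of strong right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def needs_special_support(text: str) -> dict[str, bool]:
    """Report whether text contains combining marks or right-to-left characters."""
    return {
        # combining() returns a non-zero class for marks such as U+0323.
        "combining": any(unicodedata.combining(ch) != 0 for ch in text),
        "right_to_left": any(
            unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text
        ),
    }


print(needs_special_support("TRO\u0323NG"))   # O followed by COMBINING DOT BELOW
print(needs_special_support("TR\u1ECCNG"))    # precomposed O WITH DOT BELOW
print(needs_special_support("\u05E9\u05DC"))  # two Hebrew letters
```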

Regards,
Radu