Can Special Characters Support decompose or compose Unicode letters?
Posted: Wed Dec 14, 2016 12:44 am
Hi,
I would like to learn more about the Special Characters Support; the help says very little about it.
From another post I learned that COMBINING characters (U+0300 …) trigger this mode. Is that correct, or does it work differently? In my current situation I see some awkward behavior with Vietnamese characters. As an example I use the translation of CAUTION: THẬN TRỌNG
The 3rd letter of the 1st word is U+1EAC LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
The 3rd letter of the 2nd word is U+1ECC LATIN CAPITAL LETTER O WITH DOT BELOW
If an XML file containing this text is opened WITH special characters support, those characters appear correctly and can be selected as a single character, but the Unicode values shown in the footer are:
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+004F LATIN CAPITAL LETTER O.
If the same file is opened WITHOUT special characters support, those characters are separated into
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+004F LATIN CAPITAL LETTER O
both followed by
U+0323 COMBINING DOT BELOW.
With some other combining accents one can see collisions of diacritical marks in this mode.
I guess the source file contains the "decomposed" characters with the COMBINING diacritics.
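To check that guess independently of the editor, the code points actually stored in the text can be listed. This is only a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the `describe` helper and the `sample` string are hypothetical names made up for illustration, and the sample reproduces what I believe the file contains for the two letters in question.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Hypothetical sample: the sequences I assume are stored in the file
# for the third letter of each word.
sample = "\u00C2\u0323\u004F\u0323"

for code, name in describe(sample):
    print(code, name)
```

If the file really holds the sequences with combining marks, U+0323 COMBINING DOT BELOW shows up as a separate entry after each base letter.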
I would like to better understand what happens to the original data when Special Characters support is used: in the source file, on the clipboard, and during transformations applied to such characters. Knowing the precise character ranges would also be useful.
BTW, my current problem is connected with a font that contains the combined (precomposed) characters but not the COMBINING characters, so I would need the "composed" version.
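Outside the editor, the composed form can presumably be produced with Unicode normalization form NFC. The following is a rough sketch, assuming Python 3 and the standard `unicodedata.normalize` function; the `compose` helper and the `decomposed` sample string are hypothetical and only mirror the sequences described above. Whether the editor itself offers such a conversion is exactly what I am asking.

```python
import unicodedata

def compose(text):
    """Return text in Unicode normalization form NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical sample: the example words with the dot below stored as a
# separate combining character, as I assume it is in my file.
decomposed = "TH\u00C2\u0323N TRO\u0323NG"
composed = compose(decomposed)

# List the code points before and after normalization.
print([f"U+{ord(ch):04X}" for ch in decomposed])
print([f"U+{ord(ch):04X}" for ch in composed])
```

After normalization the two letters should be the single code points U+1EAC and U+1ECC, which is the form my font needs.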
Thanks,
- Michael