[oXygen-user] How to type an UTF8 symbol in text as well as in author mode
ben at adversary.org
Mon Feb 19 02:56:28 CST 2018
On Mon, Feb 19, 2018 at 09:33:28AM +0200, Oxygen XML Editor Support (Radu Coravu) wrote:
> Hi Bernhard,
> It seems that for "nbsp" which has the decimal equivalent "160" you would
> need to type "ALT" and then "0160", that leading "0" seems to be important.
> The same probably for all other characters, type their decimal equivalent
> but it needs to be four typed figures.
Oh, how quickly we forget certain things. :)
oXygen has had the ability to enter UTF-8 characters in the first
plane by their four character hexadecimal code point value since
version 17.1. I can't recall what the default hotkey is for invoking
it because I changed mine (back) to F8 as soon as I installed that
version. I believe I've still got the plugin you guys provided me
during my trial period for 17.0.
Anyway, if Bernhard is happy with using hex instead of int, that's the
solution instead of the Windows alt sequences (or the Mac alt/option
sequences either, for that matter).
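
For anyone moving from the decimal Alt sequences to hex entry, the conversion is simple arithmetic. Here is a minimal Python sketch; the function name is made up for illustration, and it assumes the decimal number is the Unicode code point, which holds for nbsp (160) but is not guaranteed for every Alt sequence:

```python
def alt_code_to_hex(alt_digits: str) -> str:
    """Convert a decimal digit string such as "0160" to a four-digit hex string.

    Hypothetical helper: it assumes the decimal value equals the Unicode
    code point and that the code point sits in the first plane.
    """
    code_point = int(alt_digits, 10)
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only first-plane code points fit in four hex digits")
    return f"{code_point:04X}"


if __name__ == "__main__":
    hex_code = alt_code_to_hex("0160")
    print(hex_code)                      # 00A0
    print(repr(chr(int(hex_code, 16))))  # '\xa0', the no-break space
```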
Accessing characters in multiplanes beyond the first is difficult in
most programs, including oXygenXML. Obviously XML can handle it, but
the accessing problems are twofold:
1. Entering a hexadecimal character comprised of five or six hex
characters on the remaining 16 planes (i.e. 0x10000 to 0x10ffff).
2. Rendering characters which can only be displayed using multiple
fonts and guaranteeing font fallback capabilities.
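
To make the first problem concrete, here is a rough Python sketch of how the plane and the number of hex digits fall out of a code point. The helper names are my own for illustration; the only fact relied on is that Unicode code points run from 0 to 0x10FFFF, i.e. 17 planes of 0x10000 code points each:

```python
MAX_CODE_POINT = 0x10FFFF  # last code point in Unicode (end of plane 16)


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane holds 0x10000 code points


def hex_digits_needed(code_point: int) -> int:
    """Return how many hex digits must be typed, with a minimum of four."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return max(4, len(f"{code_point:X}"))


if __name__ == "__main__":
    for cp in (0x00A0, 0x1F926, 0x10FFFF):
        print(f"U+{cp:04X}: plane {plane_of(cp)}, "
              f"{hex_digits_needed(cp)} hex digits")
```

Anything in plane 0 fits in four digits, which is why a four-character entry scheme stops there.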
I have only one program which can handle both of these natively for
editing and that's GNU Emacs, but in those cases where I need to delve
into the upper multiplanes I can open a file from oXygen in Emacs and
that'll do for now.
It might be worth having a look at extending the hex entry feature to
enable a way to enter a hex value of greater than 2 bytes (4
characters), but oXygen takes that input differently to other
programs and so it might be trickier. Emacs, LibreOffice and other
programs work by activating the hex input function (it's "M-x
insert-char" in Emacs) and then entering the code point hex value. In
oXygen you enter the hex value as four characters in the document and
then press the hotkey which reads the preceding four characters and
converts them to the corresponding character.
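
The "type the digits first, then press the hotkey" style of entry can be sketched roughly like this in Python. This is only an illustration of the idea, not oXygen's actual implementation; the function name and the digits parameter are hypothetical, with digits showing how a five or six digit variant might look:

```python
import string
from typing import Tuple

HEX_DIGITS = set(string.hexdigits)


def convert_preceding_hex(text: str, cursor: int, digits: int = 4) -> Tuple[str, int]:
    """Replace the `digits` hex characters before `cursor` with one character.

    Returns the new text and the new cursor position. Hypothetical sketch:
    a real editor would work on its document model, not on a plain string.
    """
    start = cursor - digits
    if start < 0:
        raise ValueError("not enough characters before the cursor")
    candidate = text[start:cursor]
    if not all(ch in HEX_DIGITS for ch in candidate):
        raise ValueError(f"{candidate!r} is not a hex value")
    # chr() itself raises ValueError for values above 0x10FFFF.
    char = chr(int(candidate, 16))
    return text[:start] + char + text[cursor:], start + 1


if __name__ == "__main__":
    # Four digits, first plane: "00A0" becomes a no-break space.
    print(convert_preceding_hex("a00A0b", 5))
    # Five digits, plane 1: the caller has to say how many digits to read.
    print(convert_preceding_hex("1F926", 5, digits=5))
```

The awkward part is visible in the last call: with the digits typed into the document first, something has to decide whether to read back four, five or six characters, whereas a prompt-style input simply reads until you confirm.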
As for font fallback, there are pretty much no options for handling that
in oXygen, but there are effective workarounds by doing sneaky things
with CSS in the source files as well as the output formats.
I've got my own little Unicode cheat sheet which has been gradually
growing over the last decade or so and covers most of this in more
detail. Bear in mind two things: first, it's a personal cheat sheet
that I only share because it often answers frequent questions I hear
elsewhere; and second, it's a "living document" that gets updated
That said, it's here:
Or to download it:
It's only ever released as a PDF because of all the font/glyph
embedding. It claims or attempts to export as PDF/A-1, but only to
ensure font embedding, and it probably won't pass preflight
checks (nor does it need to).
For those few readers of this list who also use Emacs, the last three
pages of that file include those portions of my Emacs init file which
specify the fallback fonts using fontset default. I've got coverage
from 0x0000 to 0x2ffff and where things occasionally misbehave,
they're easy to identify with the aid of the binding on F16 (i.e. M-x
Finally, my current favourite code point checking tool, for any system
with Perl installed, is unum.pl, available here:
The current version of the cheat sheet discusses it on page 23, but
here's a nice example of what it does:
bash-4.4$ unum.pl 0x1f926
   Octal  Decimal      Hex        HTML    Character   Unicode
 0374446   129318  0x1F926   &#129318;    "🤦"         FACE PALM
Obviously some of us can see that character properly and some can't,
but you all know which it is.
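
If Perl isn't to hand, a similar single-character lookup can be approximated with Python's standard unicodedata module. This is a rough sketch rather than a replacement for unum.pl: the describe function is my own name, it always emits a numeric HTML reference, and the character name depends on the Unicode version bundled with your Python:

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Return the same columns as the lookup above for one code point."""
    ch = chr(code_point)
    return {
        "octal": f"0{code_point:o}",
        "decimal": code_point,
        "hex": f"0x{code_point:X}",
        "html": f"&#{code_point};",  # numeric reference only
        "character": ch,
        # Falls back to a placeholder if this Python does not know the name.
        "name": unicodedata.name(ch, "<unnamed>"),
    }


if __name__ == "__main__":
    for key, value in describe(0x1F926).items():
        print(f"{key:>9}: {value}")
```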