Re: [xsl] re: Generate identifier


Subject: Re: [xsl] re: Generate identifier
From: Liam R E Quin <liam@xxxxxx>
Date: Thu, 07 Jan 2010 15:17:26 -0500

On Thu, 2010-01-07 at 05:27 -0800, Vladimir Nesterovsky wrote:

> Is there a way to decompose characters like:
> æ 'LATIN SMALL LETTER AE' (U+00E6)
>
> into separate letters?
> Are there many such characters derived from Latin (I'll be calling
> replace() if it's only one or two)?

The primary ones are OE and AE (and oe and ae) and I usually
special-case them, as you can turn them into either the two letters
or just an e, depending on whether you favour "mediaeval" or "medieval",
"foetus" or "fetus" and so on.
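The special-casing above could be sketched roughly like this. It is in Python purely for illustration (on this list one would more likely nest XPath 2.0 replace() calls), and the function name `expand_ae_oe` and its `two_letters` flag are made up for the example.

```python
# Hypothetical helper: expand the AE/OE ligatures either to two letters
# ("mediaeval" style) or to a bare e ("medieval" style).
TWO_LETTERS = {
    "\u00e6": "ae",  # LATIN SMALL LETTER AE
    "\u00c6": "AE",  # LATIN CAPITAL LETTER AE
    "\u0153": "oe",  # LATIN SMALL LIGATURE OE
    "\u0152": "OE",  # LATIN CAPITAL LIGATURE OE
}
JUST_E = {"\u00e6": "e", "\u00c6": "E", "\u0153": "e", "\u0152": "E"}


def expand_ae_oe(text, two_letters=True):
    """Replace AE/OE ligatures according to the chosen spelling style."""
    table = TWO_LETTERS if two_letters else JUST_E
    return text.translate(str.maketrans(table))
```

Which table to use is an editorial choice about spelling, not something Unicode decides for you.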

There are quite a few others, though, e.g. Ĳ, Ҥ and Ҵ (Cyrillic),
װ (Hebrew), and further ligatures in Armenian, Arabic and other scripts.

A search for "ligature" in the Unicode database - or, e.g. in
Linux/Gnome, the character map utility, gucharmap - will find them.
For my purposes (e.g. making filenames and URIs from dictionary
headwords) I turn sequences of one or more non-letters into "-",
after handling accents and ligatures.
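A minimal sketch of both steps, again in Python for illustration only: searching the Unicode names for "ligature", and building a filename-safe string from a headword. The names `find_ligatures`, `slug` and `AE_OE` are hypothetical, and the order of operations is one reading of the description above, not the actual code behind it. It assumes NFKD normalization is acceptable for dropping accents.

```python
import unicodedata
from itertools import groupby

# Explicit special cases: no Unicode normalization form decomposes these.
AE_OE = str.maketrans({"\u00e6": "ae", "\u00c6": "AE",
                       "\u0153": "oe", "\u0152": "OE"})


def find_ligatures(limit=0x10000):
    """Characters below `limit` whose Unicode name contains LIGATURE.

    Note that U+00E6 is named LATIN SMALL LETTER AE, so a name search
    alone will not report it.
    """
    return [chr(cp) for cp in range(limit)
            if "LIGATURE" in unicodedata.name(chr(cp), "")]


def is_letter(ch):
    # General categories Lu, Ll, Lt, Lm, Lo all start with "L".
    return unicodedata.category(ch).startswith("L")


def slug(headword):
    """Handle ligatures and accents, then turn non-letter runs into '-'."""
    text = headword.translate(AE_OE)
    # NFKD splits accented letters into base letter + combining mark and
    # also expands compatibility ligatures such as U+FB01 (fi).
    text = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (general category M*).
    text = "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("M"))
    # Each run of one or more non-letters becomes a single hyphen.
    return "".join("".join(run) if letters else "-"
                   for letters, run in groupby(text, key=is_letter))
```

For example, `slug("café au lait")` gives `cafe-au-lait`. Leading or trailing punctuation also becomes a hyphen here; trim it if that is not wanted.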

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org

