
Re: [xsl] Getting the Base Character of Character with Diacritic


Subject: Re: [xsl] Getting the Base Character of Character with Diacritic
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Tue, 19 Sep 2006 10:56:54 +0200

Michael Kay wrote:
Following up on suggestions from others, if NFKD is supported then the
following should work reasonably well for European languages:

replace(normalize-unicode($in, 'NFKD'), '[&#x0300;-&#x036F;]', '')

You are right. I had thought that the spacing modifier letters (U+02B0 to U+02FF) could also be used to modify a character. In an earlier post I gave the macron and circumflex as U+02C9 and U+02C6, but those were the wrong code points for this purpose. That block does include a macron, a circumflex, and so on, but as I understand it now these are not combining marks that attach to a letter. They are spacing characters that stand on their own (like quotes, etc.), and as a result they do not come into play in this normalization.
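To make the distinction concrete, here is a minimal sketch that wraps the expression above in a stylesheet function and applies it to both kinds of character. It assumes an XSLT 2.0 processor that supports NFKD (support for that form is optional). The function name my:strip-diacritics, its namespace URI, and the initial template name "main" are made up for this illustration and do not come from the thread.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: decompose with NFKD, then delete anything in the
       Combining Diacritical Marks block (U+0300 to U+036F). -->
  <xsl:function name="my:strip-diacritics" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence
        select="replace(normalize-unicode($in, 'NFKD'), '[&#x0300;-&#x036F;]', '')"/>
  </xsl:function>

  <!-- Run with "main" as the initial template (name chosen for this sketch). -->
  <xsl:template name="main">
    <!-- Precomposed e-acute (U+00E9) and o-diaeresis (U+00F6) decompose into
         base letter + combining mark, and the mark is then removed: eto -->
    <xsl:value-of select="my:strip-diacritics('&#xE9;t&#xF6;')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- U+02C6 (modifier letter circumflex) has no decomposition and lies
         outside the character class, so it survives: 101 710 -->
    <xsl:value-of select="string-to-codepoints(my:strip-diacritics('e&#x02C6;'))"/>
  </xsl:template>

</xsl:stylesheet>
```

The second line of output lists code points rather than the characters themselves, so that the surviving U+02C6 (decimal 710) is visible regardless of the font or terminal used.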

-- Abel Braaksma

