Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

ttasovac


Post by ttasovac »

Hi.

I am trying to understand why


<entry xml:id="издꙑхание"></entry>
throws a validation error in my TEI documents in oXygen v. 17.0, under both XML 1.0 and 1.1.

The offending character is ꙑ (U+A651, CYRILLIC SMALL LETTER YERU WITH BACK YER), which looks similar to ы but is not identical to it, and which belongs to the Cyrillic Extended-B range.

There is no doubt that, in terms of Unicode, the above-mentioned character, like the other lowercase letters in the Cyrillic Extended-B range, is a perfectly normal Ll (Lowercase_Letter) character, whose Unicode properties also include ID_Start and ID_Continue. So I am pretty sure this should be a valid xml:id, at least under XML 1.1.
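For anyone who wants to check the claim independently of any validator, here is a minimal sketch in Python. It hard-codes the NameStartChar and NameChar ranges as I read them in the XML 1.1 recommendation, leaving out the colon because an xml:id value has to be an NCName. The function and variable names are made up for this illustration, and the script says nothing about what oXygen, Jing, or Saxon actually implement.

```python
import unicodedata

# NameStartChar ranges as listed in the XML 1.1 recommendation,
# with ":" left out because xml:id values must be NCNames.
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters allowed after the first position (NameChar):
# "-", ".", digits, U+00B7, combining marks, and U+203F-U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ncname_xml11(value):
    """Hypothetical helper: check value against the XML 1.1 NCName rules."""
    if not value or not in_ranges(value[0], NAME_START_RANGES):
        return False
    return all(
        in_ranges(c, NAME_START_RANGES) or in_ranges(c, NAME_EXTRA_RANGES)
        for c in value[1:]
    )


if __name__ == "__main__":
    yeru = "\ua651"  # CYRILLIC SMALL LETTER YERU WITH BACK YER
    print("General category:", unicodedata.category(yeru))
    print("Valid NCName (XML 1.1 ranges):", is_ncname_xml11("издꙑхание"))
```

U+A651 falls inside the [#x3001-#xD7FF] range, so by these ranges the whole value passes as an NCName.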

So I am a bit puzzled as to why I get this error in oXygen, even when my XML declaration says 1.1, which should be forward-compatible in terms of Unicode characters in xsd:ID and xsd:Name. Is it a specific oXygen issue, is it Saxon, is it Java?

How can I deal with this, other than mapping all Cyrillic Extended-B characters to something else in my xml:ids?

All best,
Toma
alex_jitianu

Re: Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

Post by alex_jitianu »

Hello,

It looks like an issue in Jing (which Oxygen uses for validating RelaxNG). I've added an issue to investigate, and we will let you know the conclusions.

Best regards,
Alex