Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

<oXygen/> general issues.


Post by ttasovac » Sun Jul 19, 2015 6:19 pm

Hi.

I am trying to understand why


<entry xml:id="издꙑхание"></entry>
throws a validation error in my TEI documents in oXygen v. 17.0, under both XML 1.0 and XML 1.1.

The offending character is ꙑ (U+A651, CYRILLIC SMALL LETTER YERU WITH BACK YER), which looks similar to, but is not identical with, ы (U+044B), and which belongs to the Cyrillic Extended-B range.
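A quick way to confirm which character in an ID is the unusual one is to list the code points that fall in the Cyrillic Extended-B block (U+A640..U+A69F). The Python sketch below does this with the standard `unicodedata` module; the helper names `describe` and `extended_b_chars` are just illustrative, and the ID is written with a `\uA651` escape so it cannot be confused with the look-alike ы.

```python
import unicodedata

# The xml:id value from the post; the Extended-B letter is written as an
# escape so it cannot be confused with the look-alike ы (U+044B).
xml_id = "изд\uA651хание"

def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

def extended_b_chars(s: str) -> list:
    """Describe the characters of s that fall in Cyrillic Extended-B (U+A640..U+A69F)."""
    return [describe(ch) for ch in s if 0xA640 <= ord(ch) <= 0xA69F]

print(extended_b_chars(xml_id))
# ['U+A651 CYRILLIC SMALL LETTER YERU WITH BACK YER']
print(describe("\u044B"))
# U+044B CYRILLIC SMALL LETTER YERU
```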

There is no doubt that, in Unicode terms, this character is a perfectly normal letter: its General_Category is Ll (Lowercase_Letter), and its properties include ID_Start and ID_Continue, just like the other lowercase letters in the Cyrillic Extended-B range. So I am pretty sure this should be a valid xml:id, at least under XML 1.1.
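The Unicode properties mentioned above can be checked from Python, as a rough sketch. `unicodedata.category` returns the General_Category directly. `str.isidentifier()` follows Python's own identifier rules, which are built on XID_Start/XID_Continue (close relatives of ID_Start/ID_Continue), so it is only a proxy for those properties, not an XML check.

```python
import unicodedata

YERU_WITH_BACK_YER = "\uA651"

# General_Category: "Ll" means Lowercase_Letter.
category = unicodedata.category(YERU_WITH_BACK_YER)

# Python identifiers use XID_Start for the first character and XID_Continue
# for the rest, so these two calls approximate ID_Start / ID_Continue.
starts_identifier = YERU_WITH_BACK_YER.isidentifier()
continues_identifier = ("a" + YERU_WITH_BACK_YER).isidentifier()

print(category, starts_identifier, continues_identifier)  # Ll True True
```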

So I am a bit puzzled as to why I get this error in oXygen even when my XML declaration says 1.1, which should be forward-compatible with respect to the Unicode characters allowed in xsd:ID and xsd:Name values. Is it a specific oXygen issue, is it Saxon, or is it Java?
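For comparison, here is a small Python sketch of the Name check as written in XML 1.1 and XML 1.0 Fifth Edition, whose NameStartChar and NameChar productions list the same character ranges. U+A651 falls inside the [#x3001-#xD7FF] start range, so under these rules the ID from the post is a valid name; an xml:id (or xsd:ID) value must additionally be an NCName, i.e. contain no colon. The function names are hypothetical and this is not how oXygen, Saxon or Jing implement the check. Note also that XML 1.0 editions before the Fifth used a fixed character table derived from Unicode 2.0 instead of these ranges.

```python
# Ranges from the NameStartChar production (XML 1.1 / XML 1.0 Fifth Edition).
NAME_START_RANGES = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")),
    (ord("a"), ord("z")), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Extra ranges that NameChar allows after the first character.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]

def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_xml_name(s: str) -> bool:
    """Check s against the Name production using the ranges above."""
    if not s or not _in_ranges(ord(s[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(c), NAME_START_RANGES) or _in_ranges(ord(c), NAME_EXTRA_RANGES)
        for c in s[1:]
    )

def is_ncname(s: str) -> bool:
    """An NCName is a Name without any colon (as required for xml:id and xsd:ID)."""
    return ":" not in s and is_xml_name(s)

print(is_ncname("изд\uA651хание"))  # True
```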

How can I deal with this, other than by mapping all Cyrillic Extended-B characters in my xml:ids to something else?

All best,
Toma


Re: Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

Post by alex_jitianu » Mon Jul 20, 2015 11:57 am

Hello,

It looks like a problem in Jing (the validator Oxygen uses for Relax NG). I've logged an issue to investigate it, and we will let you know the conclusions.

Best regards,
Alex
