Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

Posted: Sun Jul 19, 2015 6:19 pm
by ttasovac
Hi.

I am trying to understand why

<entry xml:id="издꙑхание"></entry>
throws a validation error in my TEI documents in oXygen v. 17.0, under both XML 1.0 and XML 1.1.

The offending character is ꙑ (U+A651, CYRILLIC SMALL LETTER YERU WITH BACK YER), which looks similar to ы but is not identical to it, and which belongs to the Cyrillic Extended-B range.
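To make the difference between the two characters concrete, here is a minimal sketch using Python's standard `unicodedata` module. The variable and function names are only illustrative, and the exact data reported depends on the Unicode version bundled with the Python build.

```python
import unicodedata

yeru_back_yer = "\uA651"  # ꙑ, the character used in the xml:id above
yeru = "\u044B"           # ы, the ordinary Cyrillic yeru it resembles


def in_cyrillic_extended_b(ch):
    """True if ch lies in the Cyrillic Extended-B block (U+A640..U+A69F)."""
    return 0xA640 <= ord(ch) <= 0xA69F


for ch in (yeru_back_yer, yeru):
    # Print code point, character name, general category, and block membership.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        in_cyrillic_extended_b(ch),
    )
```

The two characters have different code points and names, so a tool that accepts ы does not necessarily accept ꙑ.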

There is no doubt that, in terms of Unicode, the above-mentioned character, like the other lowercase letters in the Cyrillic Extended-B range, is a perfectly normal Ll (Lowercase_Letter) character whose Unicode properties also include ID_Start and ID_Continue. So I am pretty sure this should be a valid xml:id, at least under XML 1.1.
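As a rough cross-check of that claim, the sketch below tests a string against the NameStartChar and NameChar ranges from the XML 1.1 grammar (XML 1.0 Fifth Edition uses the same ranges), leaving out the colon because xsd:ID is an NCName. The function names are my own, and this only mirrors the grammar; it says nothing about what a particular validator actually implements.

```python
# NameStartChar ranges from the XML 1.1 / XML 1.0 (Fifth Edition) grammar,
# without ":" because an xsd:ID must be an NCName (a name with no colon).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional characters allowed after the first position (NameChar):
# "-", ".", digits, U+00B7, combining marks, and U+203F..U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_ncname(value):
    """Check value against the NCName shape implied by the ranges above."""
    if not value:
        return False
    if not _in_ranges(ord(value[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in value[1:]
    )


print(is_ncname("издꙑхание"))
```

U+A651 falls inside the #x3001-#xD7FF range, so the id passes under these rules. A validator that still follows the older enumerated Letter tables of XML 1.0 Fourth Edition and earlier would presumably reject it, since those tables predate the character; whether that is what happens here is only a guess.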

So I am a bit puzzled as to why I get this error in oXygen, even when my XML declaration says 1.1, which should be forward-compatible in terms of Unicode characters in xsd:ID and xsd:Name. Is it a specific oXygen issue, is it Saxon, is it Java?

How can I deal with this, other than by mapping all Cyrillic Extended-B characters to something else in my xml:ids?
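For reference, the mapping workaround I would like to avoid could look roughly like the sketch below. The `_uXXXX_` escape scheme and the function names are made up for illustration; any pointer to the id elsewhere in the document (for example `#издꙑхание`) would have to be rewritten the same way.

```python
import re

# Any character in the Cyrillic Extended-B block (U+A640..U+A69F).
CYRILLIC_EXT_B = re.compile("[\uA640-\uA69F]")

# The hypothetical escape form produced by escape_id below.
ESCAPED = re.compile(r"_u([0-9A-F]{4})_")


def escape_id(xml_id):
    """Replace each Cyrillic Extended-B character with _uXXXX_ (hex code point)."""
    return CYRILLIC_EXT_B.sub(lambda m: f"_u{ord(m.group()):04X}_", xml_id)


def unescape_id(escaped_id):
    """Reverse escape_id, turning _uXXXX_ back into the original character."""
    return ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), escaped_id)


escaped = escape_id("издꙑхание")
print(escaped)
print(unescape_id(escaped))
```

The round trip is only safe as long as no original id already contains a literal `_uXXXX_` sequence.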

All best,
Toma

Re: Unicode characters in xml:id (or xsd:ID and xsd:Name in XML

Posted: Mon Jul 20, 2015 11:57 am
by alex_jitianu
Hello,

It looks like an issue in Jing (which Oxygen uses for validating RELAX NG). I've filed an issue to investigate, and we will let you know what we find.

Best regards,
Alex