Unicode characters in xml:id (or xsd:ID and xsd:Name in XML
Posted: Sun Jul 19, 2015 6:19 pm
Hi.
I am trying to understand why
<entry xml:id="издꙑхание"></entry>
throws a validation error in my TEI documents in oXygen v. 17.0, under both XML 1.0 and 1.1.
The offending character is ꙑ (U+A651, CYRILLIC SMALL LETTER YERU WITH BACK YER), which looks similar to ы but is not identical to it, and which belongs to the Cyrillic Extended-B block.
There is no doubt that, in terms of Unicode, this character is a perfectly normal Ll (Lowercase_Letter) character, like the other lowercase letters in the Cyrillic Extended-B block, and its Unicode properties also include ID_Start and ID_Continue. So I am pretty sure this should be a valid xml:id, at least under XML 1.1.
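As a sanity check on that reasoning, here is a minimal Python sketch that tests the ID against the NameStartChar and NameChar ranges given in the XML 1.1 Recommendation (the same ranges appear in XML 1.0 Fifth Edition). The function names are my own, and the colon is left out on purpose because an xml:id value has to be an NCName. It only checks the character ranges and says nothing about what a particular parser or validator actually implements.

```python
import unicodedata

# NameStartChar ranges from the XML 1.1 Recommendation, minus ":" because
# an xml:id value must be an NCName (a Name without colons).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters allowed after the first position (NameChar):
# "-", ".", digits, U+00B7, combining marks, and U+203F..U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(ch, ranges):
    """Return True if the code point of ch falls inside any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_ncname_xml11(value):
    """Hypothetical helper: check value against the XML 1.1 name ranges."""
    if not value or not in_ranges(value[0], NAME_START_RANGES):
        return False
    return all(
        in_ranges(ch, NAME_START_RANGES) or in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in value[1:]
    )


if __name__ == "__main__":
    yeru = "\uA651"
    print(unicodedata.name(yeru), unicodedata.category(yeru))
    print(is_ncname_xml11("издꙑхание"))
```

If this reports the ID as a valid name, the character ranges of XML 1.1 are not what is rejecting it, which would point at the name-character tables used by the validating component instead.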
So I am a bit puzzled as to why I get this error in oXygen even when my XML declaration says 1.1, which should be forward-compatible in terms of Unicode characters in xsd:ID and xsd:Name. Is it a specific oXygen issue, is it Saxon, or is it Java?
How can I deal with this, other than by mapping all Cyrillic Extended-B characters to something else in my xml:ids?
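To get a sense of how many IDs are affected before deciding on anything, a rough Python sketch like the following could list every xml:id value that contains a code point from the Cyrillic Extended-B block (U+A640 to U+A69F). The helper name and the regular expression are assumptions of mine; it scans the raw text instead of parsing it, so it would also match xml:id attributes inside comments or CDATA sections.

```python
import re

# Code points of the Cyrillic Extended-B block, U+A640..U+A69F inclusive.
CYRILLIC_EXT_B = range(0xA640, 0xA6A0)

# Naive match for xml:id="..." or xml:id='...' in raw text (not a real parser).
XML_ID_ATTR = re.compile(r'xml:id\s*=\s*(["\'])(.*?)\1')


def ids_with_cyrillic_ext_b(xml_text):
    """Hypothetical helper: return (id, offending characters) pairs."""
    hits = []
    for match in XML_ID_ATTR.finditer(xml_text):
        value = match.group(2)
        offenders = sorted({ch for ch in value if ord(ch) in CYRILLIC_EXT_B})
        if offenders:
            hits.append((value, offenders))
    return hits


if __name__ == "__main__":
    sample = '<entry xml:id="издꙑхание"></entry><entry xml:id="слово"/>'
    for value, offenders in ids_with_cyrillic_ext_b(sample):
        print(value, [f"U+{ord(ch):04X}" for ch in offenders])
```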
All best,
Toma