XML Schema Oddity With Pattern Restrictions
Posted: Fri Oct 25, 2013 9:29 pm
I am using the oXygen 15.1 editor. This question is not about a problem with the editor; it is about something odd in XML Schema 1.0.
I created an XML schema that contains a pattern restriction and tested it with a few XML parsers and XSLT processors, including Saxon 9.5.1.2j and MSXML 4.0 SP3. With both of these, my XML document passes validation, so I thought I was good to go.
However, when I use the same XML document and schema with a SAX parser in Java, my Java code throws this exception:
org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.
Why would this work fine with the latest version of Saxon as well as with MSXML? The SAX parser rejects the pattern as an invalid regular expression (though the regex is valid, mind you) and wants me to escape the '-' character.
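The error concerns the schema's pattern facet rather than the instance data, so it can be reproduced by compiling a schema on its own. The sketch below assumes the "SAX parser in Java" is the JDK's default JAXP implementation; the InvalidRegex message format appears to come from Apache Xerces-J, which the JDK bundles in repackaged form. The class and method names are hypothetical, and whether the unescaped pattern is rejected may depend on the JDK or Xerces version in use.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class PatternCompileCheck {

    // Builds a minimal schema whose only content is a simple type with the given xs:pattern.
    // The pattern is pasted in verbatim, so it must not contain '"', '<' or '&'.
    static String schemaFor(String pattern) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:simpleType name=\"valueType\">"
             + "<xs:restriction base=\"xs:string\">"
             + "<xs:pattern value=\"" + pattern + "\"/>"
             + "</xs:restriction></xs:simpleType></xs:schema>";
    }

    // Returns null if the schema compiles, otherwise the parser's error message.
    // With no ErrorHandler set, SchemaFactory throws on the first schema error.
    static String compileError(String pattern) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(schemaFor(pattern))));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Java string literals double each backslash: these are [-]?\d+[abcd] and [\-]?\d+[abcd].
        for (String p : new String[] {"[-]?\\d+[abcd]", "[\\-]?\\d+[abcd]"}) {
            String err = compileError(p);
            System.out.println(p + " -> " + (err == null ? "compiles" : err));
        }
    }
}
```

Running `main` on the JVM that produced the exception shows directly whether that parser accepts `[-]`, independent of any instance document.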
Here is a sample XML document illustrating this oddity:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="file:/D:/Data/OxygenXMLEditor/sample/sample.xsd">
    <value>10a</value>
    <value>-10b</value>
    <value>2c</value>
    <value>27d</value>
</node>
```
Here is a corresponding schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="node">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" ref="value"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="value">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[-]?\d+[abcd]"></xs:pattern>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
</xs:schema>
```
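To check this document and schema pair outside oXygen, the standard JAXP validation API can be driven directly from Java. This is a minimal sketch, assuming the JDK's default SchemaFactory. The schema and sample values are inlined as strings, with the file-path `xsi:noNamespaceSchemaLocation` hint left out, so the class is self-contained. The pattern is a parameter so either spelling can be tried, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SampleValidation {

    // Same structure as the schema in the post; only the xs:pattern value varies.
    static String schema(String pattern) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
             + " elementFormDefault=\"qualified\">"
             + "<xs:element name=\"node\"><xs:complexType><xs:sequence>"
             + "<xs:element maxOccurs=\"unbounded\" ref=\"value\"/>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "<xs:element name=\"value\"><xs:simpleType>"
             + "<xs:restriction base=\"xs:string\">"
             + "<xs:pattern value=\"" + pattern + "\"/>"
             + "</xs:restriction></xs:simpleType></xs:element>"
             + "</xs:schema>";
    }

    // Builds a <node> document with one <value> child per argument.
    static String document(String... values) {
        StringBuilder sb = new StringBuilder("<node>");
        for (String v : values) {
            sb.append("<value>").append(v).append("</value>");
        }
        return sb.append("</node>").toString();
    }

    // Returns null if the document is valid, otherwise the first error message
    // (a schema compilation error or an instance validation error).
    static String validate(String pattern, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema compiled = factory.newSchema(new StreamSource(new StringReader(schema(pattern))));
            Validator validator = compiled.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String xml = document("10a", "-10b", "2c", "27d");
        for (String p : new String[] {"[-]?\\d+[abcd]", "[\\-]?\\d+[abcd]"}) {
            String err = validate(p, xml);
            System.out.println(p + " -> " + (err == null ? "valid" : err));
        }
    }
}
```

With the escaped pattern, `validate` should return null for the four sample values, while a value such as `10e` should produce a pattern-facet error, since `e` is outside `[abcd]`.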
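If you open these two files in oXygen, you will see the same behavior I described: the XML document validates successfully against the schema. And if I do what the SAX parser tells me to do, which is escape the '-' character, the same XML document still validates successfully.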
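Escaping should not change what the pattern accepts. Under the XML Schema 1.0 regex grammar, a '-' at the start or end of a positive character group is an ordinary character, and `\-` is a valid single-character escape. Outside a character class, '-' needs no escaping at all, so `-?\d+[abcd]` is a third spelling; that form is an illustrative alternative and was not in the original post. The sketch below is only a rough cross-check because it uses java.util.regex, which is a different dialect from XSD regular expressions. For example, Java's `\d` matches only ASCII digits by default, while XSD's `\d` covers all Unicode decimal digits. For the ASCII sample values used here, however, the two dialects should agree. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class PatternEquivalence {

    // Three spellings of the same constraint (Java literals double each backslash).
    static final String[] PATTERNS = {
        "[-]?\\d+[abcd]",   // original: '-' alone inside a character class
        "[\\-]?\\d+[abcd]", // escaped, as the parser's message suggests
        "-?\\d+[abcd]"      // no character class at all
    };

    // Pattern.matches must match the whole input, like an (implicitly anchored) xs:pattern.
    static boolean allAgree(String value) {
        boolean first = Pattern.matches(PATTERNS[0], value);
        for (String p : PATTERNS) {
            if (Pattern.matches(p, value) != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"10a", "-10b", "2c", "27d", "--1a", "10e", "a"};
        for (String s : samples) {
            System.out.println(s + " matches=" + Pattern.matches(PATTERNS[0], s)
                    + " allAgree=" + allAgree(s));
        }
    }
}
```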
This looks like a bug somewhere. The SAX parser raises an error where neither Saxon nor MSXML does, and after I make the change that avoids the SAX error, Saxon still validates the document without issue.
Is this a defect? If so, whom should I notify?
Thank you for reading.