
XML Schema Oddity With Pattern Restrictions

Posted: Fri Oct 25, 2013 9:29 pm
by Jamil
I am using the oXygen 15.1 editor. This question is not about a problem with the editor. It is about something odd with XML Schema 1.0.

I created an XML schema that contains a pattern restriction. I had tested it against a few XML parsers and XSLT processors including Saxon 9.5.1.2j and MSXML 4.0 SP3. With these two, my XML document successfully passes validation. I thought I was good to go.

However, when I tried to use the same XML document and schema with a SAX parser in Java, I got exceptions in my Java code:

org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.

Why would this work fine with the latest version of Saxon as well as MSXML? SAX is rejecting it as an invalid regular expression (the regex is valid though, mind you). It wants me to escape the '-' character.

Here is a sample XML document illustrating this oddity:

<?xml version="1.0" encoding="UTF-8"?>
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="file:/D:/Data/OxygenXMLEditor/sample/sample.xsd">
    <value>10a</value>
    <value>-10b</value>
    <value>2c</value>
    <value>27d</value>
</node>

Here is a corresponding schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="node">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" ref="value"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="value">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[-]?\d+[abcd]"></xs:pattern>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>
</xs:schema>

If you open these two in oXygen, you will see the same behavior I described. The XML document validates successfully against the schema.

However, if I do what SAX is telling me to do, which is escape the '-' character, the same XML document validates successfully again.
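
A minimal Java sketch for reproducing this comparison through the standard javax.xml.validation API. The class name `PatternCheck` and the helper `validates` are hypothetical names chosen for illustration, and the schema is reduced to a single `value` element. Whether the unescaped pattern is accepted depends on which Xerces build backs the Java runtime in use, so no output is claimed for it here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the original project.
final class PatternCheck {

    // Returns true if <value>text</value> validates against a schema whose
    // only facet is the given pattern. Returns false if the schema itself is
    // rejected (for example with InvalidRegex) or if the instance is invalid.
    // Assumes the pattern contains no apostrophe, '<' or '&'.
    static boolean validates(String pattern, String text) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='value'><xs:simpleType>"
          + "<xs:restriction base='xs:string'>"
          + "<xs:pattern value='" + pattern + "'/>"
          + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
        String xml = "<value>" + text + "</value>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // With no ErrorHandler set, schema errors surface as SAXException.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First the original pattern, then the escaped variant.
        String[] patterns = {"[-]?\\d+[abcd]", "[\\-]?\\d+[abcd]"};
        String[] samples = {"10a", "-10b", "2c", "27d"};
        for (String p : patterns) {
            for (String s : samples) {
                System.out.println(p + " / " + s + " -> " + validates(p, s));
            }
        }
    }
}
```

On a runtime that raises the InvalidRegex error, the first pattern would report false for every sample, because the schema fails to load before any value is checked.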

This seems like a bug in something. SAX raises an error where neither Saxon nor MSXML does. When I make the change to avoid the SAX error, Saxon still validates the document without issue.

Is this a defect? If so, whom should I notify?

Thank you for reading.

Re: XML Schema Oddity With Pattern Restrictions

Posted: Wed Oct 30, 2013 4:05 pm
by Jamil
Here is more of the stack trace. I am using Apache Xerces in Java 1.6:

Caused by: org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaErr(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDAbstractTraverser.reportSchemaError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.getSimpleType(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.traverseLocal(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseNamedElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseGlobal(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.traverseSchemas(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
    at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)
    at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)

Re: XML Schema Oddity With Pattern Restrictions

Posted: Wed Oct 30, 2013 5:54 pm
by adrian
Hi,

The XML schema specification allows '-' in the character range ([-]):
http://www.w3.org/TR/xmlschema-2/#nt-charRange

charRange      ::= seRange | XmlCharIncDash
XmlCharIncDash ::= [^\#x5B#x5D]

Please note that the Xerces built into the Java runtime (com.sun.org.apache.xerces.internal.impl) is a rather old implementation. You might want to use the current Apache Xerces implementation for best results:
http://xerces.apache.org
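
A minimal sketch of how a standalone Xerces could be requested explicitly through JAXP, assuming the Apache Xerces-J jars are on the classpath. The class name `XercesFactorySelector` and its method are hypothetical names for illustration; `org.apache.xerces.jaxp.validation.XMLSchemaFactory` is the factory class in the Apache distribution. The sketch falls back to the runtime's built-in factory when that class cannot be loaded, and it makes no claim about which Xerces release accepts the `[-]` pattern.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper for illustration only.
final class XercesFactorySelector {

    // Schema factory class shipped with the standalone Apache Xerces-J jars.
    static final String APACHE_XERCES_FACTORY =
        "org.apache.xerces.jaxp.validation.XMLSchemaFactory";

    // Prefers the standalone Apache Xerces factory; falls back to the
    // factory built into the Java runtime if Xerces is not on the classpath.
    static SchemaFactory newSchemaFactory() {
        try {
            return SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI,
                APACHE_XERCES_FACTORY,
                XercesFactorySelector.class.getClassLoader());
        } catch (IllegalArgumentException e) {
            // Thrown when the named factory class cannot be loaded.
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    public static void main(String[] args) {
        // Prints the implementation class actually selected.
        System.out.println(newSchemaFactory().getClass().getName());
    }
}
```

Printing the factory class name is a quick way to confirm which implementation is in effect before retrying the schema.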

Regards,
Adrian

Re: XML Schema Oddity With Pattern Restrictions

Posted: Thu Oct 31, 2013 1:36 am
by Jamil
Thank you for your follow-up. The issue with the Java runtime is noted, and I will look into the latest available Xerces.

Thanks again.