XML Schema Oddity With Pattern Restrictions

Jamil
Posts: 99
Joined: Thu Oct 23, 2008 6:29 am

Post by Jamil »

I am using the oXygen 15.1 editor. This question is not about a problem with the editor; it is about something odd in XML Schema 1.0.

I created an XML schema that contains a pattern restriction. I tested it against a few XML parsers and XSLT processors, including Saxon 9.5.1.2j and MSXML 4.0 SP3. With these two, my XML document passes validation. I thought I was good to go.

However, when I tried the same XML document and schema with a SAX parser in Java, I got exceptions in my Java code:

org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.

Why would this work fine with the latest version of Saxon as well as MSXML? SAX is rejecting it as an invalid regular expression (the regex is valid though, mind you). It wants me to escape the '-' character.

Here is a sample XML document illustrating this oddity:

<?xml version="1.0" encoding="UTF-8"?>
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="file:/D:/Data/OxygenXMLEditor/sample/sample.xsd">
<value>10a</value>
<value>-10b</value>
<value>2c</value>
<value>27d</value>
</node>

Here is the corresponding schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="node">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="value"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="value">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[-]?\d+[abcd]"></xs:pattern>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:schema>

If you open these two in oXygen, you will see the same behavior I described. The XML document validates successfully against the schema.

However, if I do what the parser tells me to do and escape the '-' character (writing '\-'), the same XML document again validates successfully.
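For anyone who wants to reproduce this from Java, here is a minimal sketch of the escaped variant, assuming the default JAXP SchemaFactory that ships with the JDK. The class and method names (EscapedDashCheck, schemaWithPattern, isValid) are ones I made up for the example, and the schema and document are inlined as strings, without the xsi:noNamespaceSchemaLocation hint, instead of being read from my files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class EscapedDashCheck {

    // The schema from this post, with the pattern value supplied by the caller.
    static String schemaWithPattern(String pattern) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified'>"
             + "<xs:element name='node'><xs:complexType><xs:sequence>"
             + "<xs:element maxOccurs='unbounded' ref='value'/>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "<xs:element name='value'><xs:simpleType>"
             + "<xs:restriction base='xs:string'>"
             + "<xs:pattern value='" + pattern + "'/>"
             + "</xs:restriction></xs:simpleType></xs:element>"
             + "</xs:schema>";
    }

    // True only if the schema compiles and the document validates against it.
    static boolean isValid(String pattern, String xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(
                new StreamSource(new StringReader(schemaWithPattern(pattern))));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // With no ErrorHandler installed, schema and validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) {
        String doc = "<node><value>10a</value><value>-10b</value>"
                   + "<value>2c</value><value>27d</value></node>";
        // Escaped dash, as the error message suggests.
        System.out.println(isValid("[\\-]?\\d+[abcd]", doc));
    }
}
```

Passing the unescaped pattern to the same isValid method is a quick way to see whether a given Java runtime still rejects it.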

This seems like a bug in something. SAX raises an error where neither Saxon nor MSXML does. When I make the change to avoid the SAX error, Saxon still validates the document without issue.

Is this a defect? If so, which component is at fault, and whom should I notify?

Thank you for reading.
Jamil
Posts: 99
Joined: Thu Oct 23, 2008 6:29 am

Re: XML Schema Oddity With Pattern Restrictions

Post by Jamil »

Here is more of the stack trace. I am using Apache Xerces in Java 1.6:

Caused by: org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaErr(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDAbstractTraverser.reportSchemaError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.getSimpleType(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.traverseLocal(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseNamedElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseGlobal(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.traverseSchemas(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)
adrian
Posts: 2879
Joined: Tue May 17, 2005 4:01 pm

Re: XML Schema Oddity With Pattern Restrictions

Post by adrian »

Hi,

The XML Schema specification allows a bare '-' as a character range inside a character group ([-]):
http://www.w3.org/TR/xmlschema-2/#nt-charRange

charRange      ::= seRange | XmlCharIncDash
XmlCharIncDash ::= [^\#x5B#x5D]
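
Read against these productions, the '-' in [-] is a single XmlCharIncDash, so the group is legal. As a rough sanity check only, the sketch below runs the same pattern through java.util.regex. That is a different regex dialect from the XML Schema one, so it proves nothing about the specification, but it treats this particular expression the same way. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class DashGroupCheck {

    // The pattern from the schema, written as a Java string literal.
    static final Pattern VALUE = Pattern.compile("[-]?\\d+[abcd]");

    // Whole-string match, mirroring how an xs:pattern facet must match the entire value.
    static boolean matches(String s) {
        return VALUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The four values from the sample document, plus one that should not match.
        for (String s : new String[] {"10a", "-10b", "2c", "27d", "10e"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```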

Please note that the Xerces built into the Java runtime (com.sun.org.apache.xerces.internal.impl) is a rather old implementation. You might want to use the current Apache Xerces implementation for best results:
http://xerces.apache.org
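
One possible way to pick up the Apache build from code is sketched below. It assumes the Xerces jars are on the classpath and asks JAXP for the Apache schema factory by class name (org.apache.xerces.jaxp.validation.XMLSchemaFactory), falling back to the JDK default if that class cannot be loaded. SchemaFactoryPicker and newFactory are hypothetical names used only for this example.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaFactoryPicker {

    // JAXP schema factory class shipped with the Apache Xerces2 Java distribution.
    static final String APACHE_XERCES_FACTORY =
        "org.apache.xerces.jaxp.validation.XMLSchemaFactory";

    // Prefer the Apache Xerces factory; fall back to whatever the JDK provides.
    static SchemaFactory newFactory() {
        try {
            return SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI,
                APACHE_XERCES_FACTORY,
                SchemaFactoryPicker.class.getClassLoader());
        } catch (IllegalArgumentException e) {
            // Thrown when the named factory class cannot be loaded or instantiated.
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually selected.
        System.out.println(newFactory().getClass().getName());
    }
}
```

If the printed class name starts with com.sun.org.apache.xerces.internal, the JDK's bundled copy is still the one in use.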

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
Jamil
Posts: 99
Joined: Thu Oct 23, 2008 6:29 am

Re: XML Schema Oddity With Pattern Restrictions

Post by Jamil »

Thank you for your follow-up. The issue with the Java runtime is noted, and I will look into the latest available Xerces.

Thanks again.