XML Schema Oddity With Pattern Restrictions
I am using the oXygen 15.1 editor. This question is not about a problem with the editor; it is about something odd in XML Schema 1.0.
I created an XML schema that contains a pattern restriction and tested it against a few XML parsers and XSLT processors, including Saxon 9.5.1.2j and MSXML 4.0 SP3. With both of these, my XML document passes validation. I thought I was good to go.
However, when I used the same XML document and schema with a SAX parser in Java, I got exceptions in my Java code:
org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.
Why would this work fine with the latest version of Saxon as well as MSXML? SAX is rejecting it as an invalid regular expression (the regex is valid though, mind you). It wants me to escape the '-' character.
Here is a sample XML document illustrating this oddity:
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<node xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="file:/D:/Data/OxygenXMLEditor/sample/sample.xsd">
  <value>10a</value>
  <value>-10b</value>
  <value>2c</value>
  <value>27d</value>
</node>
Here is a corresponding schema:
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="node">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="value"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="value">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[-]?\d+[abcd]"></xs:pattern>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
If you open these two in oXygen, you will see the same behavior I described. The XML document validates successfully against the schema.
However, if I do what SAX is telling me to do, which is escape the '-' character, the same XML document validates successfully again.
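For anyone who wants to try the escaped variant outside oXygen, here is a minimal sketch. It assumes the JAXP validation API that ships with the JDK (javax.xml.validation). The class EscapedHyphenCheck and its helper methods are made-up names for illustration, and the schema is cut down to the value element alone rather than the full sample above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EscapedHyphenCheck {
    // Hypothetical helper: wraps the given pattern in a one-element schema.
    static String schemaFor(String pattern) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='value'><xs:simpleType>"
             + "<xs:restriction base='xs:string'>"
             + "<xs:pattern value='" + pattern + "'/>"
             + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
    }

    // Returns true if <value>text</value> validates against the pattern,
    // false if the schema fails to compile or the value is rejected.
    static boolean accepts(String pattern, String text) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(
                new StreamSource(new StringReader(schemaFor(pattern))));
            schema.newValidator().validate(
                new StreamSource(new StringReader("<value>" + text + "</value>")));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The escaped form that the error message asks for.
        String escaped = "[\\-]?\\d+[abcd]";
        for (String v : new String[] {"10a", "-10b", "2c", "27d", "10e"}) {
            System.out.println(v + " -> " + accepts(escaped, v));
        }
    }
}
```

The four values from the sample document should be accepted and the extra value 10e rejected, since 'e' is not in [abcd].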
This seems like a bug in something. SAX raises an error where neither Saxon nor MSXML does. When I make the change to avoid the SAX error, Saxon still validates the document without issue.
Is this a defect? If so, who should be notified?
Thank you for reading.
Re: XML Schema Oddity With Pattern Restrictions
Here is more of the stack trace. I am using Apache Xerces in Java 1.6:
Code: Select all
Caused by: org.xml.sax.SAXParseException: InvalidRegex: Pattern value '[-]?\d+[abcd]' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.' at column '1'.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaErr(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.reportSchemaError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDAbstractTraverser.reportSchemaError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.getSimpleType(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDSimpleTypeTraverser.traverseLocal(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseNamedElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDElementTraverser.traverseGlobal(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.traverseSchemas(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(Unknown Source)
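The trace shows the failure happens while the schema is being compiled in SchemaFactory.newSchema, before any document is validated. A small probe along the following lines may help narrow it down. It is only a sketch: PatternProbe and its methods are made-up names, and the result for the unescaped pattern depends on which implementation and version your runtime provides.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PatternProbe {
    // Class name of the SchemaFactory implementation JAXP selected for XSD.
    static String factoryName() {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                            .getClass().getName();
    }

    // Compiles a tiny schema that uses the pattern.
    // Returns null on success, or the error message if compilation fails.
    static String compileError(String pattern) {
        String xsd =
              "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='value'><xs:simpleType>"
            + "<xs:restriction base='xs:string'>"
            + "<xs:pattern value='" + pattern + "'/>"
            + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                         .newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("SchemaFactory: " + factoryName());
        // The two spellings discussed in this thread.
        for (String p : new String[] {"[-]?\\d+[abcd]", "[\\-]?\\d+[abcd]"}) {
            String err = compileError(p);
            System.out.println(p + " : " + (err == null ? "accepted" : "rejected: " + err));
        }
    }
}
```

If the printed factory name starts with com.sun.org.apache.xerces.internal, the schema is being compiled by the copy of Xerces bundled with the JDK rather than by a separately installed Apache Xerces.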
Re: XML Schema Oddity With Pattern Restrictions
Hi,
The XML Schema specification allows '-' as a character range inside a character class such as [-]:
http://www.w3.org/TR/xmlschema-2/#nt-charRange
Code: Select all
charRange ::= seRange | XmlCharIncDash
XmlCharIncDash ::= [^\#x5B#x5D]
Please note that the Xerces built into the Java runtime (com.sun.org.apache.xerces.internal.impl) is a rather old implementation. You might want to use the current Apache Xerces implementation for best results:
http://xerces.apache.org
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
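One possible way to follow the suggestion about using the current Apache Xerces is to ask JAXP for that implementation by class name. This is a sketch under assumptions: it presumes xercesImpl.jar from http://xerces.apache.org is on the classpath, uses the factory class name from the Apache Xerces-J distribution, and falls back to the JDK default when that class cannot be loaded. XercesFactorySelector is a made-up name, and whether a given Xerces release accepts [-] is something to verify on your own setup.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

class XercesFactorySelector {
    // SchemaFactory class shipped in Apache Xerces-J (assumed to be on the classpath).
    static final String APACHE_XERCES_FACTORY =
        "org.apache.xerces.jaxp.validation.XMLSchemaFactory";

    // Prefers the stand-alone Apache Xerces factory; falls back to the
    // JDK's built-in implementation if the Apache class cannot be loaded.
    static SchemaFactory newXsdFactory() {
        try {
            return SchemaFactory.newInstance(
                XMLConstants.W3C_XML_SCHEMA_NS_URI, APACHE_XERCES_FACTORY, null);
        } catch (IllegalArgumentException e) {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually selected.
        System.out.println(newXsdFactory().getClass().getName());
    }
}
```

Naming the class explicitly keeps the choice local to this call instead of changing the JAXP lookup for the whole application.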