Validating regular expressions in schema 1.0 with oXygen
Hello all,
I'm creating a schema with a number of regular expressions restricting some of the datatypes.
The regexes appear to work: I've tested them separately outside oXygen and they do what I'd expect.
However, oXygen generates test XML instances from the schema that include characters and combinations that shouldn't validate against those regexes.
Is it oXygen, the regex or the schema that's not stopping the bad data?
The basic token type allows a maximum of 500 characters and a minimum of one character. It may not contain carriage return (#xD), line feed (#xA) or tab (#x9) characters, must not begin or end with a space (#x20) character, and must not contain a sequence of two or more adjacent space characters:
Code: Select all
<xs:simpleType name="Tokenized500Type">
  <xs:restriction base="xs:string">
    <xs:maxLength value="500"/>
    <xs:minLength value="1"/>
    <xs:pattern value="\S+( \S+)*"/>
  </xs:restriction>
</xs:simpleType>
>>> I want all the token type (above) restrictions to apply at once i.e. AND - can anyone confirm that's how the combination of maxLength, minLength and pattern will apply?
>>> It should not be necessary to split them out into multiple xs:restriction tags?
The transliteration type below should only allow ASCII characters.
Code: Select all
<xs:simpleType name="TransliteratedStringType">
  <xs:annotation>
    <xs:documentation> can only contain non-control characters drawn from the “invariant subset”
      of ISO 646 (i.e. ASCII). </xs:documentation>
  </xs:annotation>
  <xs:restriction base="example:Tokenized500Type">
    <xs:pattern
      value="(!|&quot;|%|&amp;|'|\(|\)|\*|\+|,|-|.|\/|0|1|2|3|4|5|6|7|8|9|:|;|&lt;|=|>|\?|A|B|C|D|E|F|G|H|I|J|K|L| |M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|_|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)+"
    />
  </xs:restriction>
</xs:simpleType>
>>> The above transliteration restriction should apply to the token type even if the token type is not functioning correctly, right?
The following element uses the transliteration type above.
The element's type is built as follows:
Code: Select all
<xs:complexType name="TransliteratedNameType">
  <xs:simpleContent>
    <xs:extension base="example:TransliteratedStringType">
      <xs:attribute ref="xml:lang" use="optional">
        <xs:annotation>
          <xs:documentation>The language of this element's text content. An IETF Language Code
            conforming to the latest RFC from IETF BCP 47. Note that the first characters of an
            IETF Language Code, up to the hyphen (if any), are all lowercase, and those following
            the hyphen (if any) are all uppercase.<br/>
          </xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
...
<xs:complexType name="TransliteratedOtherEntityNameType">
  <xs:complexContent>
    <xs:extension base="lei:TransliteratedNameType">
      <xs:attribute name="type" type="lei:TransliteratedEntityNameTypeEnum" use="required">
        <xs:annotation>
          <xs:documentation>Type of alternative name for the legal entity.</xs:documentation>
        </xs:annotation>
      </xs:attribute>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
Here's the data I'm testing in oXygen:
Code: Select all
<TransliteratedOtherEntityName type="AUTO_ASCII_TRANSLITERATED_LEGAL_NAME" xml:lang="de-DE">Dr. Bäcker</TransliteratedOtherEntityName>
>>> Why does the data in the element validate? The "ä" should cause it to fail...
Re: Validating regular expressions in schema 1.0 with oXygen
Hello,
>>> I want all the token type (above) restrictions to apply at once i.e. AND - can anyone confirm that's how the combination of maxLength, minLength and pattern will apply?
>>> It should not be necessary to split them out into multiple xs:restriction tags?
That is the correct way to combine a restriction of minimum/maximum length + pattern. No, you don't have to split it.
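To illustrate how facets combine, here is a minimal sketch. The type names ShortTokenType and ShortLetterTokenType are made up for this example and are not from your schema. Facets of different kinds inside one xs:restriction must all be satisfied. Patterns added in a further restriction step are applied in addition to those of the base type. Note that several xs:pattern elements inside the same xs:restriction would instead be treated as alternatives.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- One derivation step: minLength, maxLength and pattern must all hold. -->
  <xs:simpleType name="ShortTokenType">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="10"/>
      <xs:pattern value="\S+( \S+)*"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- A second derivation step: this pattern applies on top of the base type's facets. -->
  <xs:simpleType name="ShortLetterTokenType">
    <xs:restriction base="ShortTokenType">
      <xs:pattern value="[A-Za-z ]+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under these assumptions, a value of type ShortLetterTokenType has to pass both patterns and both length limits.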
>>> The above transliteration restriction should apply to the token type even if the token type is not functioning correctly, right?
Yes, but that is not the case, the base type seems just fine.
Code: Select all
<xs:restriction base="example:Tokenized500Type">
<xs:pattern
value="(!|&quot;|%|&amp;|'|\(|\)|\*|\+|,|-|.|\/|0|1|2|3|4|5|6|7|8|9|:|;|&lt;|=|>|\?|A|B|C|D|E|F|G|H|I|J|K|L| |M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|_|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)+"
/>
</xs:restriction>
BTW, "/" does not need escaping ("\/" in your pattern); Saxon-EE actually treats this unnecessary escaping as an error. I would remove it.
>>> Why does the data in the element validate? The "ä" should cause it to fail...
You forgot to escape the "." in the pattern, so it matches any character (including "ä"), which makes the rest of the expression redundant.
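For illustration, here is one possible corrected restriction. It is a sketch rather than a tested drop-in, and it assumes the intended character set is exactly the one listed in your pattern. It uses a character class instead of the long alternation, so "." is taken literally, "/" is left unescaped and only "-" is escaped.

```xml
<xs:restriction base="example:Tokenized500Type">
  <!-- Inside [...] the "." is a literal dot; the hyphen is escaped as "\-".
       XML-special characters are written as entities in the attribute value. -->
  <xs:pattern value="[!&quot;%&amp;'()*+,\-./0-9:;&lt;=&gt;?A-Z_a-z ]+"/>
</xs:restriction>
```

With a pattern like this, "Dr. Bäcker" should be rejected because "ä" is not in the class, while "Dr. Baecker" should still be accepted.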
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com