newbie to Schematron - help!

Post by Rob H »

Hi

I am trying to use Schematron to validate an XML file that looks something like this:

<top>
  <Marker>
    <MarkerID>1</MarkerID>
    <VariationType>test</VariationType>
    <Source>dbtest</Source>
    <LocalID>rs699</LocalID>
  </Marker>

  <Marker>
    <MarkerID>2</MarkerID>
    <VariationType>test</VariationType>
    <Source>dbtest</Source>
    <LocalID>rs698</LocalID>
  </Marker>

  <Marker>
    <MarkerID>3</MarkerID>
    <VariationType>test</VariationType>
    <Source>dbtest</Source>
    <LocalID>rs699</LocalID>
  </Marker>
</top>

I need to be able to detect when an rs number (the <LocalID> value, e.g. rs699, though it is not always this one) occurs more than once in an XML document.

So far, after digging around, I have managed to come up with this:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern fpi="LocalID">
    <rule context="/top/Marker/LocalID">
      <assert test="not(count(preceding-sibling::LocalID[.=current()]) = 1)">
        Duplicate <value-of select="."/>
        key exists <value-of select="count(../LocalID[.=current()])"/> times.
        Keys should be unique.
      </assert>
    </rule>
  </pattern>

</schema>

This works, but only if the duplicate <LocalID> values appear within the same <Marker> block, which is not what I want.

Any help or suggestions are welcome, as I am new to Schematron.

Thanks

Rob

Post by sorin_ristache »

Hello,

This type of constraint is better expressed in W3C XML Schema than in Schematron, using one of the XML Schema elements xs:unique or xs:key. The XPath expression is much shorter in XML Schema, and when you validate your XML document against the XML Schema, the errors displayed in the Errors view point directly to the duplicate values so you can fix them quickly. In Schematron you get only a list of the duplicate values. See the following Schematron 1.5 schema:

<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="checkID">
    <rule context="/top">
      <assert test="count(for $id in distinct-values(Marker/LocalID) return
          (if (count(//LocalID[. = $id]) > 1) then $id else ())) = 0"
          diagnostics="displayDuplicateIDs"/>
    </rule>
  </pattern>
  <diagnostics>
    <diagnostic id="displayDuplicateIDs">
      Duplicate keys:
      <value-of select="for $id in distinct-values(Marker/LocalID) return
          (if (count(//LocalID[. = $id]) > 1) then $id else ())"/>
    </diagnostic>
  </diagnostics>
</schema>

You have to enable XPath 2.0 expressions in Schematron schemas from Options -> Preferences -> XML -> XML Parser, in the Schematron section. An ISO Schematron schema (namespace http://purl.oclc.org/dsdl/schematron) with such a complex XPath 2.0 expression cannot be used for validating XML documents yet, but we will fix that in a future version of oXygen.
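
If XPath 2.0 is not available, the same check can probably be written with XPath 1.0 only, which should also make it usable from an ISO Schematron schema. The following is a minimal sketch under that assumption, not something tested in oXygen. It compares each LocalID with every LocalID that comes earlier in the document, so it covers duplicates both inside one Marker and across different Marker elements. The pattern id uniqueLocalID is a made-up name for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- "uniqueLocalID" is a hypothetical id chosen for this sketch -->
  <pattern id="uniqueLocalID">
    <!-- The rule fires once for every LocalID in the document -->
    <rule context="/top/Marker/LocalID">
      <!-- In XPath 1.0, "node = node-set" is true when any node in the set
           has the same string value, so the assertion fails as soon as an
           earlier LocalID carries the same value as the current one -->
      <assert test="not(. = preceding::LocalID)">
        Duplicate LocalID <value-of select="."/>. Keys should be unique.
      </assert>
    </rule>
  </pattern>
</schema>
```

With the sample document from the question, this would flag the LocalID in the third Marker (rs699), because the first Marker already uses that value. Each failure is reported on the later occurrence, not on the first one.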

Also see the following XML Schema which checks the same constraint with only an xs:unique element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="top">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="Marker"/>
      </xs:sequence>
    </xs:complexType>
    <xs:unique name="singleLocalID">
      <xs:selector xpath="Marker/LocalID"/>
      <xs:field xpath="."/>
    </xs:unique>
  </xs:element>
  <xs:element name="Marker">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="MarkerID"/>
        <xs:element ref="VariationType"/>
        <xs:choice maxOccurs="unbounded">
          <xs:element ref="LocalID"/>
          <xs:element ref="Source"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="MarkerID" type="xs:integer"/>
  <xs:element name="VariationType" type="xs:NCName"/>
  <xs:element name="LocalID" type="xs:NCName"/>
  <xs:element name="Source" type="xs:NCName"/>
</xs:schema>
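
Since xs:key was mentioned as the other option, here is a sketch of how the xs:unique element might be swapped for an xs:key with the same selector and field. The name localIDKey is made up for illustration, and the fragment is meant to sit inside the declaration of the top element, in place of xs:unique.

```xml
<!-- Goes inside <xs:element name="top">, replacing the xs:unique element.
     "localIDKey" is a hypothetical name chosen for this sketch. -->
<xs:key name="localIDKey">
  <!-- Select every LocalID below a Marker child of top -->
  <xs:selector xpath="Marker/LocalID"/>
  <!-- The key value is the text content of the selected LocalID itself -->
  <xs:field xpath="."/>
</xs:key>
```

For this document the two should behave the same way. The difference is that xs:key also requires every selected element to have a value for the field, and an xs:key can be referenced from an xs:keyref elsewhere in the schema.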
Regards,
Sorin