The Schematron Assertion Language 1.5
2000-2002 (C) Rick Jelliffe
and
Academia Sinica Computing Centre
2002-10-01
Abstract
The Schematron is a language system for specifying and declaring assertions about arbitrary
patterns in XML documents, based on the presence or absence, names and values of elements
and attributes along paths. Its target uses are software engineering requiring document
validation, scholarly research over patterns in graph-structured data, automatic
creation of external markup, and aiding the accessibility of documents for people with
disabilities.
Schemas
For markup languages, a schema is a specification of interlocking constraints between
information items in a document.
- Some schema languages are open (they allow anything by default and the schema
restricts what is allowed) and some are closed (they allow nothing by
default and the schema allows specific structures). Schematron is open.
- Schematron is fully namespace-aware, and allows a very wide range of
wildcarding capabilities.
- Schematron is declarative and uses XML notation.
- Schematron is rule-based with a fixed four-layer hierarchy (phases, patterns,
rules, assertions).
- Schematron uses path-expressions rather than grammars as its schematic
paradigm: the W3C XPath language is used, with the various extensions
provided by XSLT.
- Schematron has been designed to be trivially implementable on XSLT. However,
other implementations are possible. For example, an implementation in Perl has been
made.
Any schema paradigm imposes particular limitations on the constraints expressible by a
schema. Thus SGML refers to its grammar declarations as 'content
models'; there is a difference between (the expectations we have of) a schema
versus a model: the former defines canonically or exhaustively, whilst the latter
describes as best it can according to its schematic paradigm.
[SGML] and [XML] have provided a large-scale example of a design-by-contract
[Meyer] distributed system: the DTD has acted as the language for specifying the
invariants of a document during its life, but is over-rideable (using parameter entities in
the internal subset) to allow particular process-local invariants to be described. It is
common practice in SGML shops to override and extend DTDs ad hoc, in order to
better describe and verify (validate) the data at various stages. The ability to
flexibly alter DTDs as required seems to be one distinguishing feature of successful
SGML-based production.
Grammars
Conventionally, starting with SGML DTDs, schemas for markup languages are defined
in terms of grammars to regulate element containment, lists to regulate attribute
containment, augmented by datatype constraints on various information elements. In XML,
these additional constraints are concerned with enabling graph structures to be
represented, rather than describing the semantics or types of information
elements. In the late 1990s, many schema languages were developed for XML in
anticipation of the development of the World Wide Web Consortium's [XML Schema] schema
language. All these schema languages used the grammar-founded approach mentioned above,
elaborating on them using objects [SOX], modules [RELAX], production selectors
[Assertion Grammars], etc.
But these approaches have certain deficiencies. For a start, SGML used grammars as its
schematic paradigm because one does indeed define grammars with it, down to lexical level:
a full-featured SGML parser can parse many tagged-data languages regardless of the
delimiters used. The grammar paradigm is not necessary for schemas for XML. Secondly, the
grammar approach is not sufficient to express any constraints between information items in
different branches of the attribute-value tree which forms the primary view of an XML
document. The mechanisms for declaring unique identifiers and references do not alter this;
mechanisms such as that of [XML Schema] and [RELAX] to introduce grammar non-terminals
(termed the tag/type distinction) allow an element name to have a different datatype or
content model depending on its parent, however this merely allows the type of an element to
be constrained by its parent's type as well as by its generic identifier (i.e., by
its tag).
More generally, there is no reason to expect an arbitrary web of information (think of an
Entity-Relation diagram) to conform itself to a simple tree structure. Consequently there
is no reason to expect that the information constraining one structure or value will always
be found in the same branch. (Indeed, schema languages themselves may in part be attempts
to move non-branch-local information to a separate tree, to simplify the markup and
structures.)
These deficiencies are nothing more than schema paradigm limitations. However there are
other pragmatic and policy considerations that may make a grammar-based schema paradigm
unattractive.
Considerations
The first is a cultural, or perhaps educational and linguistic, one. According to
[deFrancis], Western written languages can be
thought of as having the broad hierarchy 'character', 'word', 'phrase', 'sentence', while
Chinese can be thought of as having the broad hierarchy 'character', 'idea', 'phrase',
'sentence'. A Chinese character does more than a Western character, and represents
ideas which are again at a slightly higher level than Western words (i.e., in
non-agglutinating languages.) Thus in Chinese, it is possible that one moves quickly
from a non-grammatic level to a semantic level; this may be why Taiwanese students are
anecdotally reported[1] to find the idea of formal grammars for natural languages to be
laughably theoretical rather than practical. At the least, we should not dismiss the
possibility that using formal grammars as a schematic paradigm may be more easily acceptable
to members of one language group or culture than another.
Secondly, if a schematic paradigm has language or cultural affinities, then different
schema languages may be doubly difficult for people with cognitive impairments. I only need
to take this point as far as stating the obvious: that a complex schema paradigm may be
more difficult than a simple one. And the difficulty of a schematic paradigm may lie not
just in its complexity to explain but also in its complexity to use and implement.
Third, when considering the needs of schemas for constraints on documents which are needed
to support accessibility by disabled people, we come up against what I regard as a
fundamental shortfall in existing schema languages: they are designed to support
definitional schemas which intend to specify exhaustively or canonically the
required constraints on a document. However, accessibility constraints are typically
policy constraints imposed on a document in addition to those constraints required to
define that document. Yet these constraints are fundamentally schematic: they relate to
invariants about what elements, etc. can be used and where.
Fourthly, after
admitting that there can be important non-definitional constraints on a document, the
question arises of what other non-definitional constraints can there be? The main
one I identify is the requirements of workflow: that some constraints only may come into
existence during some phase of a document's lifecycle. Without some notion of
constraints that come into play during a phase, one must either weaken constraints on a
schema, until the schema only contains the loosest unparameterized invariants, or
arbitrarily switch schemas during the document's life cycle.
Fifthly, building on the notion that it is useful to be able to switch constraints in and
out during formally defined phases of a document's life cycle, we can see that the ability
to group and switch in and out constraints on an ad hoc basis during editing of a document
would be useful. It is a common difficulty with validating editors of structured
documents that otiose errors are reported for documents that are under construction and
incomplete.
Sixthly, the other side to reporting extraneous information is that a
grammar-based schema language probably does not have any mechanism to explain the
significance of particles and groups in its content model: if there is a repeating group
of elements, surely it is more interesting to know why they are grouped and repeat
rather than merely the bland fact that they are grouped and repeated. A group in a
content model represents a kind of mandatorily omitted element, where the schema
designer has decided, perhaps for pragmatic reasons of markup minimisation, to not allow
the structure to be named as an element.
Seventhly, the previous issue raises
the further question of how information is to be provided for human interaction with a
schema system. In the case of grammars, it is possible to synthesize many useful error
messages or diagnostic hints from a content model; however grammar-based systems have
seemed weak in helping sort out how to fix in-progress or utterly wrong markup.
Thus one consideration leads to the next, and the result can be considerable
doubt that grammars provide the optimal schema paradigm for documents for the World Wide
Web. These are some of the needs and considerations underpinning development of the
Schematron assertion language.
Which is not, of course, to say that grammars are
not quite useful when appropriate.
Uses
The Schematron language has been developed with four main use-areas in mind:
- For software engineering, to allow the expressing of interlocking constraints on
standalone XML documents, and for checking pre- and post-conditions of XML documents
used as arguments or returned by functions in programming languages.
- For data mining and scholarly use, where the dataset is a graph expressible by an
XML document, and where the constraints are hypotheses about the dataset that can be
tested.
- For use in automated markup systems, where one wishes to detect patterns in data
and produce an external or inline document which links
- For use as a schema language for "hard" markup languages, such as XLink,
RDF and XSL-FO.
As part of the Schematron project, exemplar software to do these has been produced and
is available on the WWW[2].
A Schematron process does not augment the information set of
the document per se. Instead, it is assumed to create an external document
containing links or references (human or machine-readable) to the original document.
Assertions
A Schematron schema is made by specifying assertions, which are simple declarative
sentences in natural language.
The <assert> element is used to tag positive assertions about a document. For example,
<assert>A 'dog' element should contain two 'ear' elements.</assert>
This assertion is something that is expected to be true of the document. If a document
is validated against the schema, and the test for this assertion fails, an application
can take some action. Schematron does not specify any actions: it only allows assertions
to be tested, for the parts of assertions to be given roles, for the assertions to be
grouped into rules, for the rules to be grouped into patterns, and for the patterns to be
activated in various phases.
The <report> element is used to tag negative assertions about a document. For example,
<report>This dog has a bone.</report>
The test attribute on an <assert> or <report> element is an XPath
expression evaluated to boolean: informally, XPath expressions are a
simple expression language with functions on strings, numbers, booleans, document
context, and nodes. The terms can be grouped using parentheses and |. Formally,
they must match the production and semantics of production [14] Expr, s.3
Expressions in the [XPath] specification. (See Appendix D below for a combined listing
of the various productions.)
<assert test="self::dog and child::bone"
>A 'dog' element should contain two 'ear' elements.</assert>
Within these two elements, it is possible to use a <name> element, which gives the
specific name of the context element for which the assert statement failed or the
report statement succeeded. The <name> element can also have an attribute
path in which an [XPath] expression can be given; this allows the name of an
element or attribute different to the context element to be specified. Because some
implementations of Schematron may format these names differently, an element <emph> is
also allowed for better formatting; its only use is to allow names of elements or
attributes specified in assertions to have the same format as those provided by
evaluating the <name> element. The <span> element is also allowed, with the
same meaning as in HTML.
<assert test="child::bone"
>A <name/> element should contain two <emph>ear</emph> elements.</assert>
Note that there is an abbreviated syntax possible for use in the test
attribute. So the following example is equivalent to the previous one:
<assert test="bone"
>A <name/> element should contain two <emph>ear</emph> elements.</assert>
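The path attribute of <name> described above can be illustrated with a small sketch. The rule below is hypothetical (it is not one of the original examples) and assumes the dog and ear vocabulary used so far: the first <name/> should yield the name of the context element, while the second should yield the name of whatever element the path expression selects, here the parent.

```xml
<rule context="ear">
  <!-- Hypothetical rule: reports any 'ear' whose parent is not a 'dog'. -->
  <report test="not(parent::dog)"
  >An <name/> element was found inside a <name path="parent::*"/> element
  rather than inside a <emph>dog</emph> element.</report>
</rule>
```

How the two names are formatted in the output is left to the implementation, which is why <emph> is used for the literal name.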
For internationalization, the element <dir> can be used inside these two elements
to support bidirectional written languages; the semantics are those of the dir element
of [HTML]. The elements may also have an xml:lang attribute for tagging the written
language of the contents of the element; the xml:lang attribute does not express the
language of the target document.
For better formatting of assertion reports, these
two elements may also have an icon attribute, which is the [URL] of a small image that
may provide visual clues to a user.
These two elements can also have a subject
attribute. This is an [XPath] path which allows very direct specification of the subject
of the assertion: this may be useful information for automatically generating [RDF]
documents.
There is no prescribed order in which assertions must be checked. (By default, most
implementations will probably check assertions in the same order they appear in the
document.)
In the particular case of Schematron schemas which need to be very terse, and which are
intended for a yes/no validation result, the natural language assertions may be omitted.
Rules
<assert> and <report> elements are grouped inside <rule>
elements. The <rule> element has a context attribute which contains an [XPath]
expression. Every element in the document which this path expression
matches is then used as the context to test the assertions. An assertion is tested
by evaluating the [XPath] expression declared in the test attribute of the <assert> or
<report> element.
The full declarations for the assertions above are
<rule context="dog">
<assert test="count(ear) = 2"
>A 'dog' element should contain two 'ear' elements.</assert>
<report test="bone"
>This dog has a bone.</report>
</rule>
The context attribute on a <rule> element is an XSLT pattern:
informally, this allows XPath path expressions to be combined as alternatives
(using the | operator). Formally, they must
match the production and semantics of production [1] Pattern, s.5.2 Patterns
in the XSLT specification. (See Appendix D below for a combined listing of the various
productions.) The functions available include those in XSLT s.12 Additional Functions.
These can be extended using the methods in XSLT s.14.2 Extension Functions; the
function-available() function should be used before any extension function
is called to allow some graceful behaviour on systems which do not support the
functions. (It is anticipated that some more business-oriented functions may be
developed at some later stage.)
These three elements are the operational core of
Schematron. [XPath] expressions allow a very wide range of constraints to be expressed:
based on element and attribute names, based on their position and occurrence, based on text
values, and based on counts. In the example, the context is every element with a
generic identifier 'dog': the test in the <assert> element counts the number of child
elements with the generic identifier 'ear'. Neither assertion in this rule will fail
for the following XML document:
<dog><ear/><ear/></dog>
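For contrast, here are two hypothetical instances (not part of the original examples) that should each produce a message under the same rule. Each is meant to be read as a separate document.

```xml
<!-- Instance 1: only one 'ear' child, so the assert test count(ear) = 2
     is false and the assertion fails. -->
<dog><ear/></dog>

<!-- Instance 2: two 'ear' children satisfy the assertion, but the report
     test 'bone' is true, so the report is generated. -->
<dog><ear/><ear/><bone/></dog>
```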
The context attribute is an [XPath] as extended by [XSLT], allowing alternatives to be
combined with '|', for example. The test attributes are [XPath] expressions which allow
various logical operators such as 'and' and 'or'.
The <rule>, <assert> and <report> elements can each have a
role attribute. This is an identifier within the schema to identify the role that is
played. Schematron 1.5 does not pre-define any roles; the ns attribute
on the <schema> element can be used to specify some URL to which this
controlled vocabulary belongs. These elements can also have id attributes.
<rule context="dog" role="animal" id="doggy">
<assert test="count(ear) = 2" role="internalProperty"
>A 'dog' element should contain two 'ear' elements.</assert>
<report test="bone" role="externalProperty"
>This dog has a bone.</report>
</rule>
This jointed-leg path system is reminiscent of SQL queries: one could consider a
query SELECT x FROM y WHERE z IS a to be a context statement (i.e., 'WHERE z IS
a') and a test (i.e., 'x FROM y').
A <rule> element can also contain
<key> elements, which allow [XSLT]'s key mechanism to be used. This allows
various testing of reference constraints; it is more powerful than the [XML] ID/IDREF
mechanism. The path attribute is an [XPath] path; the name attribute is a token naming
the key. The icon attribute allows specification of an icon.
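A minimal sketch of a <key> in use follows. It rests on an assumption not stated above: that an XSLT-based implementation indexes the nodes matched by the rule's context, using the value selected by path as the key value. The key name dogsByName is hypothetical; the petname attribute is borrowed from the diagnostics example later in this document.

```xml
<rule context="dog">
  <!-- Assumption: the key indexes every node matched by the rule context
       ('dog') by the value of its petname attribute. -->
  <key name="dogsByName" path="@petname"/>
  <!-- key() is the XSLT function; it returns all indexed nodes whose key
       value equals the second argument. -->
  <assert test="count(key('dogsByName', @petname)) = 1"
  >Each dog should have a petname that no other dog shares.</assert>
</rule>
```

Unlike ID/IDREF, the key value here can come from any path expression rather than only from an attribute declared as an ID in a DTD.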
An important feature
to note is that, because of [XSLT]'s document() function, a Schematron assertion test
can refer to data in a different document from the context document. This allows
Schematron schemas to be used for two important uses: to validate against a controlled
vocabulary located externally to the schema (indeed, this can be in any XML document
type, not just using a Schematron schema), and to validate the output of some program's
function against data found in its input (or vice versa) as a form of black-box
testing.
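The controlled-vocabulary case might be sketched as below. The file breeds.xml, its breed elements and the breed attribute are all hypothetical; how the relative URI is resolved depends on the implementation (in an XSLT implementation, typically relative to the generated stylesheet).

```xml
<rule context="dog">
  <!-- Comparing two node-sets with '=' is true if any pair of nodes has
       equal string values, so this tests membership in the external list. -->
  <assert test="@breed = document('breeds.xml')//breed"
  >The breed attribute of a 'dog' element should be one of the breeds
  listed in the external vocabulary.</assert>
</rule>
```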
It is also useful to note that Schematron lends itself to analysis of
information sets using cohesion and coupling ideas [Constantine]. The coupling of
one information item to another often is not symmetrical: DTDs force all coupling
constraints to be expressed in terms of the parent to the child, whereas some coupling
may be better expressed from child to parent. This is a typical way of specifying
optional elements in a Schematron schema.
A simple macro mechanism is allowed on rules. A <rule> element can have one or more
<extends> elements. These have a rule attribute, which is the identifier
of an abstract rule. This allows you to bring the assertions of that abstract rule
into the current rule. An abstract rule is specified with an abstract
attribute with a value "true". An abstract rule element cannot have a
context attribute. (This use of "extends" where W3C XML
Schemas uses "restricts" is the appropriate terminology from rule-based
systems [WASH].)
As an example, this constraint can be specified as follows (in [XPath] paths):
<rule context="sch:rule">
<assert test="not(attribute::abstract='true') or not(attribute::context)"
>An abstract rule cannot have a context attribute.</assert>
<assert test="not(attribute::abstract='false') or attribute::context"
>A rule should have a context attribute (except for abstract rules.)</assert>
<assert test="attribute::abstract or attribute::context"
>A rule should have a context attribute (except for abstract rules.)</assert>
<report test="attribute::abstract and not(attribute::abstract='true') and
not(attribute::abstract='false')"
>In a rule, the abstract attribute is optional, and can have values 'true'
or 'false'</report>
</rule>
Note in this example that Schematron schemas are very specific. It is quite probable
that a simpler schema would be just as effective, or the various assertions could be
combined into a larger test with a more general statement.
One abstract rule can extend another. XML Entities can also be used for various macro
effects, as desired.
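A minimal sketch of the macro mechanism itself, reusing the dog example: the identifier twoEars and the cat element are hypothetical. The abstract rule carries no context attribute and only supplies assertions; each concrete rule pulls them in with <extends>.

```xml
<pattern name="Animal anatomy">
  <!-- Abstract rule: never matched directly against the document. -->
  <rule abstract="true" id="twoEars">
    <assert test="count(ear) = 2"
    >A <name/> element should contain two <emph>ear</emph> elements.</assert>
  </rule>
  <rule context="dog">
    <extends rule="twoEars"/>
    <report test="bone">This dog has a bone.</report>
  </rule>
  <rule context="cat">
    <extends rule="twoEars"/>
  </rule>
</pattern>
```

Using <name/> in the shared assertion lets the same text read sensibly for both context elements.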
There is no prescribed order in which rules must be checked.
(By default, most implementations will probably check rules in the same order they
appear in the document.) Note, however, that if rules are checked in a different order,
they still must implement the order-dependent semantic that each context
attribute is a shortened form of the full context attribute, such that the context is
really formed by first testing the negation of the or-ing of all previous contexts in the
same pattern; only nodes which pass that sieve are tested by the rule.
Patterns
Rules are grouped into <pattern> elements. A pattern is a
grouping of rules. An element will only be used as the context of one rule per pattern; the
first rule in lexical order for which a context matches will be used.
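The first-match behaviour can be sketched with two rules whose contexts overlap. The pattern name and the relaxed assertion are hypothetical. A dog element with exceptional='true' is matched by the first rule, so within this pattern it should never be tested by the second, more general rule.

```xml
<pattern name="Dog anatomy">
  <!-- Tried first: matches only dogs flagged as exceptional. -->
  <rule context="dog[@exceptional='true']">
    <assert test="count(ear) &lt;= 2"
    >An exceptional 'dog' element should contain at most two 'ear' elements.</assert>
  </rule>
  <!-- Tried second: used only for dogs not matched by the rule above. -->
  <rule context="dog">
    <assert test="count(ear) = 2"
    >A 'dog' element should contain two 'ear' elements.</assert>
  </rule>
</pattern>
```

To test the same element against both sets of assertions, the rules would need to be placed in separate patterns.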
Pattern elements
have various attributes. The name attribute allows specification of a simple
human-readable string to identify the pattern. The id attribute allows a unique
identifier to be assigned, for reference purposes. The fpi attribute allows an [SGML]
Formal Public Identifier to be attached. The see attribute allows a [URL] to be
specified which gives some human-readable documentation for the pattern; a hypertext
presentation of the schema results can link to that resource.
A pattern
is the nearest equivalent in Schematron to a type; except that Schematron is
concerned with providing as direct as possible specification of the relationships
between information items rather than trying to fit them into an abstract mold such as
type. Which is not, of course, to say that notions of type are not quite useful where
appropriate.
A <pattern> element can have an icon attribute.
There is
no prescribed order in which patterns must be checked. (By default, most implementations
will probably check patterns in the same order they appear in the document, and
schema-writers may put important patterns before less important patterns in the document
to present the most useful information to the user.) Note, however, that if the
pattern is not activated in the current phase, it should not be checked.
Schema
The top-level element of a Schematron schema is <schema>. A schema element
should have a <title> sub-element.
Typically the schema will be declared using XML
[Namespace] conventions. The preferred prefix is sch and the appropriate
namespace URI is
http://www.ascc.net/xml/schematron
Thus a complete Schematron
schema document is as follows:
<?xml version="1.0" encoding="US-ASCII"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:title>Example Schematron Schema</sch:title>
<sch:pattern>
<sch:rule context="dog">
<sch:assert test="count(ear) = 2"
>A 'dog' element should contain two 'ear' elements.</sch:assert>
<sch:report test="bone"
>This dog has a bone.</sch:report>
</sch:rule>
</sch:pattern>
</sch:schema>
The <schema> element can have a ns attribute which gives the namespace URI that
role attributes will have, if the role is used to externally mark up the target
document.
The <schema> element also allows explicit declaration of
namespace prefixes and URLs that are used in the schema, using the <ns>
subelements. The usual XML [Namespaces] mechanism can be used, however, then the prefix
and URL data is not available for diagnostic reporting or application processing;
furthermore, some implementations may require that the information is made available in
that form. For example:
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
<sch:title>Screen-scraper for XHTML data</sch:title>
<sch:ns prefix="xhtml" uri="http://www.w3.org/1999/xhtml" />
...
A <schema> can have an icon attribute. It can also contain <p>
elements, allowing some modest end-user-oriented documentation to be given: this allows
the user to know what kind of validation or constraints the schema specifies, to aid
them in interpreting any results usefully. The <p> element can have an icon
attribute.
Phases
Workflow and dynamic schemas are supported through the
phase mechanism. The <schema> element can contain <phase> elements. Each must
have an id attribute for a unique identifier; it can have an icon attribute; it can have an
fpi attribute to give a persistent identifier for the phase, because a phase may correspond
to a DTD which had an FPI (note that the FPI is for the phase, not for the current schema
per se.)
The <phase> element has <active> subelements, each of which provides
the identifier of a <pattern> in a pattern attribute.
<phase id="basicValidation">
<active pattern="text" />
<active pattern="tables" />
<active pattern="attributePresence" />
</phase>
<phase id="fullValidation">
<active pattern="metadata" />
<active pattern="text" />
<active pattern="tables" />
<active pattern="attributePresence" />
<active pattern="attributeValueChecks" />
</phase>
By default, all patterns in a document are active. However, an application may
provide a way to allow the user to select the phase to be used: for example, a command
line option when invoked from the command line, a preferences dialog box in a GUI, or a
parameter on the function invocation when called as a precondition-checker in a
programming language (such as C's assert(), from which Schematron's
assert takes its name, or the pre and post-condition statements in Eiffel.)
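To show how phases and patterns fit together in one schema, here is a minimal sketch. The phase and pattern identifiers (drafting, publication, anatomy, possessions) are hypothetical. Each <active> element refers to a pattern by its id attribute.

```xml
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Phased dog schema</sch:title>
  <!-- While drafting, only the anatomy pattern is checked. -->
  <sch:phase id="drafting">
    <sch:active pattern="anatomy"/>
  </sch:phase>
  <!-- For publication, both patterns are checked. -->
  <sch:phase id="publication">
    <sch:active pattern="anatomy"/>
    <sch:active pattern="possessions"/>
  </sch:phase>
  <sch:pattern name="Anatomy" id="anatomy">
    <sch:rule context="dog">
      <sch:assert test="count(ear) = 2"
      >A 'dog' element should contain two 'ear' elements.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:pattern name="Possessions" id="possessions">
    <sch:rule context="dog">
      <sch:report test="bone"
      >This dog has a bone.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

How a phase is selected (command-line option, dialog, function parameter) is left to the application, as described above.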
Diagnostics
Users have reported that a common use of Schematron schemas is to allow specific
diagnostics to be given. However, it is desirable to keep <assert> and
<report> statements as general assertions rather than diagnostic messages. To
support this, the <assert> and <report> elements have a diagnostics
attribute which is a reference to one or more <diagnostic> elements. These are
allowed in a <diagnostics> section at the end of the document. The value of the
diagnostics attribute can be a list of references to <diagnostic> elements.
A <diagnostic> element is general text. It can trivially be converted into
HTML. It can contain <dir> (for bidirectional languages), <span> and
<emph> subelements with the same meanings as HTML. It must have an id attribute
to allow references to it. The <diagnostic> element can have
<value-of> sub-elements, which have the same semantics as in [XSLT]. These allow
insertion of value information as well as name details. A <diagnostic> element can
have an icon attribute.
<rule context="dog" >
<assert test="nose or @exceptional='true'" diagnostics="d1 d2"
>A dog should have a nose.</assert>
</rule>
...
<diagnostics>
<diagnostic id="d1"
>Your dog <value-of select="@petname" /> has no nose.
How does he smell? Terrible. Give him a nose element,
putting it after the <name path="child::*[2]"/> element.
</diagnostic>
<diagnostic id="d2"
>Animals such as <name/> usually come with a full complement
of legs, ears and noses, etc. However, exceptions are possible.
If your dog is exceptional, provide an attribute
<emph>exceptional='true'</emph>
</diagnostic>
</diagnostics>
There is no prescribed order in which diagnostics must be given. (By default, most
implementations will probably give diagnostics in the same order they appear in the
document, and schema-writers may try to use this by putting more likely diagnostics
before unlikely ones.)
Documentation
Because of the emphasis that the natural language text of an assert or
report element is the definition of an assertion, with the tests being
models of the assertion, even undocumented Schematron schemas should be more
comprehensible than schemas in other schema languages. The documentation features are
designed to extend these, and in particular to be able to generate pleasant print or
hypertext versions of a schema.
A <p> element is general text. It can be trivially converted into HTML. It can
contain <dir> (for bidirectional languages), <span> and <emph>
subelements with the same meanings as HTML. It can have an id attribute to allow
references to it. The optional class attribute is provided to help generation of
styled HTML. The xml:lang attribute can be used to specify the language of the
paragraph. A <p> element can have an icon attribute, which is not an HTML
attribute.
Schematron-Like Systems
Other useful validation languages can be built by merely using the Schematron framework
and substituting other query languages. For a language to be Schematron-like it
must be
- a rule-based system with asserts and reports;
- the asserts and reports are evaluated in a context provided by another query;
- the rules form an if-then-else chain, so that there is a lexical priority; and
- the rules are combined into a higher-level abstraction, in Schematron called a
pattern.
A Schematron-like assertion language does not require backtracking or a theoretically
complex implementation. Other higher and lower layers may be added, for instance the
phase mechanism (which could in turn be generalized into a finite state machine.)
However, schema language designers should note that there seem to be good usability
reasons to stick with a fairly fixed hierarchy, rather than, say, adding an extra leg
between context and the assertions: for a start, it means that Schematron schemas can
be entered using simple forms.
Another distinctive feature that may make a language Schematron-like is the decision to
partition the query components to a separate, embedded query language. This seems to
have helped the readability, comprehensibility and learnability of the language; it adds
a terseness which makes hand-editing of schemas completely viable.
Schematron lends itself to being embedded in other schema languages. In such a use, an
extractor program (perhaps an XSLT stylesheet) typically extracts and creates the
separate Schematron schema. Even though only fragments of Schematron are being used,
such as just the assert elements, it is still appropriate to use the Schematron
namespace.
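As one illustration of such embedding, the sketch below places a Schematron pattern inside the annotation of a W3C XML Schema element declaration. The choice of W3C XML Schema as the host language and of xs:appinfo as the carrier is an assumption made for illustration; the extractor that would pull the sch: elements out into a standalone schema is not shown.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://www.ascc.net/xml/schematron">
  <xs:element name="dog">
    <xs:annotation>
      <xs:appinfo>
        <!-- Embedded fragment: still in the Schematron namespace. -->
        <sch:pattern name="Dog anatomy">
          <sch:rule context="dog">
            <sch:assert test="count(ear) = 2"
            >A 'dog' element should contain two 'ear' elements.</sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ear" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="bone" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The grammar here deliberately stays loose about the number of ear elements, leaving the exact count to the embedded assertion.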
Related Material
For a relevant discussion on the role of primacy of natural language descriptions over
formal descriptions and the nature of declarative specifications, which are
surely applicable to schemas, see [LeCharlier]. Note their
comments that a specification should always have 1) a statement indicating purpose, and
2) a list of representation conventions that must be satisfied.
Schematron can be considered a front-end for specifying the targets of a transformation
system (see [CIP].) Indeed, Schematron also may be considered to split the front-end into
a rule-based framework (see [Schemarama] for an implementation of this) and a query
language (in Schematron's case, XPath.)
The element name assert was chosen for familiarity to C programmers from the C
assert(). See ???
The XLinkIt language is a similar assertion language to Schematron, but invented
independently and with different usage goals and design rationales. See [Finkelstein].
Note the existence of the Patent ???? which relates to generating links between a schema
document and an instance for consistency checking using a rules-based system.
A phase can be regarded as a state in a Finite State Machine (see [Etessami] for a
comparison of rules-based and state-machine-based approaches.)
Appendix A: XML DTD For Schematron 1.5
The following are markup declarations for the Schematron assertion
language. For clarity, this version uses the default namespace; it is inadvisable to use the
default namespace in practice, because some Schematron implementations may apply that
default namespace to unqualified names in the target document. Note that, providing
the defaulting noted is followed and except for ID purposes, the Schematron DTD does not
make infoset contributions and should not be required.
<!-- +//IDN sinica.edu.tw//DTD Schematron 1.5//EN -->
<!-- http://www.ascc.net/xml/schematron/schematron1-5.dtd -->
<!-- version of 2002/08/16 -->
<!-- All names are given indirectly, to allow explicit use of a namespace prefix
if desired. In that case, in the internal subset of the doctype declaration,
define <!ENTITY % sp "sch:" >
-->
<!ENTITY % sp "">
<!ENTITY % schema "%sp;schema">
<!ENTITY % active "%sp;active">
<!ENTITY % assert "%sp;assert">
<!ENTITY % dir "%sp;dir">
<!ENTITY % emph "%sp;emph">
<!ENTITY % extends "%sp;extends">
<!ENTITY % diagnostic "%sp;diagnostic">
<!ENTITY % diagnostics "%sp;diagnostics">
<!ENTITY % key "%sp;key">
<!ENTITY % name "%sp;name">
<!ENTITY % ns "%sp;ns">
<!ENTITY % p "%sp;p">
<!ENTITY % pattern "%sp;pattern">
<!ENTITY % phase "%sp;phase">
<!ENTITY % report "%sp;report">
<!ENTITY % rule "%sp;rule">
<!ENTITY % span "%sp;span">
<!ENTITY % title "%sp;title">
<!ENTITY % value-of "%sp;value-of">
<!-- Data types -->
<!ENTITY % URI "CDATA">
<!ENTITY % PATH "CDATA">
<!ENTITY % EXPR "CDATA">
<!ENTITY % FPI "CDATA">
<!-- Element declarations -->
<!ELEMENT %schema; ((%title;)?, (%ns;)*, (%p;)*, (%phase;)*, (%pattern;)+, (%p;)*,
(%diagnostics;)?)>
<!ELEMENT %active; (#PCDATA | %dir; | %emph; | %span;)*>
<!ELEMENT %assert; (#PCDATA | %name; | %emph; | %dir; | %span;)*>
<!ELEMENT %dir; (#PCDATA)>
<!ELEMENT %emph; (#PCDATA)>
<!ELEMENT %extends; EMPTY>
<!ELEMENT %diagnostic; (#PCDATA | %value-of; | %emph; | %dir; | %span;)*>
<!ELEMENT %diagnostics; (%diagnostic;)*>
<!ELEMENT %key; EMPTY>
<!ELEMENT %name; EMPTY>
<!ELEMENT %ns; EMPTY>
<!ELEMENT %p; (#PCDATA | %dir; | %emph; | %span;)*>
<!ELEMENT %pattern; ((%p;)*, (%rule;)*)>
<!ELEMENT %phase; ((%p;)*, (%active;)*)>
<!ELEMENT %report; (#PCDATA | %name; | %emph; | %dir; | %span;)*>
<!ELEMENT %rule; (%assert; | %report; | %key; | %extends;)+>
<!ELEMENT %span; (#PCDATA)>
<!ELEMENT %title; (#PCDATA | %dir;)*>
<!ELEMENT %value-of; EMPTY>
<!-- Attribute declarations -->
<!ATTLIST %schema;
xmlns %URI; #FIXED "http://www.ascc.net/xml/schematron"
xmlns:sch %URI; #FIXED "http://www.ascc.net/xml/schematron"
xmlns:xsi %URI; #FIXED "http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation %URI; "http://www.ascc.net/xml/schematron
http://www.ascc.net/xml/schematron/schematron.xsd"
id ID #IMPLIED
fpi %FPI; #IMPLIED
schemaVersion CDATA #IMPLIED
defaultPhase IDREF #IMPLIED
icon %URI; #IMPLIED
version CDATA "1.5"
xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST %active;
pattern IDREF #REQUIRED
>
<!ATTLIST %assert;
test %EXPR; #REQUIRED
role NMTOKEN #IMPLIED
id ID #IMPLIED
diagnostics IDREFS #IMPLIED
icon %URI; #IMPLIED
subject %PATH; #IMPLIED
xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST %dir;
value (ltr | rtl) #IMPLIED
>
<!ATTLIST %extends;
rule IDREF #REQUIRED
>
<!ATTLIST %diagnostic;
id ID #REQUIRED
icon %URI; #IMPLIED
xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST %key;
name NMTOKEN #REQUIRED
path %PATH; #REQUIRED
icon %URI; #IMPLIED
>
<!ATTLIST %name;
path %PATH; "."
>
<!-- Schematrons should implement '.'
as the default value for path in sch:name -->
<!ATTLIST %p;
xml:lang CDATA #IMPLIED
id ID #IMPLIED
class CDATA #IMPLIED
icon %URI; #IMPLIED
>
<!ATTLIST %pattern;
name CDATA #REQUIRED
see %URI; #IMPLIED
id ID #IMPLIED
icon %URI; #IMPLIED
>
<!ATTLIST %ns;
uri %URI; #REQUIRED
prefix NMTOKEN #IMPLIED
>
<!ATTLIST %phase;
id ID #REQUIRED
fpi %FPI; #IMPLIED
icon %URI; #IMPLIED
>
<!ATTLIST %span;
class CDATA #IMPLIED
>
<!ATTLIST %report;
test %EXPR; #REQUIRED
role NMTOKEN #IMPLIED
id ID #IMPLIED
diagnostics IDREFS #IMPLIED
icon %URI; #IMPLIED
subject %PATH; #IMPLIED
xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST %rule;
context %PATH; #IMPLIED
abstract (true | false) "false"
role NMTOKEN #IMPLIED
id ID #IMPLIED
>
<!-- Schematrons should implement 'false' as the default
value of abstract -->
<!ATTLIST %value-of;
select %PATH; #REQUIRED
>
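The comment at the head of the DTD describes how to use an explicit namespace prefix. A minimal sketch of that usage follows, assuming the public and system identifiers used elsewhere in this document; the element names chapter and title in the rule are hypothetical and only illustrate a target vocabulary. The parameter entity sp is declared in the internal subset, where it takes precedence over the later declaration in the external DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a prefixed Schematron schema validated against the DTD above. -->
<!DOCTYPE sch:schema PUBLIC "http://www.ascc.net/xml/schematron"
  "http://www.ascc.net/xml/schematron/schematron1-5.dtd" [
  <!-- Override the empty default so every declared name becomes sch:... -->
  <!ENTITY % sp "sch:">
]>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Prefixed example" id="prefixed-example">
    <!-- "chapter" and "title" are hypothetical target element names -->
    <sch:rule context="chapter">
      <sch:assert test="title">A chapter should have a title.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With the prefix in place, unqualified names in the context and test attributes cannot be mistaken for names in the Schematron namespace.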
Appendix B: Schematron Schema for Schematron 1.5
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema PUBLIC "http://www.ascc.net/xml/schematron"
"http://www.ascc.net/xml/schematron/schematron1-5.dtd">
<schema xmlns="http://www.ascc.net/xml/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
xml:lang="en"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation="http://www.ascc.net/xml/schematron
http://www.ascc.net/xml/schematron/schematron1-5.xsd"
fpi="+//IDN ascc.net//SGML XML Schematron 1.5 Schema for Schematron 1.5//EN"
schemaVersion="2001/01/31" version="1.5"
defaultPhase="New"
icon="http://www.ascc.net/xml/resource/schematron/bilby.jpg">
<title>Schematron 1.5</title>
<ns uri="http://www.ascc.net/xml/schematron" prefix="sch"/>
<p>Copyright (C) 2001 Rick Jelliffe.
Freely and openly available under zlib/libpng license.</p>
<p>This schema is open: it only
considers elements in the Schematron namespace.
Elements and attributes from other namespaces can be used freely.
This schema does not assume that the Schematron schema is the top-level element.
</p>
<p>This schema uses conservative rules (e.g. no use of key()) to
work with incomplete XSLT-based implementations.</p>
<phase id="New">
<p>For creating new documents.</p>
<active pattern="mini"/>
</phase>
<phase id="Draft">
<p>For fast validation of draft documents.</p>
<active pattern="required" />
</phase>
<phase id="Full">
<p>For final validation and tracking some tricky problems.</p>
<active pattern="mini" />
<active pattern="required" />
<active pattern="attributes" />
</phase>
<pattern name="Minimal Schematron" id="mini">
<p>These rules establish the smallest possible Schematron document.
These rules may be handy for beginners with starting documents.</p>
<rule context="/">
<assert test="//sch:schema"
>A Schematron schema should have a schema element. </assert>
<report test="count(//sch:schema) > 1"
>There should only be one schema per document.</report>
<assert test="//sch:schema/sch:pattern "
>A Schematron schema should have pattern elements inside the
schema element</assert>
<assert test="//sch:schema/sch:pattern/sch:rule[@context]"
>A Schematron schema should have rule elements inside the pattern elements.
Rule elements should have a context attribute.</assert>
<assert test="//sch:schema/sch:pattern/sch:rule/sch:assert[@test]
or //sch:schema/sch:pattern/sch:rule/sch:report[@test]"
>A Schematron schema should have assert or report elements inside the rule elements.
Assert and report elements should have a test attribute.</assert>
</rule>
</pattern>
<pattern name="Schematron Elements and Required Attributes"
id="required">
<p>Rules defining occurrence rules for Schematron elements
and their required attributes. Note that for attributes,
it is not that the attribute is being tested for existence,
but whether it has a value.</p>
<p>Some elements require certain children or attributes.
Other elements require certain parents. Schematron can represent
both these kinds of coupling. </p>
<rule context="sch:schema">
<assert test="count(sch:*) =
count(sch:title|sch:ns|sch:p|sch:pattern|sch:diagnostics|sch:phase)"
>The element <name/> should contain only the elements title, ns, p, pattern,
diagnostics or phase from the
Schematron namespace.</assert>
<assert test="sch:pattern"
>A schema element should contain at least one pattern element.</assert>
<report test="ancestor::sch:*"
>A Schematron schema should not appear as a child of another Schematron schema.
</report>
<report test="@defaultPhase and sch:phase and not(@defaultPhase='#ALL') and
not(sch:phase[@id= current()/@defaultPhase])"
>The value of the defaultPhase attribute must match the id of a phase element.</report>
</rule>
<rule context="sch:title">
<assert test="parent::sch:schema" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element schema.
</assert>
<assert test="count(preceding-sibling::sch:*) = 0"
>The element <name/> should only appear as the first element from
the Schematron namespace
in the schema element.</assert>
</rule>
<rule context="sch:ns">
<assert test="parent::sch:schema" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element
schema.</assert>
<assert test="string-length(normalize-space(@prefix)) > 0"
>The element <name/> must have a value for the attribute prefix.</assert>
<assert test="string-length(normalize-space(@uri)) > 0"
>The element <name/> must have a value for the attribute uri.</assert>
<assert test="count(preceding-sibling::sch:*) = count(preceding-sibling::sch:title) +
count(preceding-sibling::sch:ns)"
>The <name/> element must come before any other Schematron elements, except
the title</assert>
<report test="*"
>The <name/> element should be empty.</report>
</rule>
<rule context="sch:phase">
<assert test="parent::sch:schema" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element schema.
</assert>
<assert test="count(preceding-sibling::sch:*) = count(preceding-sibling::sch:phase)
+ count(preceding-sibling::sch:title) + count(preceding-sibling::sch:ns)
+ count(preceding-sibling::sch:p)"
>The <name/> elements must come before any other Schematron elements,
except the title, ns and p elements</assert>
</rule>
<rule context="sch:active">
<assert test="parent::sch:phase" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element
phase.</assert>
<assert test="string-length(normalize-space(@pattern)) > 0"
>The element <name/> must have a value for the attribute pattern.</assert>
</rule>
<rule context="sch:pattern">
<assert test="parent::sch:schema" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element schema.
</assert>
<assert test="count(sch:*) = count(sch:rule|sch:p)"
>The element <name/> should contain only rule and p elements from the
Schematron namespace.</assert>
<assert test="sch:rule"
>The element <name/> should contain at least one rule element.</assert>
<assert test="string-length(normalize-space(@name)) > 0"
>The element <name/> must have a value for the attribute name.</assert>
<assert test="count(sch:title) &lt; 2"
>A Schematron schema cannot have more than one title element.</assert>
</rule>
<rule context="sch:rule[@abstract='true']">
<assert test="parent::sch:pattern" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element
pattern.</assert>
<assert test="count(sch:*) = count(sch:assert |sch:report|sch:key|sch:extends ) "
>The element <name/> should contain only the elements assert, report,
key or extends from the Schematron namespace.</assert>
<assert test="sch:assert | sch:report | sch:extends"
>The element <name/> should contain at least one assert, report or
extends elements.</assert>
<report test="@test"
>The <name/> element cannot have a test attribute: that should go on a report
or assert element.</report>
<report test="@context"
>An abstract rule cannot have a context attribute.</report>
<assert test="string-length(normalize-space(@id)) > 0"
>An abstract rule should have an id attribute. </assert>
</rule>
<rule context="sch:rule">
<assert test="parent::sch:pattern" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element pattern.
</assert>
<assert test="count(sch:*) = count(sch:assert |sch:report|sch:key|sch:extends ) "
>The element <name/> should contain only the elements assert, report, key or
extends from
the Schematron namespace.</assert>
<assert test="sch:assert | sch:report | sch:extends"
>The element <name/> should contain at least one assert, report or extends elements.
</assert>
<report test="@test"
>The <name/> element cannot have a test attribute: that should go on a report
or assert element.</report>
<assert test="string-length(normalize-space(@context)) > 0"
>A rule should have a context attribute. This should be an XSLT pattern for
selecting nodes to make
assertions and reports about. (Abstract rules do not require a context attribute.)</assert>
<assert test="not(@abstract) or (@abstract='false') or (@abstract='true')"
>In a rule, the abstract attribute is optional, and can have values 'true' or 'false'</assert>
</rule>
<rule context="sch:assert ">
<assert test="parent::sch:rule" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element rule.
</assert>
<assert test="string-length(normalize-space(text())) > 0"
>A <name/> element should contain a natural language sentence.</assert>
<assert test="string-length(normalize-space(@test)) > 0"
>The element <name/> must have a value for the attribute test. This should be an
XSLT expression.</assert>
<report test="@context"
>The <name/> element cannot have a context attribute: that should go
on the rule element.</report>
</rule>
<rule context=" sch:report">
<assert test="parent::sch:rule" diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron element rule.</assert>
<assert test="string-length(normalize-space(text())) > 0"
>A <name/> element should contain a natural language sentence.</assert>
<assert test="string-length(normalize-space(@test)) > 0"
>The element <name/> must have a value for the attribute test. This should be
an XSLT expression.</assert>
<report test="@context"
>The <name/> element cannot have a context attribute: that should go on
the rule element.</report>
</rule>
<rule context="sch:diagnostics">
<assert test="parent::sch:schema" diagnostics="bad-parent"
>The element <name/> should only appear as a child of the schema element</assert>
<report test="following-sibling::sch:*"
>The element <name/> should be the last element in the schema.</report>
</rule>
<rule context="sch:diagnostic">
<assert test="parent::sch:diagnostics" diagnostics="bad-parent"
>The element <name/> should only appear in the diagnostics section.</assert>
<assert test="string-length(normalize-space(@id)) > 0"
>The element <name/> must have a value for the attribute id. </assert>
</rule>
<rule context="sch:key">
<assert test="parent::sch:rule" diagnostics="bad-parent"
>The element <name/> should only appear in a rule.</assert>
<assert test="string-length(normalize-space(@name)) > 0"
>The element <name/> must have a value for the attribute name. </assert>
<assert test="string-length(normalize-space(@path)) > 0"
>The element <name/> must have a value for the attribute path.
This should be an XPath expression.</assert>
<report test="*"
>The <name/> element should be empty.</report>
</rule>
<rule context="sch:extends">
<assert test="parent::sch:rule" diagnostics="bad-parent"
>The element <name/> should only appear in a rule.</assert>
<assert test="string-length(normalize-space(@rule)) > 0"
>The element <name/> must have a value for the attribute rule. </assert>
<report test="*"
>The <name/> element should be empty.</report>
<assert test="/*//sch:rule[@abstract='true'][@id = current()/@rule]"
>The <name/> element should have an attribute rule which gives
the id of an abstract rule.</assert>
</rule>
<rule context="sch:p">
<assert test="parent::sch:*" diagnostics="bad-parent"
>The element <name/> should only appear inside an element from
the Schematron namespace.
It is equivalent to the HTML element of the same name.</assert>
</rule>
<rule context="sch:name">
<assert test="parent::sch:assert | parent::sch:report |parent::sch:p | parent::sch:diagnostic"
diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron
elements assert, report, p (paragraph) or diagnostic.</assert>
<report test="*"
>The <name/> element should be empty.</report>
</rule>
<rule context="sch:emph">
<assert test="parent::sch:p | parent::sch:diagnostic"
diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron
elements p (paragraph) or diagnostic.
It is equivalent to the HTML element of the same name.</assert>
</rule>
<rule context="sch:dir">
<assert test="parent::sch:p | parent::sch:diagnostic"
diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron
elements p (paragraph) or diagnostic.</assert>
<assert test="@value and (@value='rtl' or @value='ltr')"
>The attribute value of the <name/> element must be lowercase
"rtl" or "ltr".
It is equivalent to the HTML element of the same name.</assert>
</rule>
<rule context="sch:span">
<assert test="parent::sch:p | parent::sch:diagnostic"
diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron
elements p (paragraph) or diagnostic.
It is equivalent to the HTML element of the same name.</assert>
</rule>
<rule context="sch:value-of">
<assert test="parent::sch:diagnostic"
diagnostics="bad-parent"
>The element <name/> should only appear inside the Schematron
element diagnostic.</assert>
<assert test="string-length(normalize-space(@select)) > 0"
>The element <name/> must have a value for the attribute select.
The value should be an XPath expression.</assert>
<report test="*"
>The <name/> element should be empty.</report>
</rule>
<rule context="sch:*">
<report test="1=1" diagnostics="spelling"
>The <name/> element is not an element from the Schematron
1.5 namespace</report>
</rule>
</pattern>
<pattern name="Schematron Attributes" id="attributes" >
<p>These rules specify which elements each attribute can belong to, and
what they mean.</p>
<rule context="sch:*">
<report test="@abstract and not(self::sch:rule)"
>The boolean attribute abstract can only appear on the element rule.
An abstract rule can be used to extend other rules.</report>
<report test="@class and not(self::sch:span or self::sch:p)"
>The attribute class can only appear on the elements span and p.
It gives a name that can be used by CSS stylesheets.</report>
<report test="@context and not(self::sch:rule)"
>The attribute context can only appear on the element rule. It is
an XPath pattern.</report>
<report test="@defaultPhase and not(self::sch:schema)"
>The attribute defaultPhase can only appear on the element schema.
It is the id of the phase that will initially be active.</report>
<report test="@diagnostics and not(self::sch:assert or self::sch:report)"
>The attribute diagnostics can only appear on the elements assert and report.
It is the id of some relevant diagnostic or hint.</report>
<report test="@fpi and not(self::sch:schema or self::sch:phase)"
>The attribute fpi can only appear on the elements schema and phase.
It is an ISO Formal Public Identifier.</report>
<report test="@icon and not(self::sch:schema or self::sch:assert or
self::sch:diagnostic or self::sch:key or self::sch:p or self::sch:pattern
or self::sch:phase or self::sch:report )"
>The attribute icon can only appear on the elements schema, assert, diagnostic,
key, p, pattern, phase and report. It is the URL
of a small image. </report>
<report test="@id and not(self::sch:schema or self::sch:assert or
self::sch:p or self::sch:pattern or self::sch:phase or
self::sch:report or self::sch:rule or self::sch:diagnostic)"
>The attribute id can only appear on the elements schema, assert, p, pattern, phase,
report, rule and diagnostic. It is a name; it should
not start with a number or symbol.</report>
<report test="@name and not(self::sch:key or self::sch:pattern)"
>The attribute name can only appear on the elements pattern and key.</report>
<report test="@path and not(self::sch:key | self::sch:name)"
>The attribute path can only appear on the elements key and name. It is an XPath path.</report>
<report test="@pattern and not(self::sch:active)"
>The attribute pattern can only appear on the element active. It gives the id of a pattern
that should be activated in that phase.</report>
<report test="@prefix and not(self::sch:ns)"
>The attribute prefix can only appear on the element ns.</report>
<report test="@role and not(self::sch:assert or self::sch:report or self::sch:rule)"
>The attribute role can only appear on the elements assert, report or rule. It is a simple name,
not a phrase.</report>
<report test="@rule and not(self::sch:extends)"
>The attribute rule can only appear on the element extends. It is the id of an abstract rule
declared elsewhere in the schema.</report>
<report test="@see and not(self::sch:pattern)"
>The attribute see can only appear on the element pattern. It is the URL of some
documentation for the schema.</report>
<report test="@select and not(self::sch:value-of)"
>The attribute select can only appear on the element value-of, with the same meaning
as in XSLT. It is an XSLT pattern.</report>
<report test="@schemaVersion and not(self::sch:schema)"
>The attribute schemaVersion can only appear on the element schema.
It gives the version of the schema.</report>
<report test="@subject and not(self::sch:assert or self::sch:report)"
>The attribute subject can only appear on the elements assert and report.
It is an XSLT pattern.
</report>
<report test="@test and not(self::sch:assert or self::sch:report)"
>The attribute test can only appear on the elements assert and report.
It is an XPath expression with the XSLT additional functions.</report>
<report test="@uri and not(self::sch:ns)"
>The attribute uri can only appear on the element ns. It is a URI.</report>
<report test="@value and not(self::sch:dir)"
>The attribute value can only appear on the element dir. It sets the directionality of text: 'rtl' is
right-to-left and 'ltr' is left-to-right.</report>
<report test="@version and not(self::sch:schema)"
>The attribute version can only appear on the element schema.
It gives the version of Schematron required as major number "."
minor number.</report>
<assert test="not(attribute::*[string-length(normalize-space(.)) = 0])"
>Every attribute on a Schematron element must have a value if it is specified.</assert>
</rule>
</pattern>
<diagnostics>
<diagnostic id="spelling"
>Check this is not a spelling error. The recognized element names are
schema, title, ns, p, phase, active, pattern, rule, key, extends, assert, report,
diagnostics, diagnostic, name, value-of, emph, dir and span.</diagnostic>
<diagnostic id="bad-parent"
>The element appeared inside a <value-of select="name(parent::*)"/>.</diagnostic>
</diagnostics>
</schema>
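The rules above for abstract rules and the extends element can be illustrated with a small fragment. This is a sketch only: the pattern name, the ids, and the target element names chapter, appendix and title are hypothetical, and the fragment is assumed to sit inside a schema element that uses the default namespace as in the listing above.

```xml
<pattern name="Abstract rule example" id="abstract-example">
  <!-- An abstract rule has an id and no context attribute -->
  <rule abstract="true" id="has-title">
    <assert test="title">The element <name/> should have a title.</assert>
  </rule>
  <!-- Concrete rules pull in the assertions of the abstract rule by id -->
  <rule context="chapter">
    <extends rule="has-title"/>
  </rule>
  <rule context="appendix">
    <extends rule="has-title"/>
  </rule>
</pattern>
```

Each concrete rule here satisfies the checks in the "required" pattern: it has a context attribute, it contains at least one assert, report or extends element, and its extends element is empty and names the id of an abstract rule.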
Appendix C: W3C XML Schema for Schematron 1.5
This appendix is non-normative.
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema
targetNamespace="http://www.ascc.net/xml/schematron"
xmlns:sch="http://www.ascc.net/xml/schematron"
xmlns="http://www.ascc.net/xml/schematron"
xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
version="+//IDN sinica.edu.tw//SGML W3C XML Schema for Schematron 1.5//EN">
<xsd:annotation>
<xsd:documentation source="
http://www.ascc.net/xml/resource/schematron/schematron.html"
xml:lang="en"/>
</xsd:annotation>
<xsd:element name="active">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="sch:dir"/>
<xsd:element ref="sch:emph"/>
<xsd:element ref="sch:span"/>
</xsd:choice>
<xsd:attribute name="pattern" type="xsd:IDREF" use="required"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="assert">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="sch:name"/>
<xsd:element ref="sch:emph"/>
<xsd:element ref="sch:dir"/>
<xsd:element ref="sch:span"/>
<xsd:any namespace="##other" processContents="lax"/>
</xsd:choice>
<xsd:attribute name="test" type="xsd:string" use="required"/>
<xsd:attribute name="role" type="xsd:NMTOKEN"/>
<xsd:attribute name="id" type="xsd:ID"/>
<xsd:attribute name="diagnostics" type="xsd:IDREFS"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
<xsd:attribute name="subject" type="xsd:string"
use="default" value="."/>
<xsd:anyAttribute namespace="##other" processContents="lax"/>
<xsd:attribute name="xml:lang" type="xsd:language" />
</xsd:complexType>
</xsd:element>
<xsd:element name="diagnostic">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="sch:value-of"/>
<xsd:element ref="sch:emph"/>
<xsd:element ref="sch:dir"/>
<xsd:element ref="sch:span"/>
<xsd:any namespace="##other" processContents="lax"/>
</xsd:choice>
<xsd:attribute name="id" type="xsd:ID" use="required"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
<xsd:anyAttribute namespace="##other" processContents="lax"/>
<xsd:attribute name="xml:lang" type="xsd:language" />
</xsd:complexType>
</xsd:element>
<xsd:element name="diagnostics">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="diagnostic" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="dir">
<xsd:complexType>
<xsd:simpleContent>
<xsd:restriction base="xsd:string">
<xsd:attribute name="value">
<xsd:simpleType>
<xsd:restriction base="xsd:NMTOKEN">
<xsd:enumeration value="ltr"/>
<xsd:enumeration value="rtl"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:restriction>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="emph" type="xsd:string"/>
<xsd:element name="extends">
<xsd:complexType>
<xsd:attribute name="rule" type="xsd:IDREF"
use="required"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="key">
<xsd:complexType>
<xsd:attribute name="name" type="xsd:NMTOKEN"
use="required"/>
<xsd:attribute name="path" type="xsd:string"
use="required"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="name">
<xsd:complexType>
<xsd:attribute name="path" type="xsd:string"
use="default" value="."/>
</xsd:complexType>
</xsd:element>
<xsd:element name="ns">
<xsd:complexType>
<xsd:attribute name="uri" type="xsd:uriReference"
use="required"/>
<xsd:attribute name="prefix" type="xsd:NCName"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="p">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="sch:dir"/>
<xsd:element ref="sch:emph"/>
<xsd:element ref="sch:span"/>
</xsd:choice>
<xsd:attribute name="id" type="xsd:ID"/>
<xsd:attribute name="class" type="xsd:string"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
<xsd:anyAttribute namespace="##other" processContents="lax"/>
<xsd:attribute name="xml:lang" type="xsd:language" />
</xsd:complexType>
</xsd:element>
<xsd:element name="pattern">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="p" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:rule" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string"
use="required"/>
<xsd:attribute name="see" type="xsd:uriReference"/>
<xsd:attribute name="id" type="xsd:ID"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="phase">
<xsd:complexType>
<xsd:sequence >
<xsd:element ref="sch:p" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:active" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:ID" use="required"/>
<xsd:attribute name="fpi" type="xsd:string"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="report">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="sch:name"/>
<xsd:element ref="sch:emph"/>
<xsd:element ref="sch:dir"/>
<xsd:element ref="sch:span"/>
<xsd:any namespace="##other"
processContents="lax"/>
</xsd:choice>
<xsd:attribute name="test" type="xsd:string"
use="required"/>
<xsd:attribute name="role" type="xsd:NMTOKEN"/>
<xsd:attribute name="id" type="xsd:ID"/>
<xsd:attribute name="diagnostics" type="xsd:IDREFS"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
<xsd:attribute name="subject" type="xsd:string"
use="default" value="."/>
<xsd:attribute name="xml:lang" type="xsd:language" />
</xsd:complexType>
</xsd:element>
<xsd:element name="rule">
<xsd:complexType>
<xsd:choice maxOccurs="unbounded">
<xsd:element ref="sch:assert"/>
<xsd:element ref="sch:report"/>
<xsd:element ref="sch:key"/>
<xsd:element ref="sch:extends"/>
</xsd:choice>
<xsd:attribute name="context" type="xsd:string"/>
<xsd:attribute name="abstract" type="xsd:boolean"
use="default" value="false"/>
<xsd:attribute name="role" type="xsd:NMTOKEN"/>
<xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="schema">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="sch:title" minOccurs="0"/>
<xsd:element ref="sch:ns" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:p" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:phase" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:pattern" maxOccurs="unbounded"/>
<xsd:element ref="sch:p" minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="sch:diagnostics" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:ID"/>
<xsd:attribute name="fpi" type="xsd:string"/>
<xsd:attribute name="schemaVersion" type="xsd:string"/>
<xsd:attribute name="defaultPhase" type="xsd:IDREF"/>
<xsd:attribute name="icon" type="xsd:uriReference"/>
<xsd:attribute name="version" type="xsd:string"
use="default"
value="1.5"/>
<xsd:anyAttribute namespace="##other"
processContents="lax"/>
<xsd:attribute name="xml:lang" type="xsd:language" />
</xsd:complexType>
</xsd:element>
<xsd:element name="span">
<xsd:complexType>
<xsd:simpleContent>
<xsd:restriction base="xsd:string">
<xsd:attribute name="class"
type="xsd:string"/>
</xsd:restriction>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="title">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0"
maxOccurs="unbounded">
<xsd:element ref="sch:dir"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
<xsd:element name="value-of">
<xsd:complexType>
<xsd:attribute name="select" type="xsd:string"
use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Appendix D: EBNF Productions for Paths and Expressions
These have been abstracted from the relevant W3C Recommendations, which should be treated
as the normative source. Schematron implementations should track the most recent W3C
specifications.
AbbreviatedAbsoluteLocationPath
::= '//' RelativeLocationPath
AbbreviatedRelativeLocationPath
::= RelativeLocationPath '//' Step
AbbreviatedStep
::= '.' | '..'
AbbreviatedAxisSpecifier
::= '@'?
AbsoluteLocationPath
::= '/' RelativeLocationPath? | AbbreviatedAbsoluteLocationPath
AdditiveExpr
::= MultiplicativeExpr
| AdditiveExpr '+' MultiplicativeExpr
| AdditiveExpr '-' MultiplicativeExpr
AndExpr
::= EqualityExpr | AndExpr 'and' EqualityExpr
Argument
::= Expr
AxisSpecifier
::= AxisName '::' | AbbreviatedAxisSpecifier
AxisName
::= 'ancestor' | 'ancestor-or-self' | 'attribute'
| 'child' | 'descendant' | 'descendant-or-self'
| 'following' | 'following-sibling' | 'namespace'
| 'parent' | 'preceding' | 'preceding-sibling' | 'self'
ChildOrAttributeAxisSpecifier
::= AbbreviatedAxisSpecifier | ('child' | 'attribute') '::'
Digits
::= [0-9]+
EqualityExpr
::= RelationalExpr
| EqualityExpr '=' RelationalExpr
| EqualityExpr '!=' RelationalExpr
Expr
::= OrExpr
ExprToken
::= '(' | ')' | '[' | ']' | '.' | '..' | '@' | ',' | '::'
| NameTest | NodeType | Operator | FunctionName | AxisName
| Literal | Number | VariableReference
FilterExpr
::= PrimaryExpr | FilterExpr Predicate
FunctionCall
::= FunctionName '(' ( Argument ( ',' Argument )* )? ')'
FunctionName
::= 'last' | 'position' | 'count' | 'id' | 'local-name'
| 'namespace-uri' | 'name' | 'string' | 'concat' | 'starts-with'
| 'contains' | 'substring-before' | 'substring-after' | 'substring'
| 'string-length' | 'normalize-space' | 'translate' | 'boolean'
| 'not' | 'true' | 'false' | 'lang' | 'number' | 'sum'
| 'floor' | 'ceiling' | 'round' | 'document' | 'key'
| 'format-number' | 'current'
| 'system-property' (Caution: system-property() available with XSLT 1.1 only)
IdKeyPattern
::= 'id' '(' Literal ')' | 'key' '(' Literal ',' Literal ')'
LocalPart
::= NCName
LocationPath
::= RelativeLocationPath | AbsoluteLocationPath
LocationPathPattern
::= '/' RelativePathPattern? | '//'? RelativePathPattern
Literal
::= '"' [^"]* '"' | "'" [^']* "'"
MultiplicativeExpr
::= UnaryExpr
| MultiplicativeExpr MultiplyOperator UnaryExpr
| MultiplicativeExpr 'div' UnaryExpr
| MultiplicativeExpr 'mod' UnaryExpr
MultiplyOperator
::= '*'
NameTest
::= '*' | NCName ':' '*' | QName
NCName
::= (Letter | '_') (NCNameChar)*
NCNameChar
::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender
NodeTest
::= NameTest | NodeType '(' ')'
| 'processing-instruction' '(' Literal ')'
NodeType
::= 'comment' | 'text' | 'processing-instruction' | 'node'
Number
::= Digits ('.' Digits?)? | '.' Digits
Operator
::= OperatorName | MultiplyOperator
| '/' | '//' | '|' | '+' | '-' | '=' | '!=' | '<' | '<=' | '>' | '>='
OperatorName
::= 'and' | 'or' | 'mod' | 'div'
OrExpr
::= AndExpr | OrExpr 'or' AndExpr
PathExpr
::= LocationPath | FilterExpr
| FilterExpr '/' RelativeLocationPath
| FilterExpr '//' RelativeLocationPath
Pattern
::= LocationPathPattern | Pattern '|' LocationPathPattern
| IdKeyPattern (('/' | '//') RelativePathPattern)?
Predicate
::= '[' PredicateExpr ']'
PredicateExpr
::= Expr
Prefix
::= NCName
PrimaryExpr
::= VariableReference | '(' Expr ')' | Literal
| Number | FunctionCall
QName
::= (Prefix ':')? LocalPart
RelativeLocationPath
::= Step | RelativeLocationPath '/' Step
| AbbreviatedRelativeLocationPath
RelativePathPattern
::= StepPattern
| RelativePathPattern '/' StepPattern
| RelativePathPattern '//' StepPattern
RelationalExpr
::= AdditiveExpr
| RelationalExpr '<' AdditiveExpr
| RelationalExpr '>' AdditiveExpr
| RelationalExpr '<=' AdditiveExpr
| RelationalExpr '>=' AdditiveExpr
S
::= (#x20 | #x9 | #xD | #xA)+
Step
::= AxisSpecifier NodeTest Predicate* | AbbreviatedStep
StepPattern
::= ChildOrAttributeAxisSpecifier NodeTest Predicate*
UnaryExpr
::= UnionExpr | '-' UnaryExpr
UnionExpr
::= PathExpr | UnionExpr '|' PathExpr
VariableReference
::= '$' QName
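As an illustration of how these productions are used in a schema: the context attribute of a rule has to conform to the more restrictive Pattern production, while the test attribute of an assert or report may be any Expr. The following is only a minimal sketch; the element names (chapter, appendix, section, title, para) are hypothetical and chosen for the example.

```xml
<rule context="chapter/section | appendix//section">
  <!-- context: two LocationPathPatterns joined by '|', i.e. a Pattern -->
  <!-- test: an Expr combining FunctionCalls, an EqualityExpr and 'and' -->
  <assert test="count(title) = 1 and string-length(normalize-space(title)) > 0"
    >A <name/> should have exactly one non-empty title</assert>
  <!-- test: an Expr using the 'not' function on two location paths -->
  <report test="not(para) and not(section)"
    >A <name/> has neither paragraphs nor subsections</report>
</rule>
```

An expression such as ancestor::chapter is a valid Expr but not a valid Pattern, because StepPattern only admits the child and attribute axes; it could still appear inside a Predicate of a pattern step.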
Appendix F: Reference Implementation for Schematron 1.3
The following is a reference implementation for an earlier version of Schematron (1.3). It shows
how simple a basic implementation can be. For the reference implementation and other implementations of
Schematron 1.5, visit the website http://www.ascc.net/xml/schematron
<?xml version="1.0" ?>
<!-- Preprocessor for the Schematron XML Schema Language.
http://www.ascc.net/xml/resource/schematron/schematron.html
Copyright (c) 1999, 2000 Rick Jelliffe and Academia Sinica Computing Center, Taiwan
This software is provided 'as-is', without any express or implied warranty.
In no event will the authors be held liable for any damages arising from
the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it freely,
subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not claim
that you wrote the original software. If you use this software in a product,
an acknowledgment in the product documentation would be appreciated but is
not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
History:
1999-10-18 Created RJ
1999-10-25 In report and assert should use apply-template not value-of
Thanks to James Clark for this fix
1999-11-2 Add key element
1999-12-21 Add ns element: thanks Dave Carlisle for the code
2000-03-26 Add axsl:output and version- well spotted Oliver Becker
2000-10-20 Add select to do-all-patterns: thanks Uche Ogbuji
-->
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">
<xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
<!-- Category: top-level-element -->
<xsl:output
method="xml"
omit-xml-declaration="no"
standalone="yes"
indent="yes" />
<xsl:template match="schema">
<axsl:stylesheet version="1.0">
<!-- attributes must be added before any child element is created -->
<xsl:for-each select="ns">
<xsl:attribute
name="{concat(@prefix,':dummy-for-xmlns')}"
namespace="{@uri}"/>
</xsl:for-each>
<xsl:attribute name="version">1.0</xsl:attribute>
<axsl:output method="text" />
<xsl:apply-templates mode="do-keys" />
<axsl:template match='/'>
<xsl:value-of select="title" />
<xsl:apply-templates mode="do-all-patterns" />
</axsl:template>
<xsl:apply-templates />
<axsl:template match="text()" priority="-1">
<!-- strip characters -->
</axsl:template>
</axsl:stylesheet>
</xsl:template>
<xsl:template match="pattern" mode="do-all-patterns" >
<axsl:apply-templates select="/" mode='M{count(preceding-sibling::*)}' />
</xsl:template>
<xsl:template match="pattern">
<xsl:apply-templates />
<axsl:template match="text()" priority="-1"
mode="M{count(preceding-sibling::*)}">
<!-- strip characters -->
</axsl:template>
</xsl:template>
<xsl:template match="rule">
<axsl:template match='{@context}' priority='{4000 - count(preceding-sibling::*)}'
mode='M{count(../preceding-sibling::*)}'>
<xsl:apply-templates />
<axsl:apply-templates mode='M{count(../preceding-sibling::*)}'/>
</axsl:template>
</xsl:template>
<xsl:template match="name" mode="text">
<xsl:choose>
<xsl:when test='@path' >
<axsl:value-of select="name({@path})" />
</xsl:when>
<xsl:otherwise>
<axsl:value-of select="name(.)" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="assert">
<axsl:choose>
<axsl:when test='{@test}'/>
<axsl:otherwise>
<xsl:if test="@role">
(<xsl:value-of select="@role"/>) </xsl:if>
In pattern <xsl:value-of select="ancestor::pattern/@name"/>:
<xsl:apply-templates mode="text" />
</axsl:otherwise>
</axsl:choose>
</xsl:template>
<xsl:template match="report">
<axsl:if test='{@test}'>
<xsl:if test="@role">
(<xsl:value-of select="@role"/>) </xsl:if>
In pattern <xsl:value-of select="ancestor::pattern/@name"/>:
<xsl:apply-templates mode="text"/>
</axsl:if>
</xsl:template>
<xsl:template match="rule/key" mode="do-keys">
<axsl:key match="{../@context}" name="{@name}"
use="{@path}" />
</xsl:template>
<xsl:template match="text()" priority="-1"
mode="do-keys" >
<!-- strip characters -->
</xsl:template>
<xsl:template match="text()" priority="-1"
mode="do-all-patterns">
<!-- strip characters -->
</xsl:template>
<xsl:template match="text()" priority="-1">
<!-- strip characters -->
</xsl:template>
</xsl:stylesheet>
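To make the behaviour of this preprocessor more concrete, here is a small sketch of an input schema it could process. The schema content (pattern name, element names, messages) is hypothetical. The schema is written without a namespace, because the templates above match unqualified names such as schema and pattern.

```xml
<schema>
  <title>Minimal example schema</title>
  <pattern name="Sections">
    <rule context="section">
      <assert test="title">A <name/> element should contain a title.</assert>
      <report test="count(para) = 0">A <name/> element has no paragraphs.</report>
    </rule>
  </pattern>
</schema>
```

Reading the templates above, the rule would be expected to compile to roughly the following template in the generated stylesheet. The mode is M1 because the pattern has one preceding sibling (title), and the priority is 4000 because the rule is the first child of its pattern. Whitespace and the namespace prefix actually emitted depend on the XSLT processor, so this is an approximation rather than exact output.

```xml
<xsl:template match="section" priority="4000" mode="M1">
  <xsl:choose>
    <xsl:when test="title"/>
    <xsl:otherwise>
      In pattern Sections:
      A <xsl:value-of select="name(.)"/> element should contain a title.
    </xsl:otherwise>
  </xsl:choose>
  <xsl:if test="count(para) = 0">
      In pattern Sections:
      A <xsl:value-of select="name(.)"/> element has no paragraphs.
  </xsl:if>
  <xsl:apply-templates mode="M1"/>
</xsl:template>
```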
Appendix G: Notice of Intended Upgrades for ISO Schematron
Schematron is being standardized as part of the International Organization for
Standardization (ISO) international standard DSDL. It will be known as
ISO/IEC 19757 - DSDL
Document Schema Definition Languages
Part 3 Rule-based validation - Schematron
Schematron 1.5 will be the basis for this. It is expected that an existing Schematron 1.5
schema will run unchanged with ISO Schematron.
This appendix indicates the expected alterations, most of which have been announced
previously or implemented in prototypes already.
1. Assertions allow value-of
Schematron 1.5 did not allow value-of in assertions. This was to enforce a
distinction between diagnostics and assertions (which are intended to make positive
statements of expectation). Many users requested this change.
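A sketch of what this change would permit, assuming value-of keeps the select attribute it already has in diagnostics; the invoice vocabulary (invoice, total, item, price) is hypothetical:

```xml
<rule context="invoice">
  <assert test="total = sum(item/price)"
    >The total of an invoice should equal the sum of its item prices,
    but the total given is <value-of select="total"/>.</assert>
</rule>
```

The assertion text still states the positive expectation; the value-of only adds the offending value to the message.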
2. More flexibility with key
Schematron 1.5 only allowed the key element as part of rules. ISO Schematron
will also allow key under the schema at the same position as
phase elements. This was implemented in the ZVON Schematron, and users reported
finding it useful.
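A sketch of how a schema-level key might look. In Schematron 1.5 a key inside a rule takes its match pattern from the rule's context; at schema level there is no enclosing rule, so this example assumes an explicit match attribute. That attribute, like the footnote vocabulary used here, is an assumption made for illustration and is not taken from this specification.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron">
  <title>Cross-reference checks</title>
  <!-- assumed syntax: 'match' stands in for the context of the missing rule -->
  <key name="footnotes" match="footnote" path="@id"/>
  <pattern name="Footnote references">
    <rule context="fnref">
      <assert test="key('footnotes', @target)"
        >A <name/> should point to an existing footnote.</assert>
    </rule>
  </pattern>
</schema>
```

Declaring the key once at the top keeps it independent of any particular rule, which is useful when several patterns test against the same index.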
3. Schematron as a Framework
The ISO Schematron standard will position Schematron as a framework (the elements) which
potentially allows different query languages. This will probably be done by adding to
the schema element an attribute
use NMTOKEN "XSLT"
which allows the query/expression language to be stated. Anticipated values are
- XSLT: XSLT 1.n, as currently used; this is the default.
- EXSLT: XSLT 1.n with the EXSLT extensions.
- XPATH: for implementations that use only a simple XPath library. The element
key would not be available.
- XPATH2: the schema uses the mooted XPath2 spec.
- XSLT2: the schema uses the mooted XSLT2 spec.
- XQUERY: the schema uses the mooted XQuery spec.
This helps resolve or clarify several issues: first, that some implementers have just
used an XPath library; second, that we have to cope with different versions of XPath,
notably XPath2; third, that there has been implementation experience using non-XPath
query languages (Schemarama from Beckett and Miller); and fourth, that the
Schematron idea is not just the use of XPaths but the particular arrangement of assertions
into rules, rules into patterns, and patterns into phases.
I expect there will be other schema languages which just add an assertion element to an
element or attribute declaration (e.g. Eric van der Vlist's Examplotron), but this
(though useful) is not Schematron: the key idea of Schematron is the pattern—an
abstract structure which is expressed in terms of an element (the context) but may not
actually have anything to do with that element.
This recasts Schematron as a general rule framework.
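As a sketch of the attribute proposed in this item, a schema intended for an implementation built on a plain XPath library might declare itself as follows. The attribute name use and the value XPATH are taken from the proposal above and may change; the chapter vocabulary is hypothetical.

```xml
<schema xmlns="http://www.ascc.net/xml/schematron" use="XPATH">
  <pattern name="Basic structure">
    <rule context="chapter">
      <!-- only XPath 1.0 functions are used; no key(), document() or current() -->
      <assert test="count(title) = 1"
        >A <name/> should have exactly one title.</assert>
    </rule>
  </pattern>
</schema>
```

An implementation that does not support the stated language could then reject the schema up front instead of failing on an unknown function part way through validation.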
4. Variable statement let
Schematron 1.5 is cumbersome when expressing "datatype" kinds of constraints.
It is powerful enough to parse a string into components, but frequently a string must
be reparsed several times causing very verbose and error-prone expressions.
ISO Schematron will include a let statement that allows binding of variables
within the scope of a rule. The variable value will be available using a $
delimiter, and can be implemented using XSLT variables. Presumably it would only be
available when using XSLT or EXSLT as the query language.
This feature is adopted from XCSL (XML Constraint Specification Language), with the kind blessing of XCSL's
developer, José Carlos Leite Ramalho.
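A sketch of how let could reduce repeated parsing, assuming name and value attributes on the let element (the exact syntax is not fixed by this notice) and a hypothetical date element holding text of the form YYYY-MM-DD:

```xml
<rule context="date">
  <!-- each component is parsed out of the string once and then reused -->
  <let name="year" value="number(substring(., 1, 4))"/>
  <let name="month" value="number(substring(., 6, 2))"/>
  <let name="day" value="number(substring(., 9, 2))"/>
  <assert test="$year >= 1900"
    >The year of a <name/> should be 1900 or later.</assert>
  <assert test="$month >= 1 and $month &lt;= 12"
    >The month of a <name/> should be between 1 and 12.</assert>
  <assert test="$day >= 1 and $day &lt;= 31"
    >The day of a <name/> should be between 1 and 31.</assert>
</rule>
```

In an XSLT-based implementation each let could plausibly become an xsl:variable at the start of the template generated for the rule, so the variables would be in scope for every test in that rule.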
5. Abstract Patterns
Schematron was invented to be able to declare and detect abstract patterns. This allows a
document type to be declared in terms of rhetorical structures rather than physical
structures: for example, to say "this is a table and the row element name is
tr" or "every paragraph has a heading to which it relates".
This would again be implemented using XSLT variables. So we could say (the syntax is not
fixed yet):
<pattern isa="table">
<param formal="row" actual="tr"/>
<param formal="cell" actual="td"/>
</pattern>
<pattern abstract="true" name="table">
<rule context="$row">
<assert test="$cell"
>A <name/> should have at least one cell</assert>
</rule>
</pattern>
So let statements allow clearer expressions in the test values, while
abstract patterns allow clearer expression with fewer elements.
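One way to read the example is as parameter substitution: each formal parameter in the abstract pattern is replaced by the actual value given in the pattern that instantiates it. Under that reading, which is an assumption here since the mechanism is not fixed, the instantiation with row bound to tr and cell bound to td would behave like this concrete pattern:

```xml
<pattern name="table">
  <rule context="tr">
    <assert test="td"
      >A <name/> should have at least one cell</assert>
  </rule>
</pattern>
```

A second document type with different element names could reuse the same abstract pattern simply by supplying other actual values.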
Also, with abstract patterns, it then becomes possible to do document to document
mappings, because we can identify structures of related information items independently
of their serialization and naming conventions. The role attribute can be used
for this.
Acknowledgements
The Schematron was developed as a free software project at the
Academia Sinica Computing Centre in 1999 and 2000 by the author. I thank the Director, Dr.
Simon Lin, for his encouragement and support. Also I owe thanks for the support and
contributions of Professor C.C. Hsieh, Dr Makoto Murata, Dr Oliver Becker (architecture), Dr
Miloslav Nic (tutorials), Dr David Carlisle, Mr James Clark, Mr Adrian Edwards, Mr Uche
Ogbuji, Mr Francis Norton, Mr David Pawson, Mr Eddie Robertsson, Dr. José Carlos
Leite Ramalho, Dr. Dave Beckett, Mr Ludvig Svenovius (extends) and the members of the
Schematron mail list. Other work was performed with sponsorship from GeoTempo Inc., Taipei,
Allette Systems, Pty. Ltd. Sydney, and Topologi Pty. Ltd. Sydney.
This specification is a
much updated version of a paper delivered at the Pacific Neighbourhood
Consortium/Electronic Cultural Atlas Initiative/Electronic Buddhist Text Initiative
joint conference, University of California, Berkeley, Feb. 2000.
References
[1] Private conversation by the author with a Taiwanese MIS professor.
[2] The Schematron project website is at
http://www.ascc.net/xml/resource/schematron/schematron.html
A new website to
encourage open source contributions is being established for full operation 1Q/2001 at
http://sourceforge.net/projects/schematron
[CIP] F.L. Bauer, M. Broy, B. Moller, P. Pepper, M. Wirsing, et al. The Munich
Project CIP. Vol. I: The Wide Spectrum Language CIP-L, volume I of Lecture Notes in
Computer Science. Springer Verlag, Berlin, Heidelberg, New York, 1985.
[deFrancis] The Chinese Language
[Etessami] K. Etessami and M. Yannakakis,
From Rule-based to Automata-based testing, Proceedings of FORTE/PSTV'2000, 20th IFIP
Int. Conf. on Formal Description Techniques/Protocol Specification, Testing, and
Verification, 2000, http://citeseer.nj.nec.com/410961.html
[Finkelstein]
[HTML]
[LeCarlier] Baudouin Le Charlier and Pierre Flener,
Specifications Are Necessarily Informal or: Some More Myths of Formal Methods,
The Journal of Systems and Software, Vol. 40, No. 3,
March 1998, citeseer.nj.nec.com/190011.html
[Namespace]
[SGML]
[Schemarama]
[SOX]
[RDF]
[RELAX] Murata M.
[URL]
[XLink]
[XML]
[XML Schema]
[XSL-FO]
[XSLT]
[WASH]
http://www.cs.washington.edu/research/jair/volume2/minton94a-html/node3.html