Schematron and tokenize


Post by david_himself »

Hi

I'm embedding Schematron rules in a schema for validating a TEI/XML personography. I have eight reports that constrain the attributes and values in the context of <relation>. Seven fire correctly and one never fires. For some rules I'm tokenizing the value of @mutual and looking at the individual tokens. The schema is an ODD file that produces Relax NG XML syntax.

This rule works, not looping through the tokens but relying on the fact that seq1 = seq2 is true if they have at least one element in common:

<sch:report test="@mutual and not(tokenize(@mutual) = concat('#',./ancestor::tei:person/@xml:id))" role="error">Value of @mutual must include <sch:value-of select="concat('#',./ancestor::tei:person/@xml:id)"/> as one of its targets.</sch:report>

This rule doesn't work, even though it seems to rely on the same point:

<sch:report test="@mutual and (tokenize(@mutual) = '\A[^#]\w+?\Z')" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>

What's the difference? If my understanding of sequence equality is garbled and I do need to loop through the tokenized attribute, how should I approach it? Many thanks for any help.

David

Post by Radu »

Hi David,

In the rule that does not work, you are testing equality between a sequence of strings and a string literal that happens to contain a regular expression. As far as I know, the only way to check whether a regular expression matches a string in XPath is the "matches" function, so I would not expect the processor to apply the pattern automatically by detecting that '\A[^#]\w+?\Z' is a regular expression rather than a plain string literal.
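
To make the distinction concrete, here is a minimal sketch. The patterns and messages are illustrative assumptions, not taken from the original schema, and the sch: prefix is assumed to be bound as in the rules above. Note also that XPath regular expressions anchor with ^ and $; as far as I know, \A and \Z are not part of the XPath regex syntax.

```xml
<!-- General comparison: true only if some token is literally the string '^[^#].*$'.
     The right-hand side is never interpreted as a pattern. -->
<sch:report test="tokenize(@mutual) = '^[^#].*$'">A token is literally equal to the pattern text.</sch:report>

<!-- matches() does apply the pattern, but only to a single string,
     here the whole attribute value rather than each token. -->
<sch:report test="matches(@mutual, '^[^#]')">The first character of @mutual is not '#'.</sch:report>
```

The second report only inspects the first character of the whole value, so on its own it does not check every token.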

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by david_himself »

Thanks, Radu. I thought I'd tested with a non-regex string to eliminate the possibility of regex being the cause of failure, but perhaps I did that too hastily. I also tried fn:matches, but it cannot take a sequence as first argument.

Is there a simple way of testing that every space-delimited value of an attribute has '#' as its first character?

best
David

Post by Radu »

Hi David,

You could replace every substring of the attribute value that starts with "#" with the empty string, then normalize what is left and check its length:

string-length(normalize-space(replace(@mutual, '#[^\s]*', '')))
I think the length should be 0 if all tokens started with "#".
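
As a rough sketch, the expression could be dropped into a report like the one below. The message text is copied from the rule in the question; the second report is an alternative that loops over the tokens with a quantified expression, assuming an XPath 2.0 or later query binding (the one-argument tokenize() already used in the question needs XPath 3.1).

```xml
<!-- Fires when anything is left over after removing every '#...' run -->
<sch:report test="@mutual and string-length(normalize-space(replace(@mutual, '#[^\s]*', ''))) != 0" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>

<!-- Alternative sketch: fires when at least one token does not start with '#' -->
<sch:report test="some $t in tokenize(@mutual) satisfies not(starts-with($t, '#'))" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>
```

Only one of the two would be needed in practice; they are shown together for comparison.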

Regards,
Radu

Post by david_himself »

Perfect! Thank you.

D