Schematron and tokenize

Post by **david_himself** » Tue Sep 29, 2020 11:11 pm

Hi

I'm embedding Schematron rules in a schema for validating a TEI/XML personography. I have eight reports that constrain the attributes and values in the context of <relation>. Seven fire correctly and one never fires. For some rules I'm tokenizing the value of @mutual and looking at the individual tokens. The schema is an ODD file that produces Relax NG XML syntax.

This rule works, not looping through the tokens but relying on the fact that seq1 = seq2 is true if they have at least one element in common:

<sch:report test="@mutual and not(tokenize(@mutual) = concat('#',./ancestor::tei:person/@xml:id))" role="error">Value of @mutual must include <sch:value-of select="concat('#',./ancestor::tei:person/@xml:id)"/> as one of its targets.</sch:report>

This rule doesn't work, even though it seems to rely on the same point:

<sch:report test="@mutual and (tokenize(@mutual) = '\A[^#]\w+?\Z')" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>

What's the difference? If my understanding of sequence equality is garbled and I do need to loop through the tokenized attribute, how should I approach it? Many thanks for any help.

David

Post by **Radu** » Wed Sep 30, 2020 6:33 am

Hi David,

In the rule that does not work, you are testing equality between a sequence of strings and a regular expression. As far as I know, the only way to check whether a regular expression matches a string in XPath is the "matches" function, so I would not expect the XSLT processor to apply the regex automatically. In an = comparison, '\A[^#]\w+?\Z' is treated as a plain string literal, not as a pattern.
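To make the difference concrete, here is a minimal sketch, assuming an XPath 2.0 or later query binding (which the use of tokenize already implies). The report messages are made up for illustration. Note also that, as far as I know, XPath regular expressions anchor with ^ and $ and do not support \A and \Z.

```xml
<!-- Sketch only: '=' compares each token with the literal string, character by character. -->
<!-- This fires only if some token is literally the four characters ^#.* (no regex matching happens). -->
<sch:report test="tokenize(@mutual, '\s+') = '^#.*'">Some token is literally '^#.*'.</sch:report>

<!-- matches() does interpret its second argument as a regex, but it takes one string, not a sequence. -->
<!-- This only checks that the attribute value as a whole starts with '#'. -->
<sch:report test="matches(@mutual, '^#')">Value of @mutual starts with '#'.</sch:report>
```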

Regards,
Radu

Post by **david_himself** » Wed Sep 30, 2020 7:29 am

Thanks, Radu. I thought I'd tested with a non-regex string to eliminate the possibility of regex being the cause of failure, but perhaps I did that too hastily. I also tried fn:matches, but it cannot take a sequence as first argument.

Is there a simple way of testing that every space-delimited value of an attribute has '#' as its first character?

best
David

Post by **Radu** » Wed Sep 30, 2020 8:11 am

Hi David,

You could replace every token in the attribute value that starts with "#" with the empty string, then normalize the whitespace in what is left and check its length:

string-length(normalize-space(replace(@mutual, '#[^\s]*', '')))

I think the length should be 0 if all tokens started with "#".
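As a sketch of how this might be wired into a report (the message text is taken from the original rule; the rest is an assumption about the surrounding schema, including an XPath 2.0 or later query binding), the first report fires when anything is left over after removing the "#" tokens. The second is an untested alternative that loops over the tokens with a quantified expression instead.

```xml
<!-- Sketch only: fires when something remains after deleting every '#'-prefixed token. -->
<sch:report test="string-length(normalize-space(replace(@mutual, '#[^\s]*', ''))) &gt; 0" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>

<!-- Alternative sketch: fires when at least one token does not start with '#'. -->
<!-- If @mutual is absent, tokenize() yields an empty sequence and the test is false. -->
<sch:report test="some $t in tokenize(normalize-space(@mutual), ' ') satisfies not(starts-with($t, '#'))" role="error">Targets in value of @mutual must all begin with '#'.</sch:report>
```

Only one of the two would be needed; the quantified form may be easier to read if the check later has to become stricter per token.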

Regards,
Radu

Post by **david_himself** » Wed Sep 30, 2020 8:19 am

Perfect! Thank you.

D
