Schematron: start element with a capital

Posted: Tue Apr 14, 2020 12:16 pm
by pieterjan_vdw
Hi,

I am writing a rule to check capitalization at the beginning of a sentence.
To do this, I used some of the code found here: post57735.html?hilit=schematron#p57587

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema queryBinding="xslt2" xmlns:sch="http://purl.oclc.org/dsdl/schematron" xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
	<sch:pattern abstract="true" id="starts-with-capital">
		<sch:rule context="$element" role="information">
			<sch:let name="firstNodeIsElement" value="node()[1] instance of element()"/>
			<sch:report test="(not($firstNodeIsElement) and (not(matches(text(), '^[A-Z0-9]'))))">Start the element &lt;$element&gt; with a capital.</sch:report>
		</sch:rule>
	</sch:pattern>
	<sch:pattern is-a="starts-with-capital">
		<sch:param name="element" value="title"/>
	</sch:pattern>
	<sch:pattern is-a="starts-with-capital">
		<sch:param name="element" value="li"/>
	</sch:pattern>
</sch:schema>
This rule already works fine in most cases.
I only get an error message for content like the third <li> in this example:

		<ul id="ul_zrm_wc1_jlb">
			<li>This is the first item in the list.</li>
			<li>Second item in the list.</li>
			<li>3<sup>rd</sup> item in the list.</li>
		</ul>
Then I get the following error message: A sequence of more than one item is not allowed as the first argument of fn:matches() ("3", " item in the list.")

How should I change my rule to avoid this?

Thank you.

Re: Schematron: start element with a capital

Posted: Tue Apr 14, 2020 2:41 pm
by tavy
Hello,

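The problem is that the first argument of the matches() function must be a single item, not a sequence of several. If you specify "text()" as the first argument, it selects all the text node children of the current element. The third <li> has two of them ("3" and " item in the list."), so matches() receives a sequence of two items and fails.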
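As an illustration, here is the third <li> from the question with notes on what each expression selects when <li> is the context node. The comments are hand-written annotations, not the output of any tool.

```xml
<!-- The element has three child nodes: text "3", element <sup>, text " item in the list." -->
<li>3<sup>rd</sup> item in the list.</li>
<!-- text()     selects two text nodes: "3" and " item in the list." -->
<!-- text()[1]  selects one text node: "3" -->
<!-- string(.)  is the full string value: "3rd item in the list." -->
```

Only the last two expressions yield a single item, which is what matches() needs as its first argument.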
You have two options:
1. You can pass all the text content of the current element (its string value, which also includes text inside child elements such as <sup>), by changing the test to something like this:

(not($firstNodeIsElement) and (not(matches(., '^[A-Z0-9]'))))
2. You can pass only the first text node child of the current element, by changing the test to something like this:

(not($firstNodeIsElement) and (not(matches(text()[1], '^[A-Z0-9]'))))
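For reference, a minimal sketch of the whole schema with the second option applied might look like the following. It reuses the pattern from the question unchanged apart from the test. One further change is an assumption on my part: the "|" is dropped from the character class, because inside [...] it is a literal pipe character rather than an "or", so the class would otherwise also accept an element that starts with "|".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema queryBinding="xslt2" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
	<sch:pattern abstract="true" id="starts-with-capital">
		<sch:rule context="$element" role="information">
			<!-- True when the element begins with a child element instead of text -->
			<sch:let name="firstNodeIsElement" value="node()[1] instance of element()"/>
			<!-- text()[1] selects at most one node, so matches() gets a single item -->
			<sch:report test="not($firstNodeIsElement) and not(matches(text()[1], '^[A-Z0-9]'))">Start the element &lt;$element&gt; with a capital.</sch:report>
		</sch:rule>
	</sch:pattern>
	<sch:pattern is-a="starts-with-capital">
		<sch:param name="element" value="title"/>
	</sch:pattern>
	<sch:pattern is-a="starts-with-capital">
		<sch:param name="element" value="li"/>
	</sch:pattern>
</sch:schema>
```

With this version the third <li> is no longer reported, because its first text node is "3". Note that the first text node is tested as-is, so an element whose text starts with whitespace (for example, a line break after the start tag) would still be reported.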
Best Regards,
Octavian