Schematron Rule for Trailing Spaces in Elements

zbinder
Posts: 2
Joined: Wed Jan 05, 2022 5:54 pm

Post by zbinder »

I created a Schematron rule to report any element whose content begins or ends with a space character. I first tried limiting the context to text(), as follows, but that rule also reported elements whose first or last child is an element, in addition to those that actually start or end with a space character.

<sch:pattern id="LeadingOrTrailingSpace">
    <sch:rule context="text()" role="error">
        <sch:report test="starts-with(., ' ')">Elements should not begin with a leading space.</sch:report>
        <sch:report test="ends-with(., ' ')">Elements should not end with a trailing space.</sch:report>
    </sch:rule>
</sch:pattern>

I then changed the rule context to match all elements, which works correctly for reporting leading spaces.

<sch:pattern id="LeadingOrTrailingSpace">
    <sch:rule context="*[(@class, ' . ')]" role="error">
        <sch:report test="(not(node()[1] instance of element())) and starts-with(., ' ')">Elements should not begin with a leading space.</sch:report>
        <sch:report test="(not(descendant::node()[last()] instance of element()) and ends-with(., ' '))">Elements should not end with a trailing space.</sch:report>
    </sch:rule>
</sch:pattern>

However, trailing space reporting is still problematic. For some reason, elements such as <conbody> and <ul> are reported as having a trailing space. It appears that Schematron is picking up the whitespace between <conbody> and <p>, and between <ul> and <li>. Here's the sample topic I'm using to test:

<concept id="leadingTrailingSpace">
<title>Leading Space Trailing Space</title>
<shortdesc>Test of schematron rules for leading and trailing space identification.</shortdesc>
<conbody>
<p> This paragraph starts with a leading space and is handled correctly.</p>
<p>This paragraph ends with a trailing space and is handled correctly. </p>
<p><ph>This paragraph starts with a phrase element</ph> and continues with plain text and is handled correctly.</p>
<p>This paragraph starts with plain text and <ph>ends with a phrase element and is handled correctly.</ph></p>
<p>This paragraph does not begin or end with a space and is handled correctly.</p>
<ul>
<li>List Item #1</li>
<li>List Item #2</li>
<li>List Item #3</li>
</ul>
</conbody>
</concept>

Does anyone have any ideas on how I can solve the issue with trailing space reporting on non-textual parent elements?

Thanks so much!

Zak Binder
zak.binder@ukg.com
Zak Binder
Director of Knowledge Engineering
zak.binder@proliant.com
tavy
Posts: 365
Joined: Thu Jul 01, 2004 12:29 pm

Re: Schematron Rule for Trailing Spaces in Elements

Post by tavy »

Hello Zak,

There is a sample rule in our user manual that checks for whitespace at the beginning and end of elements. You can find the rule here:
http://userguide.sync.ro/editor-sa/topi ... pk_g3d_34b

<sch:rule context="p|ph|codeph|filename|indexterm|xref|user-defined|user-input">
    <sch:let name="firstNodeIsElement" value="node()[1] instance of element()"/>
    <sch:let name="lastNodeIsElement" value="node()[last()] instance of element()"/>
    <sch:report test="(not($firstNodeIsElement) and matches(.,'^\s',';j')) 
        or (not($lastNodeIsElement) and matches(.,'\s$',';j'))"
        role="warning">
        Textual elements should not begin or end with whitespace.</sch:report>
</sch:rule>
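
For anyone trying the rule outside an existing schema, a minimal sketch of a complete Schematron file wrapping it might look like the following. It assumes the sch prefix is bound to the ISO Schematron namespace and that the processor supports XPath 2.0 (queryBinding="xslt2"), which "instance of" and matches() require. The pattern id is simply carried over from the original post. The ';j' flag appears to be a Saxon-specific regex extension, so this sketch leaves it out and relies on the standard meaning of '^', '$', and '\s'.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <sch:pattern id="LeadingOrTrailingSpace">
        <!-- Only the listed textual elements become the rule context, so
             containers such as conbody and ul are never tested. -->
        <sch:rule context="p|ph|codeph|filename|indexterm|xref|user-defined|user-input">
            <!-- True when the first or last child node is an element rather than text. -->
            <sch:let name="firstNodeIsElement" value="node()[1] instance of element()"/>
            <sch:let name="lastNodeIsElement" value="node()[last()] instance of element()"/>
            <sch:report test="(not($firstNodeIsElement) and matches(., '^\s'))
                or (not($lastNodeIsElement) and matches(., '\s$'))"
                role="warning">
                Textual elements should not begin or end with whitespace.</sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

Against the sample topic in the first post, this should flag only the first two paragraphs, because conbody, ul, and li are not in the context list.
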
Best Regards,
Octavian
Octavian Nadolu
<oXygen/> XML Editor
http://www.oxygenxml.com
chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Re: Schematron Rule for Trailing Spaces in Elements

Post by chrispitude »

Hi Zak,

We have a similar Schematron whitespace rule. We hardcode the list of elements in the rule context, just as Octavian's example does.

We have ours set to exclude elements with profiling conditions, as shown by the @props/@product/@audience exclusion in the context here:

  <rule context="(author|cite|codeph|command|default|draft-comment|emphasis|filename|fn|indexterm|infotip|keyword|message|p|ph|project-label|sub|sup|term|title|uicontrol|user-defined|user-input|variable|xref)[not(@props or @product or @audience)]" role="warning">
    <let name="firstNodeIsElement" value="node()[1] instance of element()"/>
    <let name="lastNodeIsElement" value="node()[last()] instance of element()"/>
    <report test="((not($firstNodeIsElement) and matches(., '^\s', ';j')) or
                   (not($lastNodeIsElement) and matches(., '\s$', ';j')))">Textual elements should not begin or end with whitespace.</report>
  </rule>
because sometimes we have conditional elements with leading/trailing whitespace to make the text flow work out:

<p>See the <cite>User Guide<ph product="A"> for Product A</ph></cite> for more information.</p>
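
Returning to the original question about non-textual parent elements: one possible variation, offered only as a sketch and not taken from either rule above, avoids the hardcoded element list. It restricts the context to elements that have at least one text child that is not whitespace-only, then tests the first and last child nodes directly instead of the element's whole string value. It assumes queryBinding="xslt2" and the same sch prefix as the first post, and it checks for any whitespace character ('\s') rather than only a space.

```xml
<sch:pattern id="LeadingOrTrailingSpace">
    <!-- Context: any element with at least one text child that contains
         non-whitespace characters. In the sample topic, conbody and ul have
         only whitespace-only text children, so they are skipped. -->
    <sch:rule context="*[text()[normalize-space()]]" role="error">
        <!-- Report only when the first child node is itself a text node that starts with whitespace. -->
        <sch:report test="node()[1] instance of text() and matches(node()[1], '^\s')">Elements should not begin with a leading space.</sch:report>
        <!-- Report only when the last child node is itself a text node that ends with whitespace. -->
        <sch:report test="node()[last()] instance of text() and matches(node()[last()], '\s$')">Elements should not end with a trailing space.</sch:report>
    </sch:rule>
</sch:pattern>
```

A known limitation of this sketch: an element that mixes text with block children (for example, an li containing text followed by a nested p and an indented closing tag) would still be reported, and so would a pretty-printed element whose text starts on a new line. The hardcoded context lists shown above sidestep both cases.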