Useful Schematron checks for DITA authoring

chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Post by chrispitude »

Hi all,

I wanted to start a discussion thread where we can share useful Schematron checks for DITA authoring.

Here's a check that reports text elements that begin/end with spaces:

Code:

  <rule context="p|ph|codeph|filename|indexterm|xref|user-defined|user-input" role="warning">
    <let name="firstNodeIsElement" value="node()[1] instance of element()"/>
    <let name="lastNodeIsElement" value="node()[last()] instance of element()"/>
    <report test="(not($firstNodeIsElement) and matches(.,'^\s',';j')) or (not($lastNodeIsElement) and matches(.,'\s$',';j'))">Textual elements should not begin or end with whitespace.</report>
  </rule>
Set the rule context to the list of elements you want to check. The two <let> variables are there to avoid false reports for elements like this:

Code:

<p><xref ...> provides more information on this.</p>
Here the <xref> has no content in the source, so the string value of the <p> begins with the space that follows the <xref>. Without the first-node test, the check would report it.
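
For anyone who wants to experiment with the logic outside Schematron, here is a rough Python sketch of the same test. It is only an approximation, not part of the check itself: the function name has_edge_whitespace is made up for illustration, and ElementTree discards comments and processing instructions, which the XPath version would count as nodes.

```python
import re
import xml.etree.ElementTree as ET


def has_edge_whitespace(elem):
    """Approximate the Schematron test for leading/trailing whitespace."""
    text = "".join(elem.itertext())          # string value of the element
    children = list(elem)
    # The first node is an element when no text precedes the first child.
    first_is_element = bool(children) and not elem.text
    # The last node is an element when no text follows the last child.
    last_is_element = bool(children) and not children[-1].tail
    starts = (not first_is_element) and bool(re.search(r"^\s", text))
    ends = (not last_is_element) and bool(re.search(r"\s$", text))
    return starts or ends


# The false-positive case from the post: the text after the empty xref
# starts with a space, but the first node is an element, so no report.
p = ET.fromstring('<p><xref href="a.dita"/> provides more information on this.</p>')
print(has_edge_whitespace(p))
```

An element such as <p> leading space.</p> is still reported, because its first node is text.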
Radu
Posts: 8991
Joined: Fri Jul 09, 2004 5:18 pm

Post by Radu »

Hi Chris,

Adding a reference to the Schematron rules we use in the Oxygen User's Guide:

https://github.com/oxygenxml/userguide/ ... DITA/rules

Most of our Schematron rules are here:

https://github.com/oxygenxml/userguide/ ... vanced.sch

Besides the Schematron rules, we also have a style guide: a separate DITA map whose topics contain the rules our technical writers have written about how to write the documentation.
An older blog post describes some of the rules:

https://blog.oxygenxml.com/topics/SchematronBCs.html

Also George Bina created this project:

https://github.com/oxygenxml/dim

which has some sample rules contributed by Comtech and tries to generate the Schematron rules, using XSLT, from the DITA topics that describe them.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by chrispitude »

Here is a check that reminds writers to populate cross-book links with reference text:

Code:

<pattern id="refs">
  <rule context="link|xref" role="error">
    <report test="not(node()) and contains(@keyref, '.')">Empty cross-book reference; please add the target text.</report>
  </rule>
</pattern>
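
As a quick way to see which links this catches, here is a small Python sketch of the same condition. It is illustrative only: the function name is_empty_cross_book_ref and the key names in the sample are hypothetical, and it assumes (as the rule does) that a '.' in @keyref means the key is qualified by a key scope.

```python
import xml.etree.ElementTree as ET


def is_empty_cross_book_ref(elem):
    """True when the element has no content and its @keyref contains a '.'."""
    has_nodes = elem.text is not None or len(elem) > 0
    return (not has_nodes) and "." in elem.get("keyref", "")


# Hypothetical scoped key: empty link, so it would be reported.
print(is_empty_cross_book_ref(ET.fromstring('<xref keyref="other_book.install"/>')))
```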

Post by Radu »

Hi,

It's a good rule if you use key scopes only for cross-publication references. If you also use them for internal links, it would report errors even when the processor can come up with the link titles by itself.

Regards,
Radu

Post by chrispitude »

If you keep all your DITA content inside a dita/ directory, here is a check that reports any @href that references a file above the dita/ directory level:

Code:

<!-- compute how many directory levels exist past '/dita/' (or -1 if it doesn't exist) -->
<let name="this_file_depth" value="if (contains(base-uri(), '/dita/')) then (count(tokenize(substring-after(base-uri(), '/dita/'), '/'))-1) else (-1)"/>

<!-- make sure there aren't more '../' than our directory levels past '/dita/' -->
<pattern id="check_depth">
  <rule context="*[contains(@href, '../') and not(@scope = 'external')]" role="error">
    <report test="($this_file_depth >= 0) and ((count(tokenize(@href, '\.\./'))-1) > $this_file_depth)">@href refers to a file outside the current dita/ directory.</report>
  </rule>
</pattern>
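
The arithmetic is easier to follow with concrete paths, so here is a rough Python equivalent of the two counts. The function names and the sample paths are made up for illustration; the logic mirrors the XPath above (count the directory levels after '/dita/', then count the '../' steps in the @href).

```python
import re


def file_depth(base_uri, root="/dita/"):
    """Directory levels between root and the file, or -1 if root is absent."""
    if root not in base_uri:
        return -1
    after = base_uri.split(root, 1)[1]       # like substring-after()
    return len(after.split("/")) - 1         # path segments minus the file name


def escapes_root(base_uri, href):
    """True when href climbs more levels than exist below root."""
    depth = file_depth(base_uri)
    ups = len(re.split(r"\.\./", href)) - 1  # number of '../' occurrences
    return depth >= 0 and ups > depth


# Hypothetical topic two levels below dita/.
topic = "file:/C:/docs/dita/topics/install/t.dita"
print(file_depth(topic))                            # 2
print(escapes_root(topic, "../../images/x.png"))    # False: stays inside dita/
print(escapes_root(topic, "../../../other/x.png"))  # True: climbs above dita/
```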

Post by Radu »

Hi Chris,

Thanks for posting your custom Schematron check. Maybe others will find it useful.

Regards,
Radu

Post by chrispitude »

Some of our writers kept their Oxygen project directory in their Microsoft OneDrive folder (Windows only). Unfortunately, OneDrive's aggressive filesystem locking prevents Oxygen's Git plugin from working correctly.

To resolve this, I added the following check to our map-level and topic-level Schematron files:

Code:

<pattern id="onedrive_check">
 <rule context="/" role="error">
  <report test="contains(base-uri(), 'OneDrive')">Oxygen projects using Git should not be placed in OneDrive folders.</report>
 </rule>
</pattern>

Post by chrispitude »

One of our writers got confused in the <mapref> configuration window and created a <mapref> that does not reference anything (no @keyref or @href):

Code:

<mapref format="ditamap" keyscope="another_book" navtitle="another_book"
      processing-role="resource-only" scope="peer"/>
I added the following check to catch this:

Code:

<rule context="mapref" role="error">
  <report test="not(@href or @keyref) and (not(@navtitle) or @processing-role='resource-only')">'mapref' does not reference any content or provide a @navtitle.</report>
</rule>
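
The boolean condition is a little dense, so here is the same logic as a Python sketch over a plain attribute dictionary. The function name mapref_is_broken is hypothetical; attribute presence is tested the way XPath tests @href or @keyref (an attribute that exists counts, even if empty).

```python
def mapref_is_broken(attrs):
    """True when a mapref has no target and no usable navtitle."""
    has_target = "href" in attrs or "keyref" in attrs
    resource_only = attrs.get("processing-role") == "resource-only"
    return (not has_target) and ("navtitle" not in attrs or resource_only)


# The mapref from the post: a navtitle is present, but the element is
# resource-only and references nothing, so it is reported.
print(mapref_is_broken({
    "format": "ditamap",
    "keyscope": "another_book",
    "navtitle": "another_book",
    "processing-role": "resource-only",
    "scope": "peer",
}))
```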

Post by chrispitude »

We found that when the following CSS property is applied to make long words breakable in table cells:

Code:

.table { word-break: break-word; }
and a table has a mix of fixed-value and proportional-allocation width values:

Code:

<tgroup cols="3">
  <colspec colname="c1" colnum="1" colwidth="2in"/>
  <colspec colname="c2" colnum="2" colwidth="1*"/>
  <colspec colname="c3" colnum="3" colwidth="2*"/>
then the proportional allocations greedily consume the width, causing the fixed-width column to be compressed in the HTML5 and WebHelp outputs:

[Attached image: image.png]

(Ignoring a fixed-width value seems like a bug to me, but Firefox, Chrome, and Edge all do it.)

To help writers avoid this condition, we added the following Schematron check:

Code:

<pattern id="tables_error">
  <rule context="tgroup" role="error">
    <report test="colspec/@colwidth[contains(., '*')] and colspec/@colwidth[matches(., '\d') and not(contains(., '*'))]">Do not mix proportional ("*") widths with fixed widths; use blank values for automatically allocated widths.</report>
  </rule>
</pattern>
Here is a test case for the browser behavior (it does not include the Schematron check), if anyone wants to experiment with it:

[Attachment: html5_table_column_widths.zip]
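
To show which combinations the Schematron check flags, here is a minimal Python sketch of the same test applied to a list of @colwidth values. The function name mixes_width_kinds is made up for illustration; like the XPath, it treats any value containing '*' as proportional and any other value containing a digit as fixed.

```python
import re


def mixes_width_kinds(colwidths):
    """True when a tgroup mixes proportional ('*') and fixed column widths."""
    proportional = any("*" in w for w in colwidths)
    fixed = any(re.search(r"\d", w) and "*" not in w for w in colwidths)
    return proportional and fixed


print(mixes_width_kinds(["2in", "1*", "2*"]))  # True: the table from the post
print(mixes_width_kinds(["1*", "2*"]))         # False: proportional only
print(mixes_width_kinds(["2in", ""]))          # False: fixed plus automatic
```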

Post by chrispitude »

I noticed today that my writers sometimes make typographical errors in the units of their @colwidth (table column width) values, such as "i" instead of "in". Here is a Schematron check to report invalid units:

Code:

<pattern id="tables_error">
  <rule context="tgroup" role="error">
    <report test="colspec/@colwidth[matches(., '\d(?!\d|\.|cm|em|in|mm|pi|pt|px|[*%\s]|$)', ';j')]">Valid column width units are blank (automatic), * (proportional), in, pt, px, em, cm, mm, and pi.</report>
  </rule>
</pattern>
Although the CALS table specification lists only certain units as supported, PDF Chemistry and WebHelp (browsers) support some additional CSS-based units that I permitted as valid in the check.
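
If you want to try values against the pattern without running a validation, the same regular expression works in Python, since the negative lookahead syntax is the same as in the Java flavor enabled by the ';j' flag. This is only a sketch; the name has_invalid_unit is hypothetical.

```python
import re

# A digit that is NOT followed by another digit, a '.', an accepted unit,
# '*', '%', whitespace, or the end of the value.
BAD_UNIT = re.compile(r"\d(?!\d|\.|cm|em|in|mm|pi|pt|px|[*%\s]|$)")


def has_invalid_unit(colwidth):
    """True when the @colwidth value would be reported by the check."""
    return BAD_UNIT.search(colwidth) is not None


for value in ["2in", "2i", "1*", "1.5in", "50%", ""]:
    print(repr(value), has_invalid_unit(value))
```

Note that the lookahead only tests what immediately follows the digit, so a value such as "2inch" is accepted because it begins with "in".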