Useful Schematron checks for DITA authoring

Post by **chrispitude** » Wed Mar 18, 2020 10:11 pm

Hi all,

I wanted to start a discussion thread where we can share useful Schematron checks for DITA authoring.

Here's a check that reports text elements that begin or end with whitespace:

  <rule context="p|ph|codeph|filename|indexterm|xref|user-defined|user-input" role="warning">
    <let name="firstNodeIsElement" value="node()[1] instance of element()"/>
    <let name="lastNodeIsElement" value="node()[last()] instance of element()"/>
    <report test="(not($firstNodeIsElement) and matches(.,'^\s',';j')) or (not($lastNodeIsElement) and matches(.,'\s$',';j'))">Textual elements should not begin or end with whitespace.</report>
  </rule>

Set the rule's context to the list of elements you want to check. The two variables are there to avoid false reports on elements like this:

<p><xref ...> provides more information on this.</p>

where the <xref> is empty, so the string value of the <p> begins with the space that follows the <xref> and would otherwise be reported.
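
For anyone who wants to try the logic of this whitespace check outside Oxygen, here is a rough Python sketch. It is not the Schematron rule itself: it assumes ElementTree's model (leading text in `.text`, trailing text in the last child's `.tail`), ignores comments and processing instructions, and the function name `has_edge_whitespace` is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def has_edge_whitespace(elem):
    """Return True if elem would be reported by the whitespace check (sketch)."""
    # ElementTree stores text before the first child in .text and text after
    # the last child in that child's .tail, so an empty .text / .tail means
    # the first / last node is an element.
    first_node_is_element = len(elem) > 0 and not elem.text
    last_node_is_element = len(elem) > 0 and not elem[-1].tail
    # Rough equivalent of the XPath string value of the element.
    value = "".join(elem.itertext())
    starts = not first_node_is_element and re.search(r"^\s", value) is not None
    ends = not last_node_is_element and re.search(r"\s$", value) is not None
    return starts or ends


# The href value below is a placeholder, not taken from the thread.
empty_xref_first = ET.fromstring(
    '<p><xref href="t.dita"/> provides more information on this.</p>')
leading_space = ET.fromstring('<p> starts with a space</p>')
print(has_edge_whitespace(empty_xref_first), has_edge_whitespace(leading_space))
```

The first sample is skipped because its first node is an element; the second is reported because its text starts with a space.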

Post by **Radu** » Thu Mar 19, 2020 9:33 am

Hi Chris,

Adding a reference to the Schematron rules we use in the Oxygen User's Guide:

https://github.com/oxygenxml/userguide/ ... DITA/rules

Most of our Schematron rules are here:

https://github.com/oxygenxml/userguide/ ... vanced.sch

Besides the Schematron rules, we also have a style guide: a separate DITA map whose topics are written by our technical writers and describe how the documentation should be written.
An older blog post describes some of the rules:

https://blog.oxygenxml.com/topics/SchematronBCs.html

Also George Bina created this project:

https://github.com/oxygenxml/dim

It includes some sample rules contributed by Comtech and attempts to generate the Schematron rules with XSLT from the DITA topics that describe them.

Regards,
Radu

Post by **chrispitude** » Tue Mar 31, 2020 4:09 pm

Here is a check that reminds writers to populate cross-book links with reference text:

<pattern id="refs">
  <rule context="link|xref" role="error">
    <report test="not(node()) and contains(@keyref, '.')">Empty cross-book reference; please add the target text.</report>
  </rule>
</pattern>

Post by **Radu** » Wed Apr 01, 2020 8:31 am

Hi,

It's a good rule if you use key scopes only for cross-publication references. If you also use them for internal links, it would report errors even when the processor is able to come up with titles for the links by itself.

Regards,
Radu

Post by **chrispitude** » Wed Feb 17, 2021 1:49 am

If you keep all your DITA content inside a dita/ directory, here is a check for any @href file reference that points to a file above the dita/ directory level:

<!-- compute how many directory levels exist past '/dita/' (or -1 if it doesn't exist) -->
<let name="this_file_depth" value="if (contains(base-uri(), '/dita/')) then (count(tokenize(substring-after(base-uri(), '/dita/'), '/'))-1) else (-1)"/>

<!-- make sure there aren't more '../' than our directory levels past '/dita/' -->
<pattern id="check_depth">
  <rule context="*[contains(@href, '../') and not(@scope = 'external')]" role="error">
    <report test="($this_file_depth >= 0) and ((count(tokenize(@href, '\.\./'))-1) > $this_file_depth)">@href refers to a file outside the current dita/ directory.</report>
  </rule>
</pattern>
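
The arithmetic behind this depth check can be sketched in Python as follows. This only mirrors the two XPath expressions; the function names and sample paths are invented for illustration, and the `@scope = 'external'` exclusion from the rule context is not modeled.

```python
import re


def file_depth(base_uri):
    """Directory levels past '/dita/', or -1 if '/dita/' is not in the URI."""
    if "/dita/" not in base_uri:
        return -1
    rest = base_uri.split("/dita/", 1)[1]
    # XPath tokenize() returns an empty sequence for an empty string,
    # so treat that case as zero tokens.
    tokens = rest.split("/") if rest else []
    return len(tokens) - 1


def href_leaves_dita_dir(base_uri, href):
    """True if href climbs more '../' levels than the file's depth allows."""
    depth = file_depth(base_uri)
    ups = len(re.split(r"\.\./", href)) - 1  # number of '../' occurrences
    return depth >= 0 and ups > depth


# Hypothetical paths, used only to exercise the functions.
base = "file:/C:/project/dita/topics/intro.dita"
print(href_leaves_dita_dir(base, "../images/logo.png"))   # stays inside dita/
print(href_leaves_dita_dir(base, "../../outside.dita"))   # climbs above dita/
```

A topic one level below dita/ may use one `../` but not two, which is the comparison the `<report>` test makes.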

Post by **Radu** » Wed Feb 17, 2021 9:23 am

Hi Chris,

Thanks for posting your custom Schematron check; maybe others will find it useful.

Regards,
Radu

Post by **chrispitude** » Tue Jun 15, 2021 10:34 pm

Some of our writers kept their Oxygen project directory in their Microsoft OneDrive folder (Windows only). Unfortunately, OneDrive's aggressive filesystem locking prevents Oxygen's Git plugin from working correctly.

To resolve this, I added the following check to our map-level and topic-level Schematron files:

<pattern id="onedrive_check">
 <rule context="/" role="error">
  <report test="contains(base-uri(), 'OneDrive')">Oxygen projects using Git should not be placed in OneDrive folders.</report>
 </rule>
</pattern>

Post by **chrispitude** » Wed Jun 16, 2021 9:48 pm

One of our writers got confused in the <mapref> configuration window and created a <mapref> that does not reference anything (no @keyref or @href):

<mapref format="ditamap" keyscope="another_book" navtitle="another_book"
      processing-role="resource-only" scope="peer"/>

I added the following check to catch this:

<rule context="mapref" role="error">
  <report test="not(@href or @keyref) and (not(@navtitle) or @processing-role='resource-only')">'mapref' does not reference any content or provide a @navtitle.</report>
</rule>

Post by **chrispitude** » Thu Jun 09, 2022 8:31 pm

We found that when the following CSS property is applied to make long words breakable in table cells:

.table { word-break: break-word; }

and a table has a mix of fixed-value and proportional-allocation width values:

<tgroup cols="3">
  <colspec colname="c1" colnum="1" colwidth="2in"/>
  <colspec colname="c2" colnum="2" colwidth="1*"/>
  <colspec colname="c3" colnum="3" colwidth="2*"/>
  <!-- thead/tbody omitted -->
</tgroup>

then the proportional allocations greedily consume the width, causing the fixed-width column to be compressed in the HTML5 and WebHelp outputs:

[Screenshot attachment: image.png]

(Ignoring a fixed width seems like a bug to me, but Firefox, Chrome, and Edge all do it.)

To help writers avoid this condition, we added the following Schematron check:

<pattern id="tables_error">
  <rule context="tgroup" role="error">
    <report test="colspec/@colwidth[contains(., '*')] and colspec/@colwidth[matches(., '\d') and not(contains(., '*'))]">Do not mix proportional ("*") widths with fixed widths; use blank values for automatically allocated widths.</report>
  </rule>
</pattern>
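
As a quick way to see which width combinations this rule flags, here is a small Python sketch of the same test. The function name `mixes_width_types` is hypothetical, and the input is assumed to be the list of @colwidth values from one <tgroup>.

```python
import re


def mixes_width_types(colwidths):
    """True if the list has both proportional ('*') and fixed widths."""
    has_proportional = any("*" in w for w in colwidths)
    # A fixed width contains a digit but no '*'; blank values count as neither.
    has_fixed = any(re.search(r"\d", w) and "*" not in w for w in colwidths)
    return has_proportional and has_fixed


print(mixes_width_types(["2in", "1*", "2*"]))  # the combination from the post
print(mixes_width_types(["", "1*", "2*"]))     # blank plus proportional
```

Blank values are treated as automatically allocated, so combining them with either kind of width is not reported.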

Here is a test case for the browser behavior (it does not include the Schematron check), if anyone wants to experiment with it:

html5_table_column_widths.zip

Post by **chrispitude** » Wed Jul 06, 2022 9:06 pm

I noticed today that my writers sometimes mistyped the units in their @colwidth (table column width) values, such as "i" instead of "in". Here is a Schematron check to report invalid units:

<pattern id="tables_error">
  <rule context="tgroup" role="error">
    <report test="colspec/@colwidth[matches(., '\d(?!\d|\.|cm|em|in|mm|pi|pt|px|[*%\s]|$)', ';j')]">Valid column width units are blank (automatic), * (proportional), in, pt, px, em, cm, mm, and pi.</report>
  </rule>
</pattern>

Although the CALS table specification lists only certain units as supported, PDF Chemistry and WebHelp (browsers) support some additional CSS-based units that I permitted as valid in the check.
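
The regular expression in this unit check can be tried out in Python, whose `re` module handles this lookahead pattern the same way for the cases below. This is only a sketch for experimenting: the name `has_invalid_unit` is made up, and the Saxon-specific `;j` flag is not needed here.

```python
import re

# A digit that is NOT followed by another digit, a decimal point, a known
# unit, '*', '%', whitespace, or the end of the value.
BAD_UNIT = re.compile(r"\d(?!\d|\.|cm|em|in|mm|pi|pt|px|[*%\s]|$)")


def has_invalid_unit(colwidth):
    """True if the @colwidth value would be reported by the check."""
    return BAD_UNIT.search(colwidth) is not None


for value in ["2in", "2i", "1.5in", "1*", "", "2inx"]:
    print(repr(value), has_invalid_unit(value))
```

Note that the lookahead only inspects what immediately follows the last digit, so a value such as "2inx" is not reported; that seems an acceptable trade-off for catching the common typos.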
