Determining Elements that Exceed a Given Word Count

HSBOracle
Posts: 3
Joined: Thu Feb 24, 2022 6:58 pm

Post by HSBOracle »

Is there a way to query the topics in a ditamap to see whether a particular element exceeds a certain number of words? For example, can I identify all of the shortdesc (short description) elements that contain more than 40 words? Thanks, Scott
xephon
Posts: 140
Joined: Mon Nov 24, 2014 1:49 pm
Location: Greven/Germany

Re: Determining Elements that Exceed a Given Word Count

Post by xephon »

Hi,

You should create a Schematron rule and validate your map. The Schematron rule should use the document() function to step into all topics and validate the <shortdesc> elements.
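
A minimal sketch of what such a map-level rule might look like, assuming the xslt2 query binding, a 40-word limit, and maps that reference topics through plain topicref elements (specialized references such as chapter would need their own context). The pattern id and the message wording are made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern id="map-shortdesc-word-count"><!-- hypothetical id -->
    <!-- Fire on each topicref that points at a local DITA topic file. -->
    <sch:rule context="topicref[@href][not(starts-with(@href, '#'))]
                       [not(@scope = ('external', 'peer'))]
                       [not(@format) or @format = 'dita']">
      <!-- Drop any #fragment and resolve the href relative to the map. -->
      <sch:let name="topicUri"
               value="resolve-uri(tokenize(@href, '#')[1], base-uri(.))"/>
      <!-- Load the topic with document() and collect the over-long shortdescs. -->
      <sch:let name="tooLong"
               value="if (doc-available($topicUri))
                      then document($topicUri)//shortdesc
                           [count(tokenize(normalize-space(.), '\s+')) gt 40]
                      else ()"/>
      <sch:assert test="empty($tooLong)">
        <sch:value-of select="@href"/> contains
        <sch:value-of select="count($tooLong)"/> shortdesc element(s)
        with more than 40 words.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With this approach the message is reported on the topicref in the map, not inside the topic itself.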

Best regards
stefan-jung.org – Your DITA/DITA-OT XML consultant
chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Re: Determining Elements that Exceed a Given Word Count

Post by chrispitude »

Hi HSBOracle,

As xephon suggests, you can do this by implementing a Schematron check. I would implement it by extending Oxygen's default DITA topic framework, then associating a new Schematron check file as a default check for DITA topic files:

[Screenshot: oxygen_word_count.png]

Here is a small testcase that demonstrates this approach:

[Attachment: oxygen_count_words_in_shortdesc.zip]

When you run Validate and Check for Completeness on a map, the word count check will be applied to all topics. In addition, because the Automatic validation checkbox is checked in the validation scenario list, the word count check is also applied interactively in topic editing windows.
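
For readers without the attachment, a topic-level check along these lines could look roughly like the sketch below. This is not the content of the zip file; it assumes the xslt2 query binding, matches shortdesc by element name only, and uses a made-up pattern id and a 40-word limit:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern id="shortdesc-word-count"><!-- hypothetical id -->
    <sch:rule context="shortdesc">
      <!-- Count whitespace-separated words, including text in inline elements. -->
      <sch:let name="words"
               value="count(tokenize(normalize-space(.), '\s+'))"/>
      <!-- role="warning" is a common convention for non-fatal messages. -->
      <sch:assert test="$words le 40" role="warning">
        The short description has <sch:value-of select="$words"/> words;
        the limit is 40.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the rule runs in the topic, the message lands directly on the offending shortdesc element, which suits the automatic validation described above.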
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: Determining Elements that Exceed a Given Word Count

Post by Radu »

Hi,

Or use Oxygen's XPath Builder view (main menu Window > Show View), with the "Scope" set to the current DITA map hierarchy and an XPath like:

//shortdesc[count(tokenize(normalize-space(.), '\s+')) > 40]
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com