Determining Elements that Exceed a Given Word Count

HSBOracle
Posts: 3
Joined: Thu Feb 24, 2022 6:58 pm

Post by HSBOracle »

Is there a way to query the topics in a ditamap to see if a particular element exceeds a certain number of words? For example, can I identify all of the shortdesc (short description) elements that contain more than 40 words? Thanks, Scott
xephon
Posts: 161
Joined: Mon Nov 24, 2014 1:49 pm
Location: Greven/Germany

Re: Determining Elements that Exceed a Given Word Count

Post by xephon »

Hi,

You should create a Schematron rule and validate your map. The Schematron rule should use the document() function to step into all topics and validate the <shortdesc> elements.

Best regards
stefan-jung.org – Your DITA/DITA-OT XML consultant
chrispitude
Posts: 922
Joined: Thu May 02, 2019 2:32 pm

Re: Determining Elements that Exceed a Given Word Count

Post by chrispitude »

Hi HSBOracle,

As xephon suggests, you can do this by implementing a Schematron check. I would implement it by extending Oxygen's default DITA topic framework, then associating a new Schematron check file as a default check for DITA topic files:

[Screenshot: oxygen_word_count.png]

Here is a small testcase that demonstrates this approach:


[Attachment: oxygen_count_words_in_shortdesc.zip]

When you run Validate and Check for Completeness on a map, the word count check will be applied to all topics. In addition, because the Automatic validation checkbox is checked in the validation scenario list, the word count check is also applied interactively in topic editing windows.
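
For readers who cannot open the attachment, a topic-level check of this kind might look roughly like the sketch below. This is not the content of the attached file. It assumes an XSLT 2.0 query binding, matches shortdesc through its DITA @class value, and hard-codes the 40-word limit from the original question; the message wording and the "warning" role are illustrative choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: flags any short description longer than 40 words. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Match shortdesc and any specialization of it via the DITA class attribute. -->
    <sch:rule context="*[contains(@class, ' topic/shortdesc ')]">
      <!-- Count whitespace-separated words in the element's full text, inline markup included. -->
      <sch:let name="words" value="count(tokenize(normalize-space(.), '\s+'))"/>
      <sch:report test="$words &gt; 40" role="warning">
        The short description contains <sch:value-of select="$words"/> words; the limit is 40.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the rule's context is the shortdesc element itself, the message is reported on the offending element in each topic rather than on the map.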
Radu
Posts: 9481
Joined: Fri Jul 09, 2004 5:18 pm

Re: Determining Elements that Exceed a Given Word Count

Post by Radu »

Hi,

Or use Oxygen's XPath Builder view (main menu Window > Show View), set its "Scope" to the current DITA map hierarchy, and run an XPath like:

//shortdesc[count(tokenize(normalize-space(text()), '\s+')) > 40]
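
One caveat with the expression above, assuming it is run as XPath 2.0 or later: text() selects only the text nodes that are direct children of shortdesc. Words inside inline elements such as ph or codeph are not counted, and a shortdesc with more than one text node child may raise a type error, because normalize-space() accepts at most one item. A variant that uses the string value of the whole element avoids both problems:

```xpath
//shortdesc[count(tokenize(normalize-space(.), '\s+')) > 40]
```

An empty shortdesc normalizes to an empty string, which tokenize() turns into an empty sequence, so it counts as zero words and is not reported.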
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com