Counting words in a Ditamap

Post by Boreas »

Hello,

Is there an easy and elegant way in Oxygen to count the words in all the files referenced by a ditamap, including the topicheads? Of course, it should count only the actual text that I wrote, without the attributes, elements, scripts, etc.

Thanks

Post by Radu »

Hi,

If you transform the DITA Map to PDF, at some stage the DITA Open Toolkit merges all referenced topics into one large XML file.
Edit the transformation scenario used to transform the DITA Map to PDF and, in the Parameters tab, set the clean.temp parameter to "no" so that the temporary files folder is no longer deleted after the transformation.
Then, from the temporary files folder, open the file called:
ditaMapFileName_MERGED.xml
That file contains the content of all the DITA topics.
You can then apply an XSLT 2.0 stylesheet to it (using Saxon 9 EE), like the one below:

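<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Wrap all the individual counts in a single root element. -->
  <xsl:template match="/">
    <counts>
      <xsl:apply-templates/>
    </counts>
  </xsl:template>

  <!-- In the default mode, text is not copied to the output. -->
  <xsl:template match="text()"/>

  <!-- For each map element, gather its text and count the tokens. -->
  <xsl:template match="*[contains(@class, 'map/map')]">
    <xsl:variable name="text">
      <xsl:apply-templates mode="getText" select="node()"/>
    </xsl:variable>
    <count>
      <!-- Split on whitespace, basic punctuation or a literal "nbsp;"
           and count the non-empty tokens. -->
      <xsl:value-of
          select="count(tokenize(lower-case($text),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
    </count>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- In getText mode, skip nested maps; they get their own count above. -->
  <xsl:template match="*[contains(@class, 'map/map')]"
      mode="getText"/>
</xsl:stylesheet>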
The result should be an estimated word count.
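
To sanity-check the number outside of Oxygen, the same idea (drop the markup, split the remaining text on separators, count the non-empty tokens) can be sketched in Python. This is only a rough illustration, not part of the stylesheet above: the function name count_words and the SAMPLE document are made up for the example, and unlike the stylesheet it puts a space between adjacent text nodes, so the two counts may differ slightly.

```python
import re
import xml.etree.ElementTree as ET

# Same separators as the tokenize() call in the stylesheet:
# runs of whitespace, basic punctuation, or a literal "nbsp;".
SEPARATORS = re.compile(r"(?:\s|[,.!:;]|nbsp;)+")


def count_words(xml_text: str) -> int:
    """Count words in the text nodes of an XML string (hypothetical helper).

    Element names and attribute values are never part of the text,
    so they are not counted.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields only text content. Joining with a space is an
    # assumption of this sketch: it keeps words in neighbouring elements
    # apart, but it also splits a word that contains inline markup.
    text = " ".join(root.itertext())
    return sum(1 for token in SEPARATORS.split(text.lower()) if token)


# Made-up stand-in for a very small merged file.
SAMPLE = """<map class="- map/map " title="ignored">
  <topichead class="- map/topichead "><topicmeta><navtitle>Getting started</navtitle></topicmeta></topichead>
  <topic class="- topic/topic "><title>Install</title><body><p>Run the installer, then restart.</p></body></topic>
</map>"""

print(count_words(SAMPLE))  # 8
```

For a real file you would read ditaMapFileName_MERGED.xml from the temporary files folder and pass its content to the function.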


Another way to do this:
Open the DITA Map in the Oxygen DITA Maps Manager and choose "Open Map in Editor with resolved topics" from the toolbar.
When the map opens in the main editor, open the Find/Replace dialog, check the Regular expression checkbox, search for \b\w+\b and press Find All. This should find all the words and also give you a count, but it will probably take longer than the first method.
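
To get a feel for what \b\w+\b actually counts, here is a small Python sketch of the same pattern. It is only an approximation of the Find All behaviour: Oxygen uses Java regular expressions, where \w is ASCII-only by default, while Python's \w also matches accented letters, and the helper name count_matches is made up for the example.

```python
import re

# The same pattern as typed into the Find/Replace dialog.
WORD = re.compile(r"\b\w+\b")


def count_matches(text: str) -> int:
    """Return how many times the word pattern matches (hypothetical helper)."""
    return len(WORD.findall(text))


print(count_matches("Open the map, then press Find All."))  # 7
# Hyphens, decimal points and apostrophes split a "word" into several matches:
print(count_matches("well-known 3.5 don't"))  # 6
```

So the count from this method is also an estimate, and it tends to run a little high on text with hyphenated words, numbers or contractions.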

Much the same trick can be done using Find/Replace in Files on the entire DITA Map, choosing to search only in element contents.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com