Counting words in a Ditamap

Posted: Thu Feb 10, 2011 4:27 pm
by Boreas
Hello,

is there an easy and elegant way in Oxygen to count the words in all the files of a ditamap, including the topicheads? Of course, it should count only the actual text that I wrote, without the attributes, elements, scripts, etc.

Thanks

Re: Counting words in a Ditamap

Posted: Thu Feb 10, 2011 5:03 pm
by Radu
Hi,

If you transform the DITA Map to PDF, at some stage the DITA Open Toolkit merges all referenced topics into one large XML file.
Edit the transformation scenario used to transform the DITA Map to PDF and, in the Parameters tab, set the clean.temp parameter to no so that the temporary files folder is no longer deleted after the transformation.
Then, from the temporary files folder, you can open a file called:
ditaMapFileName_MERGED.xml
That file contains the content of all the DITA topics.
You could then apply an XSLT 2.0 stylesheet to it (using Saxon 9 EE), like the one below:

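<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <counts>
      <xsl:apply-templates/>
    </counts>
  </xsl:template>

  <!-- In the default mode, text is not copied to the output. -->
  <xsl:template match="text()"/>

  <!-- For each map element, collect its text and count the tokens. -->
  <xsl:template match="*[contains(@class, 'map/map')]">
    <xsl:variable name="text">
      <xsl:apply-templates mode="getText" select="node()"/>
    </xsl:variable>
    <count>
      <!-- Split on whitespace, basic punctuation and a literal "nbsp;", then drop empty tokens. -->
      <xsl:value-of
          select="count(tokenize(lower-case($text),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"/>
    </count>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- A nested map gets its own count element, so it is skipped while collecting text. -->
  <xsl:template match="*[contains(@class, 'map/map')]"
      mode="getText"/>
</xsl:stylesheet>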
The result should be an approximate word count.
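
If a breakdown per topic is more useful than a single total, the same tokenizing idea could be adapted roughly as follows. This is only a sketch, not something tested against a real merged file: it assumes every topic in the merged file still carries the standard DITA class attribute (containing " topic/topic ") and an id attribute, and the topic and words names in the output are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Separators: whitespace and basic punctuation (an assumption, adjust as needed). -->
  <xsl:variable name="separators" select="'(\s|[,.!:;])+'"/>

  <xsl:template match="/">
    <counts>
      <!-- One entry per topic; nested topics are listed on their own. -->
      <xsl:for-each select="//*[contains(@class, ' topic/topic ')]">
        <!-- Keep only the text whose nearest enclosing topic is this one. -->
        <xsl:variable name="own"
            select="string-join(.//text()[ancestor::*[contains(@class, ' topic/topic ')][1] is current()], ' ')"/>
        <!-- Hypothetical output format: topic id plus its approximate word count. -->
        <topic id="{@id}" words="{count(tokenize($own, $separators)[. != ''])}"/>
      </xsl:for-each>
    </counts>
  </xsl:template>
</xsl:stylesheet>
```

Joining the text nodes with a space keeps words in adjacent elements from running together, so the per-topic numbers may differ slightly from the total reported by the first stylesheet.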


Another way to do this:
Open the DITA Map in the Oxygen DITA Maps Manager and choose "Open Map in Editor with resolved topics" from the toolbar.
When the map opens in the main editor, open the Find/Replace dialog, check the Regular expression checkbox, search for \b\w+\b and press Find All. This should find all the words and also give you a count, but it will probably take longer than the first method.
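
For comparison, a rough XSLT 2.0 counterpart of that regular expression search might look like the sketch below. XPath regular expressions have no \b word-boundary anchor, so instead of matching \b\w+\b it splits the text on runs of non-word characters (\W+) and counts what is left. The definition of a word character is not identical between regex engines, so the number may not match the Find All count exactly.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Join all text nodes with a space so words in adjacent elements do not fuse. -->
    <xsl:variable name="text" select="string-join(//text(), ' ')"/>
    <!-- Split on runs of non-word characters and drop the empty tokens at the edges. -->
    <xsl:value-of select="count(tokenize($text, '\W+')[. != ''])"/>
  </xsl:template>
</xsl:stylesheet>
```

It could be applied to the merged XML file from the first method and prints a single number.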

Much the same trick could be done using Find/Replace in Files on the entire DITA Map, choosing to search only in element contents.

Regards,
Radu