Counting words in a Ditamap
-
- Posts: 86
- Joined: Wed Feb 09, 2011 10:43 pm
Counting words in a Ditamap
Hello,
Is there an easy and elegant way in Oxygen to count the words of all the files in a ditamap, including the topicheads? Of course, it needs to include only the actual text that I wrote, without the attributes, elements, script, etc.
Thanks
-
- Posts: 9446
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Counting words in a Ditamap
Hi,
If you transform the DITA Map to PDF, at some stage the DITA Open Toolkit merges all referenced topics into one large XML file.
If you edit the transformation scenario used to transform the DITA Map to PDF, in the Parameters tab you can set the clean.temp parameter to "no" so that the temporary files folder is no longer deleted after the transformation.
Then, from the temporary files folder, you can open a file called:
ditaMapFileName_MERGED.xml
That file contains the content from all DITA topics.
Then you could apply an XSLT 2.0 stylesheet (using Saxon 9 EE) to it, like the one below:
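Code:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <!-- Wrap all counts in a single root element. -->
  <xsl:template match="/">
    <counts>
      <xsl:apply-templates/>
    </counts>
  </xsl:template>
  <!-- In the default mode, text is not copied to the output. -->
  <xsl:template match="text()"/>
  <!-- For each map element: collect its text and count the tokens. -->
  <xsl:template match="*[contains(@class, 'map/map')]">
    <xsl:variable name="text">
      <xsl:apply-templates mode="getText" select="node()"/>
    </xsl:variable>
    <count>
      <!-- Split on whitespace, basic punctuation and a literal "nbsp;";
           the [string(.)] predicate drops empty tokens. -->
      <xsl:value-of
          select="count(tokenize(lower-case($text),'(\s|[,.!:;]|[n][b][s][p][;])+')[string(.)])"
      />
    </count>
    <xsl:apply-templates/>
  </xsl:template>
  <!-- In getText mode, a nested map contributes no text to its parent;
       it gets its own count from the template above. -->
  <xsl:template match="*[contains(@class, 'map/map')]"
      mode="getText"/>
</xsl:stylesheet>
The result should be an estimated word count.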
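If you want to cross-check the number outside Oxygen, the same counting rule can be sketched in Python. This is only an illustration, not part of Oxygen or the DITA Open Toolkit: the function name count_words and the SAMPLE document are made up for the example, and it assumes the merged file parses with the standard library XML parser (no unresolved entities).

```python
import re
import xml.etree.ElementTree as ET

# Same separators as the stylesheet: whitespace, the punctuation , . ! : ;
# and a literal "nbsp;" left over from unresolved entities.
SEPARATORS = re.compile(r"(?:\s|[,.!:;]|nbsp;)+")


def count_words(xml_text: str) -> int:
    """Count the non-empty tokens in all text nodes of an XML document."""
    root = ET.fromstring(xml_text)
    # itertext() yields element text only; tags and attributes are skipped.
    text = "".join(root.itertext())
    return sum(1 for token in SEPARATORS.split(text.lower()) if token)


# Hypothetical stand-in for a small ditaMapFileName_MERGED.xml.
SAMPLE = """<map class="- map/map ">
  <topic class="- topic/topic " id="t1">
    <title class="- topic/title ">Getting started</title>
    <body class="- topic/body "><p class="- topic/p ">Install it, then run it.</p></body>
  </topic>
</map>"""

print(count_words(SAMPLE))  # 7 for the sample above
```

For a real file you would read ditaMapFileName_MERGED.xml from the temporary files folder and pass its content to count_words. Like the stylesheet, this gives an estimate, because anything that is not one of the listed separators stays glued to the neighbouring word.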
Another way to do this:
Open the DITA Map in the Oxygen DITA Maps Manager and choose "Open Map in Editor with resolved topics" from the toolbar.
When the map opens in the main editor, open the Find/Replace dialog, check the "Regular expression" checkbox, search for \b\w+\b and press Find All. This should find all words and also give you a count, but it will probably take longer than the first method.
Much the same trick could be done using Find/Replace in Files on the entire DITA Map, restricting the search to element content so that element names and attributes are not counted.
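To see why the \b\w+\b search only gives an estimate, here is a small Python sketch of what that pattern matches. It is an illustration under assumptions: Python's regex engine is used instead of the Java one in Oxygen (the two may treat non-ASCII letters differently), and the function names are invented for the example.

```python
import re

# The pattern suggested for the Find/Replace dialog.
WORD = re.compile(r"\b\w+\b")


def count_regex_words(text: str) -> int:
    """Count the matches of \\b\\w+\\b, as Find All would report them."""
    return len(WORD.findall(text))


def count_whitespace_words(text: str) -> int:
    """Count whitespace-separated tokens, for comparison."""
    return len(text.split())


sentence = "Don't re-run the 2 steps."
print(count_regex_words(sentence))       # 7: Don, t, re, run, the, 2, steps
print(count_whitespace_words(sentence))  # 5
```

Contractions and hyphenated words are split into several matches and numbers are counted too, so the regex count tends to come out a little higher than a count of whitespace-separated words.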
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com