Quickly creating an initial custom dictionary file

Post by chrispitude »

I have a set of ten books that use many terms specific to our industry. I wanted to create a custom Hunspell dictionary (a .dic and .aff file pair) containing the words that should be treated as valid in our books.

If you have access to a Linux-like environment (I use WSL in Windows 10), you can do the following:

1. Run "Edit > Check Spelling in Files" from the menu, and check spelling on the desired scope (in my case, the directory containing all DITA files).

2. In the results window, choose "Save Results" from the right-click context menu and save to a "spell.txt" file.

3. Run a script to post-process the results:

Code:

sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' spell.txt | sort | uniq -c | sort -n > my.dic

This gives you a file that lists each "misspelled" word with its occurrence count, sorted from least to most frequent, like this:

Code:

1 uniquification
...
17 multivoltage
22 post-DFT
24 black-box
37 clock-gating
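
To see what the sed expression actually picks up, here is a minimal sketch that runs the same pipeline on a made-up results file. Only the `Description: Misspelled word: "..."` prefix and the trailing period are taken from the command above; the "System ID" filler lines and the `sample_*` file names are assumptions for illustration, and a real "Save Results" file may look different.

```shell
# Hypothetical stand-in for the saved results file. The "System ID" lines
# are invented filler that the sed expression is expected to skip.
cat > sample_spell.txt <<'EOF'
System ID: /docs/topic1.dita
Description: Misspelled word: "clock-gating".
Description: Misspelled word: "multivoltage".
Description: Misspelled word: "clock-gating".
System ID: /docs/topic2.dita
Description: Misspelled word: "uniquification".
Description: Misspelled word: "multivoltage".
Description: Misspelled word: "clock-gating".
EOF

# Same pipeline as in step 3: extract the quoted word, then count
# duplicates and sort numerically by that count.
sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' sample_spell.txt \
  | sort | uniq -c | sort -n > sample_counted.txt

cat sample_counted.txt
```

The rarest words end up at the top and the most frequent at the bottom, so the likely typos (seen once or twice) are grouped together for review.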

Edit this file, deleting every line whose word is a genuine misspelling so that only valid words remain. Then save the file and run the following command to strip the counts:

Code:

sed -i 's!^ *[0-9]* *!!' my.dic
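
One thing to watch: because that expression uses `[0-9]*` and ` *`, it also matches a line that has no count, so running it a second time would eat the leading digits of any word that starts with a number. Below is a slightly stricter sketch that requires at least one digit followed by at least one space. The word `3D-IC` and the `sample_*` file names are made up for the demonstration, and the sketch writes to new files instead of using `-i` so the input stays untouched.

```shell
# Hypothetical sample in the shape produced by `uniq -c`;
# one word deliberately starts with a digit.
printf '%s\n' \
  '      1 uniquification' \
  '     17 3D-IC' \
  '     22 post-DFT' > sample_counts.txt

# Strip "<spaces><count><spaces>" only when a count is really present:
# one or more digits, then one or more spaces.
sed 's!^ *[0-9][0-9]*  *!!' sample_counts.txt > sample_words.txt

# Running the same expression again leaves the file unchanged,
# because "3D-IC" has no space after its leading digit.
sed 's!^ *[0-9][0-9]*  *!!' sample_words.txt > sample_words_again.txt

cat sample_words_again.txt
```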

Now create an empty matching .aff file:

Code:

touch my.aff

This gives you a very simple dictionary that contains just the list of words you want to treat as valid. It doesn't use any of Hunspell's powerful stemming, affixation, or suggestion capabilities, but it's better than starting from an empty file!
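
A possible refinement, offered as a sketch rather than something I have verified inside Oxygen: the hunspell(5) man page describes the first line of a .dic file as the approximate number of entries, and `SET UTF-8` is the .aff directive that declares the file encoding. If your spell checker ignores the first word in the list, fails to load the dictionary, or mishandles non-ASCII words, something like the following may help. The `sample.dic` and `sample.aff` names and the three words are placeholders.

```shell
# Hypothetical word list with the counts already removed.
printf '%s\n' 'uniquification' 'multivoltage' 'clock-gating' > sample.dic

# Count the entries, then rewrite the file with that count as line 1.
count=$(grep -c '' sample.dic)
{ echo "$count"; cat sample.dic; } > sample.dic.tmp && mv sample.dic.tmp sample.dic

# Instead of an empty .aff, declare the encoding explicitly.
# This only matters if the word list contains non-ASCII characters.
printf 'SET UTF-8\n' > sample.aff

head -n 1 sample.dic
```

Do this only once, after the word list is final; if you later add or remove many words, update the number on the first line as well.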

Re: Quickly creating an initial custom dictionary file

Post by sorin_carbunaru »

Thank you for your contribution, Chris! We appreciate it!