Quickly creating an initial custom dictionary file

Posted: Tue Nov 12, 2019 8:45 pm
by chrispitude
I have a set of ten books that have many terms specific to our industry. I wanted to create a custom Hunspell dictionary (.dic, .aff) that contains the words I want to be valid for our books.

If you have access to a Linux-like environment (I use WSL in Windows 10), you can do the following:

1. Run "Edit > Check Spelling in Files" from the menu, and check spelling on the desired scope (in my case, the directory containing all DITA files).

2. In the results window, choose "Save Results" from the right-click context menu and save to a "spell.txt" file.

3. Run a script to post-process the results:

Code:

sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' spell.txt | sort | uniq -c | sort -n > my.dic

This gives you a file that lists the "misspelled" words with their counts, sorted from least to most frequent, like this:

Code:

1 uniquification
...
17 multivoltage
22 post-DFT
24 black-box
37 clock-gating
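To see what the step 3 pipeline extracts before running it on real results, you can try it on a few made-up lines in a scratch directory. The sample lines below are hypothetical: their layout is only inferred from the sed pattern, so compare them with your own spell.txt.

```shell
# Work in a scratch directory so an existing spell.txt or my.dic is not touched.
cd "$(mktemp -d)"

# Hypothetical sample input; the real "Save Results" layout is assumed here.
cat > spell.txt <<'EOF'
Description: Misspelled word: "clock-gating". Suggestions: ...
Description: Misspelled word: "post-DFT". Suggestions: ...
Description: Misspelled word: "clock-gating". Suggestions: ...
Some other line in the results file
EOF

# Same pipeline as step 3: keep only the quoted word from matching lines,
# count duplicates, then sort numerically by count (ascending).
sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' spell.txt | sort | uniq -c | sort -n > my.dic
cat my.dic
```

The resulting my.dic lists post-DFT with a count of 1 followed by clock-gating with a count of 2; the line that does not match the pattern is dropped because of sed's -n option.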

Edit this file so that it contains only the words you consider valid, then save it and run the following command to strip the counts:

Code:

sed -i 's!^ *[0-9]* *!!' my.dic
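Hunspell's .dic format expects the first line to be an approximate word count, followed by one word per line, and Hunspell itself reports a missing count as an error. Your editor may tolerate a .dic file without it, but adding the count is cheap. A minimal sketch, assuming the file now holds one word per line; the function name add_word_count is just an illustrative helper:

```shell
# Prepend the word count that Hunspell expects on line 1 of a .dic file.
# Usage: add_word_count FILE   (FILE holds one word per line)
add_word_count() {
  local dic=$1 count
  # $(( )) strips the padding that some wc implementations add.
  count=$(( $(wc -l < "$dic") ))
  # Write the count first, then the original word list, and replace the file.
  { printf '%s\n' "$count"; cat "$dic"; } > "$dic.tmp" && mv "$dic.tmp" "$dic"
}
```

Run it once, as `add_word_count my.dic`; running it a second time would add another count line on top.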

Now create an empty matching .aff file:

Code:

touch my.aff
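An empty .aff file works, but Hunspell then falls back to ISO8859-1 as the character encoding. If your word list is saved as UTF-8 and contains non-ASCII characters, it may be safer to declare the encoding with Hunspell's SET directive instead. Whether your editor needs this depends on how it loads the dictionary, so treat it as an optional variant:

```shell
# Write a one-line affix file that only declares the character encoding.
printf 'SET UTF-8\n' > my.aff
```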

This gives you a very simple dictionary that contains the list of words you want to consider valid. It doesn't use any of Hunspell's powerful stemming, affixation, or suggestion capabilities, but it's better than starting from an empty file!

Re: Quickly creating an initial custom dictionary file

Posted: Thu Nov 14, 2019 6:03 pm
by sorin_carbunaru
Thank you for your contribution, Chris! We appreciate it!