Quickly creating an initial custom dictionary file
Posted: Tue Nov 12, 2019 8:45 pm
I have a set of ten books that have many terms specific to our industry. I wanted to create a custom Hunspell dictionary (.dic, .aff) that contains the words I want to be valid for our books.
If you have access to a Linux-like environment (I use WSL in Windows 10), you can do the following:
1. Run "Edit > Check Spelling in Files" from the menu, and check spelling on the desired scope (in my case, the directory containing all DITA files).
2. In the results window, choose "Save Results" from the right-click context menu and save to a "spell.txt" file.
3. Run a script to post-process the results:
Code:
sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' spell.txt | sort | uniq -c | sort -n > my.dic
This will give you a file that contains the "misspelled" words sorted by frequency, like this:
Code:
1 uniquification
...
17 multivoltage
22 post-DFT
24 black-box
37 clock-gating
Edit this file to include only the words that are valid. Now save the file and run the following command to remove the count values:
Code:
sed -i 's!^ *[0-9]* *!!' my.dic
Now create an empty matching .aff file:
Code:
touch my.aff
This gives you a very simple dictionary that contains the list of words you want to consider valid. It doesn't use any of Hunspell's powerful stemming, affixation, or suggestion capabilities, but it's better than starting from an empty file!
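One possible follow-up: Hunspell's documented .dic format puts an approximate word count on the first line, and the my.dic built above does not have one. Whether a given tool tolerates the missing line is not something the steps above establish, so the following is only a minimal sketch of how the header could be prepended. It assumes the counts have already been stripped from my.dic and that blank lines should be dropped. The temporary directory and the sample words (taken from the listing above) are stand-ins that keep the example self-contained.

```shell
#!/bin/sh
# Sketch: prepend the word-count header that Hunspell's .dic format expects.
set -eu

# Hypothetical scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the edited my.dic after the counts were removed
# (words taken from the sample listing; yours will differ).
printf '%s\n' uniquification multivoltage post-DFT black-box clock-gating > my.dic

# Drop blank lines; "|| true" keeps the script going if nothing is left.
grep -v '^[[:space:]]*$' my.dic > my.dic.tmp || true

# Count the remaining entries (tr strips the padding some wc versions add).
count=$(wc -l < my.dic.tmp | tr -d ' ')

# Write the count as the first line, followed by the words.
{ echo "$count"; cat my.dic.tmp; } > my.dic
rm my.dic.tmp

# An empty affix file, as in the original steps.
touch my.aff

# Show the header and the first two entries.
head -n 3 my.dic
```

Run this only once per file: a second run would count the existing header as a word and prepend another number. To use it on a real dictionary, drop the mktemp and printf lines and run the rest in the directory that holds my.dic.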