Oxygen XML Forum

Posted: **Tue Nov 12, 2019 8:45 pm**

I have a set of ten books that use many terms specific to our industry. I wanted to create a custom Hunspell dictionary (a .dic/.aff file pair) containing the words that should be treated as valid in our books.

If you have access to a Linux-like environment (I use WSL in Windows 10), you can do the following:

1. Run "Edit > Check Spelling in Files" from the menu, and check spelling on the desired scope (in my case, the directory containing all DITA files).

2. In the results window, choose "Save Results" from the right-click context menu and save to a "spell.txt" file.

3. Run a script to post-process the results:

sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' spell.txt | sort | uniq -c | sort -n > my.dic

This will give you a file that lists each "misspelled" word once, preceded by its number of occurrences and sorted from least to most frequent, like this:

1 uniquification
...
17 multivoltage
22 post-DFT
24 black-box
37 clock-gating
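
To try the step 3 pipeline before pointing it at real results, it can be run against a small hand-made sample. This is only a sketch: the sample lines are inferred from the sed pattern rather than copied from actual Oxygen output, and `sample_spell.txt` and `sample.dic` are made-up file names.

```shell
# Build a tiny stand-in for spell.txt. The line layout is an assumption
# inferred from the sed pattern in step 3, not real Oxygen output.
cat > sample_spell.txt <<'EOF'
Description: Misspelled word: "clock-gating". Sample trailing text.
Description: Misspelled word: "multivoltage". Sample trailing text.
Description: Misspelled word: "clock-gating". Sample trailing text.
Some other line that the pattern should skip.
EOF

# Same pipeline as step 3: extract the quoted word, count duplicates,
# and sort by ascending frequency.
sed -n 's!^Description: Misspelled word: "\([^"]*\)"\..*$!\1!p' sample_spell.txt \
  | sort | uniq -c | sort -n > sample.dic

cat sample.dic
```

The result should list multivoltage with a count of 1, followed by clock-gating with a count of 2. The line that does not match the pattern is dropped because `sed -n` prints only the lines where the substitution succeeded.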

Edit my.dic, deleting the lines for genuine misspellings so that only the valid words remain. Then save the file and run the following command to remove the counts:

sed -i 's!^ *[0-9]* *!!' my.dic
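
If you would rather check the result before overwriting anything, the same substitution can write to a new file instead of editing in place. This is a minimal sketch on made-up sample data; `counted.txt` and `stripped.txt` are hypothetical file names.

```shell
# Sample input in the "count word" layout that uniq -c produces.
printf '      1 uniquification\n     22 post-DFT\n' > counted.txt

# Same substitution as above, but written to a new file instead of using -i.
sed 's!^ *[0-9]* *!!' counted.txt > stripped.txt

cat stripped.txt
```

This also avoids the `-i` option, which GNU sed accepts as written but BSD/macOS sed expects to be followed by a backup suffix argument. Note that the pattern strips any leading digits, so running it a second time on an already-cleaned file would damage words that begin with a number.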

Now create an empty matching .aff file:

touch my.aff

This gives you a very simple dictionary that contains the list of words you want to consider valid. It doesn't use any of Hunspell's powerful stemming, affixation, or suggestion capabilities, but it's better than starting from an empty file!
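
One optional refinement: the Hunspell file-format documentation describes the first line of a .dic file as an approximate word count, and a `SET` line in the .aff file as the encoding declaration. Whether Oxygen needs either for a plain word list like this is not established in this thread, so treat the following as a sketch under those assumptions. The file names `words.txt`, `custom.dic`, and `custom.aff` are made up, the two sample words stand in for the cleaned list, and UTF-8 is assumed to be the encoding of the word list.

```shell
# Stand-in for the cleaned word list produced by the previous steps.
printf 'multivoltage\nclock-gating\n' > words.txt

# Write the number of entries as the first line, then the words themselves.
{ grep -c '' words.txt; cat words.txt; } > custom.dic

# Declare the encoding instead of leaving the .aff file empty
# (assumes the word list is saved as UTF-8).
printf 'SET UTF-8\n' > custom.aff

cat custom.dic
```

The .dic and .aff files share the same base name, in keeping with the "matching .aff file" step above.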

Posted: **Thu Nov 14, 2019 6:03 pm**

Thank you for your contribution, Chris! We appreciate it!

Quickly creating an initial custom dictionary file