Term Lists (.tdi) file usage

Posted: Fri Mar 31, 2023 3:59 pm
by amjc
Hi, first post, love the product.
We're using a .tdi file, shared on a network drive, and have some questions:
* what are "forbidden" words?
* is there a comment syntax for the .tdi?
* is the best practice to use the .tdi to collect exceptions to the standard dictionaries, but ultimately move the terms to a custom Hunspell dictionary? Maybe this would allow case-sensitivity?
Thanks, John

Re: Term Lists (.tdi) file usage

Posted: Mon Apr 03, 2023 5:55 am
by Radu
Hi John,
Thanks for the kind words. Please see some answers below:
We're using a .tdi file, shared on a network drive, and have some questions:
* what are "forbidden" words?
Usually the .tdi file keeps words that are not in the dictionary but that you do not want the spell checker to report as invalid.
Forbidden words are the other way around: they are words that are in the dictionary but that you want reported as invalid anyway.
For example, in technical documentation you may want only the present tense to be used, so you could mark "will" as forbidden as a primitive way to forbid the future tense.
Related to forbidden words, we also have a free terminology checker add-on that lets you define forbidden sequences of words. It also supports Vale rules and lets you define your own forbidden words in a special XML file format:
https://www.oxygenxml.com/doc/versions/ ... addon.html
For example, for the Oxygen user's guide we use the Microsoft Style Guide Vale rules, which flag various sequences of words that may be problematic:
https://github.com/oxygenxml/userguide/ ... rm-checker
* is there a comment syntax for the .tdi?
No, the format is very simple: one entry on each line, with no support for comments.
* is the best practice to use the .tdi to collect exceptions to the standard dictionaries, but ultimately move the terms to a custom Hunspell dictionary?
We do not have such a best practice.
Maybe this would allow case-sensitivity?
Could you give me a small example of your current problem?
We have a remark here in our user's manual:
https://www.oxygenxml.com/doc/versions/ ... rp_bgk_54b
When such problems are reported, they cannot be learned and ignored by the application as words stored in dictionaries, term lists, and the list of learned words are not handled as case-sensitive.
So it seems that, in general, we do not currently handle the words we check in a case-sensitive manner.
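
To make the distinction between learned and forbidden words concrete, here is a small conceptual sketch in Python. It is not how Oxygen implements spell checking; the function name `classify` and the three word sets are hypothetical, and the only behavior taken from this thread is that forbidden words override the dictionary and that lookups ignore case.

```python
def classify(word, dictionary, learned, forbidden):
    """Conceptual model only: classify a word as 'forbidden', 'ok', or 'unknown'.

    All comparisons are case-insensitive, mirroring the remark that
    dictionaries, term lists, and learned words are not case-sensitive.
    """
    key = word.casefold()
    # A forbidden word is reported even if the dictionary contains it.
    if key in {w.casefold() for w in forbidden}:
        return "forbidden"
    # Learned words (term list entries) extend the dictionary.
    known = {w.casefold() for w in dictionary} | {w.casefold() for w in learned}
    return "ok" if key in known else "unknown"


dictionary = {"will", "write"}   # stand-in for the standard dictionary
learned = {"DITA"}               # stand-in for term list entries
forbidden = {"will"}             # in the dictionary, but flagged anyway

print(classify("Will", dictionary, learned, forbidden))
print(classify("dita", dictionary, learned, forbidden))
print(classify("keyref", dictionary, learned, forbidden))
```

Under this model, "DITA" and "dita" cannot be told apart, which is why a case-sensitive exception would need some other mechanism.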

Regards,
Radu

Re: Term Lists (.tdi) file usage

Posted: Mon Apr 03, 2023 1:14 pm
by chrispitude
Hi John,

Your thought about accumulating learned words in the term list, then periodically moving them to a Hunspell dictionary, is an interesting one! I think I will implement this in our environment. To start, I will configure the term file location to be in our Oxygen/Git project (${pd}/...), so that writers can commit and push new learned words to the Git repository.

The nice thing about this approach is that the .tdi file serves as a "holding area" where words can be reviewed for correctness by senior writers first, then updated to handle case/prefixes/suffixes correctly as they are converted to Hunspell dictionary entries.
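
As a rough sketch of the conversion step, something like the following Python could turn a reviewed term list into a starter Hunspell .dic file. It assumes the .tdi is plain UTF-8 text with one entry per line, as Radu describes, and it does not try to recognize forbidden-word entries, since how those are stored is not covered in this thread. The function name `tdi_to_hunspell` and the file names are hypothetical. The only Hunspell detail relied on is that a .dic file starts with an approximate entry count followed by one word per line.

```python
from pathlib import Path


def tdi_to_hunspell(tdi_path, dic_path):
    """Write the entries of a one-word-per-line term list as a Hunspell .dic file.

    Assumes plain UTF-8 text with one entry per line. Blank lines are skipped,
    and entries that differ only by case are collapsed to the first spelling seen.
    """
    seen = {}
    for line in Path(tdi_path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word:
            seen.setdefault(word.casefold(), word)

    words = sorted(seen.values(), key=str.casefold)

    # First line of a .dic file is the (approximate) number of entries.
    content = "\n".join([str(len(words)), *words]) + "\n"
    Path(dic_path).write_text(content, encoding="utf-8")
    return words


if __name__ == "__main__":
    # Hypothetical file names; adjust to your project layout.
    if Path("terms.tdi").exists():
        tdi_to_hunspell("terms.tdi", "custom.dic")
```

The output is only a starting point: Hunspell also needs a matching .aff file (a single `SET UTF-8` line is enough to begin with), and affix flags for case, prefixes, and suffixes would still be added by hand during review.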

Speaking of Hunspell dictionary files, I have a post here about making a starter Hunspell dictionary from common unknown words in your content, if it is helpful.