Terminology Checker Add-on

Oxygen XML Editor offers an add-on that provides support for checking terminology. Once the add-on is installed, you can create a terminology file with a set of rules for each term (or sequence of characters) you want flagged. Once the custom file is referenced, Oxygen XML Editor automatically highlights matched terms in the Author visual editing mode and offers contextual menu actions for them.
Tip: The terminology checker works for any document opened in the Author visual editing mode, including XML file types, and JSON and HTML5 document types.

Quick Installation

You can drag the following Install button and drop it into the main editor in Oxygen (version 24.1 or newer) to quickly initiate the installation process:

Install

Manual Installation

To manually install the add-on, follow this procedure:

  1. Go to Help > Install new add-ons to open an add-on selection dialog box.
  2. Enter or paste https://www.oxygenxml.com/InstData/Addons/default/updateSite.xml in the Show add-ons from field or select it from the drop-down menu.
    Note: If you have issues connecting to the default update site, you can download the add-on package, unzip it, then use the Browse for local files action in the Install new add-ons dialog box to locate the downloaded addon.xml file.
  3. Select the Terminology Checker add-on and click Next.
  4. Read the end-user license agreement. Then select the I accept all terms of the end-user license agreement option and click Finish.
  5. Restart the application.

Creating Custom Rules for the Terminology Checker

To create your own custom rules for the terminology checker, follow this procedure:

  1. Create a terminology file. There is a template available to help you get started in the New Document wizard. Click the New button on the toolbar or select File > New and search for the Terminology File template. Here is an example of the structure for this type of file:
    <incorrect-terms lang="en">
        <incorrect-term ignorecase="true">
            <match>Oxygen</match>
            <suggestion></suggestion>
            <message>Product name should be inside a tag.</message>
        </incorrect-term>
    </incorrect-terms>
  2. Save the newly created terminology rules XML file either in a new subfolder named oxygen-term-checker located in the current project folder (the current project opened in the application Project view), or in a custom folder.
  3. If you saved the terminology file in a custom folder path, go to the Options > Preferences > Plugins > Terminology Checker preferences page and set the Additional Terminology folder path to point to that folder.
  4. Click OK several times to apply the changes and close the preferences dialog box.
Result: If any of the terms (or sequence of characters) that are defined in the terminology file are detected in any open file, Oxygen XML Editor highlights the matches in the Author visual editing mode.
Note: If you have a folder named oxygen-term-checker in the current project that is open in the Project view, all the files in that folder will also be loaded by the terminology checker.
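
As a minimal sketch of a complete rules file that could be saved in the oxygen-term-checker folder, the following example defines two rules. The file name, terms, suggestions, and messages are hypothetical. Only the element and attribute names come from this topic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical rules file, for example saved as oxygen-term-checker/terms.xml -->
<incorrect-terms lang="en">
    <!-- Flags "whitelist" in any letter case and proposes a replacement. -->
    <incorrect-term ignorecase="true">
        <match>whitelist</match>
        <suggestion>allowlist</suggestion>
        <message>Prefer "allowlist".</message>
    </incorrect-term>
    <!-- Flags a term without proposing a replacement. -->
    <incorrect-term ignorecase="true" severity="warning">
        <match>simply</match>
        <message>Avoid words that understate the difficulty of a task.</message>
    </incorrect-term>
</incorrect-terms>
```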

Structure of Terminology Rules File

The following elements can be used in the terminology rules XML file:
<incorrect-terms>
This is the root element of the XML rules file.

You can specify the @lang attribute on the <incorrect-terms> root element. When set, the terms defined in this terminology file are applied when the closest @xml:lang attribute of the checked node matches the value specified. Not setting this attribute means that the incorrect terms are applied for all nodes.

If the @xml:lang attribute is not defined in your document, the language specified in the Spell Check preferences is used.

Note: If the value of the document's @xml:lang attribute is not a superset of the value of the @lang attribute for the <incorrect-terms> element, there will not be a match.
Table 1. Language Matching Matrix
@lang value for <incorrect-terms>    @xml:lang = en    @xml:lang = en_US
en                                   match             match
en_US                                not matched       match
You can specify the @phase attribute on the <incorrect-terms> root element. The value of this attribute is inherited by <incorrect-term> children nodes. Not setting this attribute means the default phase is used.
The allowed values are:
  • always - Incorrect terms are always presented (default value).
  • editing - Incorrect terms are shown when the document is opened in the Author mode.
  • validation - Incorrect terms are shown when the document is checked from a validation scenario.

For example, set this attribute if you want to apply the most important rules when validating with the Validate and Check for Completeness action, while still keeping them applied in the editing window.
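
To illustrate how @lang and @phase combine on the root element, here is a small sketch. The term, suggestion, and message are hypothetical. According to the matrix above, these rules would apply to content whose closest @xml:lang is en_US, but not to content marked only as en.

```xml
<!-- Rules limited to en_US content and reported only when the document is
     checked from a validation scenario. The phase value is inherited by
     every <incorrect-term> child. -->
<incorrect-terms lang="en_US" phase="validation">
    <incorrect-term severity="error">
        <match>colour</match>
        <suggestion>color</suggestion>
        <message>Use American spelling.</message>
    </incorrect-term>
</incorrect-terms>
```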

<incorrect-term>

Defines ways to match and correct an incorrect term. The <incorrect-term> element must include a <match> element.

The @ignorecase attribute specifies whether or not the match is case-sensitive.

The @severity attribute can be set to one of the following values: info, warning, or error. Example:
    <incorrect-term severity="error">
        <match>he</match>
        <message>Pronouns should be avoided.</message>
    </incorrect-term>
An experimental @part-of-speech attribute can be set on the <incorrect-term> element with the value set to a part of speech tag (for example: adjective, verb, etc.). If set, a terminology problem is presented only if the term's part of speech matches the one specified. The processor used to identify the part of speech is Apache OpenNLP and this feature is supported only for the English language.
Note: The results may not be 100% accurate, so you should double-check them.
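
A rule that uses this attribute might look like the following sketch. The value verb is one of the tags mentioned above. The term, suggestion, and message are hypothetical, and because the feature is experimental, the detection shown here should be treated as approximate.

```xml
<!-- Intended to flag "login" only where it is used as a verb
     (for example, "login to the server"), not as a noun. -->
<incorrect-term part-of-speech="verb">
    <match>login</match>
    <suggestion>log in</suggestion>
    <message>Use "log in" for the verb. "Login" is the noun.</message>
</incorrect-term>
```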
<match>

Specify the text fragment to match.

You can specify the @type attribute on the <match> element, with the values character, whole-word, or regular-expression. The default value is whole-word, unless the matched term contains Japanese, Chinese, or Korean characters because Asian languages often do not use spaces to separate words. Example:
    <incorrect-term>
        <match type="character">ing</match>
        <message>Progressive tense should not be allowed</message>
    </incorrect-term>
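
For comparison, a rule that uses the regular-expression type might be sketched as follows. The pattern and message are hypothetical, and the pattern assumes common regular expression syntax (optional groups and alternation). The exact regular expression flavor is not stated in this topic.

```xml
<!-- Intended to flag "click on", "clicks on", "clicked on", and "clicking on". -->
<incorrect-term severity="warning">
    <match type="regular-expression">click(s|ed|ing)? on</match>
    <message>Use "click" without "on".</message>
</incorrect-term>
```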
<suggestion>

The <suggestion> element can be left blank, or there can be one or more <suggestion> elements inside the <incorrect-term> element. It supports regular expression grouping.

If you want to replace the match with an XML fragment, you can set the @format attribute on the <suggestion> element with the value xml. For example:
    <incorrect-term ignorecase="true">
        <match type="whole-word">Oxygen XML Editor</match>
        <suggestion format="xml">&lt;ph keyref=&quot;oxygen&quot;/></suggestion>
        <message>Replace all occurrences of product with key reference.</message>
    </incorrect-term>
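
Because more than one <suggestion> element is allowed, a rule can offer alternative replacements. The following is a sketch with hypothetical terms and message:

```xml
<!-- Offers two alternative replacements for the same match. -->
<incorrect-term ignorecase="true">
    <match type="whole-word">utilize</match>
    <suggestion>use</suggestion>
    <suggestion>apply</suggestion>
    <message>Prefer a simpler verb.</message>
</incorrect-term>
```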
<message>
The <message> element is optional. If present, its content is displayed in a tooltip when you hover over a highlight. It supports regular expression grouping.
<link>
The <link> element is optional. If present, it provides the source for this rule. Example:
<incorrect-term ignorecase="true">
  <match type="whole-word">Oxygen XML Editor</match>
  <suggestion format="xml">&lt;ph keyref=&quot;oxygen&quot;/></suggestion>
  <link>https://www.oxygenxml.com/doc/ug-editor/topics/terminology-checker.html</link>
</incorrect-term>
<xpath-context>

The <xpath-context> element can be used to define simple XPath expressions that match specific elements.

You can specify @include and @exclude attributes. The elements covered by this simplified XPath will be checked for matches (or the exclusion of a match). A list of comma-separated XPath values can be used. Example:
<xpath-context include="p, div, codeblock"/>

The following are examples of what simplified XPath expressions might look like:

  • elementName
  • //elementName
  • /elementName1/elementName2/elementName3
  • //xs:localName
    Note: The namespace prefixes (such as xs) are treated as part of the element name without taking its binding to a namespace into account.

You can use one or more of the following attribute conditions:

Attention: Default attribute values are not taken into account.
  • element[@attr] - Matches all instances of the specified element when it includes the specified attribute.
  • element[not(@attr)] - Matches all instances of the specified element when it does not include the specified attribute.
  • element[@attr = 'value'] - Matches all instances of the specified element when it includes the specified attribute with the given value.
  • element[@attr != 'value'] - Matches all instances of the specified element when it includes the specified attribute and its value is different from the one given.
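
The path forms and attribute conditions above can be combined in the comma-separated lists. The fragments below are a sketch only: the element names, attribute names, and values are hypothetical, and the two elements are shown as separate alternatives, not as a complete rules file.

```xml
<!-- Check only titles anywhere in the document and paragraphs that have
     an outputclass attribute set to "draft". -->
<xpath-context include="//title, p[@outputclass = 'draft']"/>

<!-- Skip code blocks and any phrase that carries a translate attribute. -->
<xpath-context exclude="codeblock, ph[@translate]"/>
```

Since default attribute values are not taken into account, conditions such as p[@outputclass = 'draft'] would only match when the attribute is written explicitly in the document.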

Using Vale Rules with the Terminology Checker

The Terminology Checker has partial support for applying custom Vale rules.

Supported Vale scopes: heading, table.header, table.cell, list, paragraph, code, strong, emphasis, sentence.

Supported Vale extension points: Existence, Substitution, Occurrence, Repetition, Conditional.

Note: If you have a folder named oxygen-term-checker in the current project that is open in the Project view, all the files in that folder will also be loaded by the terminology checker. As an example, the Oxygen XML Editor user guide has a folder with some of the Microsoft style guide rules: https://github.com/oxygenxml/userguide/tree/master/DITA/oxygen-term-checker. Once the user guide project is open in the Oxygen XML Editor Project view, the add-on will start using those rules to check the content.
Resources: You can find already created Vale rules that implement various checks on the following websites:

Working with the Terminology Checker

The Terminology Checker side view shows all problems found in the document. You can right-click each problem to apply possible fixes or to find out more details about it. The tooltip for each problem displays a custom message and more information (for example, for Vale rules, it also displays the name of the Vale rule file that defines the rule). You can filter problems by severity, match, and message. The toolbar has actions to navigate between problems and to open the Terminology Checker preferences page.

You can also right-click problems highlighted in the Author visual editing mode to access the following contextual menu actions:
  • Replace with "…" - Replaces the currently highlighted match with the content inside the <suggestion> element.
  • Replace all with "…" - Replaces all instances of the highlighted match found in the current document with the content inside the <suggestion> element.
  • Correct all matching highlights - Replaces all highlighted matches (all matched terms) within the document with the content inside the first <suggestion> element from the terminology file.

Terminology checking can be disabled by clicking the Show/Hide Terminology Highlights toolbar button.

Other Notes:
  • The checker automatically skips deleted content with tracked changes and space-preserved elements (e.g. codeblocks).
  • When replacements are performed, the capitalization is preserved.
  • In the Oxygen XML Editor Options > Preferences > Plugins > Terminology Checker page, you can define the highlight colors to be used for each issue depending on its severity. You can also reference a folder that contains the terminology rules. This folder can contain other folders with terminology files or just the terminology files. The option that controls automatic capitalization can also be found in this preferences page.
  • If you select Project Options (in the Terminology Checker preferences page), the settings are stored in the project file (.xpr) that can be shared with other users.

Terminology Checker Preferences

The Options > Preferences > Plugins > Terminology Checker preferences page contains various settings for configuring the tool. The preferences can be saved at project level to share them, which is common for a group of users who use the same project configuration.
Highlight background
You can specify various colors to influence the background colors for terminology highlights that are added in the Author visual editing mode.
Highlight decoration
You can specify various colors to influence the highlight decoration styles for terminology highlights that are added in the Author visual editing mode.
Editing
Preserve case when performing replacements
Controls whether or not the original letter casing is automatically preserved when replacing words. The option is selected by default.
Report unsupported Vale rules as errors
If selected (default), errors that are related to Vale terms (such as unsupported extension points or invalid properties) are reported. If not selected, unsupported Vale rules are ignored (although an error is still reported if the file is invalid).
Learned terms
Default project terminology folder
Displays the default location where all the terminology rule files (XML or Vale) are stored. By default, the rule files located in the oxygen-term-checker subfolder of the current project folder (the current project loaded in the Project view) are automatically loaded and used.
Additional terminology folder
You can use this option to specify an additional terminology folder where XML and Vale rule files are located. You can use editor variables such as ${pd}/terms to specify the path to the terminology folder.

Checking Multiple Resources

Once installed, the terminology checker add-on can be used to batch-check multiple files:
  • Right-click on the root of the DITA map opened in the DITA Maps Manager view and choose Check terminology.
  • Right-click a folder in the Project view and choose Check terminology.
  • Create a new validation scenario or edit an existing validation scenario, and add a new validation stage. For the File type field, choose XML Document and for the Validation engine field, choose Terminology checker. The validation scenario can be used in multiple ways:
    • In the Project view, you can right-click a folder and validate using a specific validation scenario.
    • In the DITA Maps Manager view, you can use the Validate and Check for Completeness toolbar action and choose to Batch validate referenced DITA resources. This will apply the associated validation scenario for each topic or map referenced in the context of the main DITA map.

Terminology Files Contributed from Other Oxygen Add-ons

Any Oxygen add-on can contribute terminology files that will be used by the Terminology Checker. The contributed terminology files will be loaded and used if the contributor add-on is enabled.

The following pre-conditions must be fulfilled:
  1. The contributor add-on's plugin.xml descriptor file should reference the rules folder as a librariesFolder with a global scope:
    <plugin
        id="unique.identifier.name"
        name="My Style Guide"
        description="Style Guide"
        version="1.0"
        vendor="Vendor Name"
        class="ro.sync.exml.plugin.Plugin"
        classLoaderType="preferReferencedResources">
        <runtime>
            <librariesFolder name="Rules_Folder" scope="global"/>
        </runtime>
    </plugin>
  2. The contributor add-on should have a marker file named oxy-terms-auto-detect inside the rules folder. The terminology files can be added in the rules folder or organized in subfolders (the Terminology Checker scans the subfolders to identify the terminology files). Inside the oxy-terms-auto-detect file, there should be a textual description of the terminology file contents, which is used when presenting add-on contributed terms in the Terminology Checker preferences page (Options > Preferences > Plugins > Terminology Checker).

Resources

For more information about the Terminology Checker add-on, along with details regarding other popular add-ons that extend the functionality of Oxygen XML Editor, watch the following webinar: