Searching for Missing Spaces in Content

tpopp
Posts: 6
Joined: Tue Jan 31, 2023 9:24 pm

Post by tpopp »

Is there a way to globally search, in Author or Text mode, for a missing space following a closing DITA element, and to insert the missing space?
For example, find <ph>some text</ph>X and replace it with <ph>some text</ph> X
Basically, search for any character following a closing element tag that is not punctuation (comma, period, question mark, and so on).
It seems easy to insert DITA elements in Author mode and forget to put a blank space after them if you are showing Full tags.
chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Re: Searching for Missing Spaces in Content

Post by chrispitude »

Hi tpopp,

You could open Window > Show View > XPath/XQuery Builder, then search for the following XPath expression:

Code:

//(ph|b|u|i|codeph)[matches(., '[A-Za-z]$')][following-sibling::node()[1][matches(., '^[A-Za-z]')]]
This searches for any <ph>, <b>, <u>, <i>, or <codeph> element that ends in a letter, followed by plaintext or another element that also begins with a letter. You can add more elements to the list as needed. If you know regular expressions, you can adjust the matches() patterns to include additional character types.

To consider elements that might have variable text, you can include elements that have @keyref:

Code:

//(ph|b|u|i|codeph)[matches(., '[A-Za-z]$') or @keyref][following-sibling::node()[1][matches(., '^[A-Za-z]')]]
In regular expressions, "^" matches the beginning of a string and "$" matches the end of the string. You can flip the XPath expression around as follows to check the beginning of elements too:

Code:

//(ph|b|u|i|codeph)[matches(., '^[A-Za-z]')][preceding-sibling::node()[1][matches(., '[A-Za-z]$')]]
You could even turn these into Schematron checks so they would be interactively highlighted in the editing window in real time.
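
To experiment with the logic of the "ends in a letter, followed by content that begins with a letter" check outside Oxygen, here is a minimal Python sketch using only the standard library. It is an approximation of the XPath above, not a replacement for it: the function name find_missing_spaces and the sample markup are made up for illustration, and comments and processing instructions are ignored, whereas XPath's node() would see them.

```python
import re
import xml.etree.ElementTree as ET

INLINE = {"ph", "b", "u", "i", "codeph"}  # same element list as the XPath expression

def string_value(elem):
    """Concatenate all descendant text, similar to an element's XPath string value."""
    return "".join(elem.itertext())

def find_missing_spaces(xml_text):
    """Return (tag, text) pairs for inline elements that end in a letter and are
    immediately followed by content that begins with a letter."""
    root = ET.fromstring(xml_text)
    hits = []
    for parent in root.iter():
        children = list(parent)
        for idx, child in enumerate(children):
            if child.tag not in INLINE:
                continue
            value = string_value(child)
            if not re.search(r"[A-Za-z]\Z", value):
                continue
            # The next sibling is the tail text if there is any, else the next element.
            if child.tail:
                following = child.tail
            elif idx + 1 < len(children):
                following = string_value(children[idx + 1])
            else:
                continue
            if re.match(r"[A-Za-z]", following):
                hits.append((child.tag, value))
    return hits

# Hypothetical sample: only the <b> element is missing its trailing space.
sample = "<p>Press <b>Enter</b>to continue, then <i>wait</i>. Use <ph>this</ph> value.</p>"
print(find_missing_spaces(sample))
```

The <i> element is skipped because it is followed by a period, and the <ph> element because it is already followed by a space.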
Radu
Posts: 9057
Joined: Fri Jul 09, 2004 5:18 pm

Re: Searching for Missing Spaces in Content

Post by Radu »

Hi,

An alternative is to use Oxygen's "Find/Replace in Files" dialog: search for something like </(ph|b|u|i|codeph)>[a-zA-Z]+ with the "Regular expression" checkbox checked.
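
In that pattern, "+" means "one or more of the preceding item", so [a-zA-Z]+ matches the whole run of letters after the closing tag. To insert the missing space rather than just find it, the closing tag can be captured and written back followed by a space. The following is a rough Python sketch of that idea, run outside Oxygen; the function name insert_missing_spaces and the sample string are assumptions for illustration, and the pattern uses a lookahead instead of [a-zA-Z]+ so that the letters themselves are left untouched.

```python
import re

# Closing tag of one of the listed inline elements, directly followed by a letter.
# The lookahead (?=...) checks for the letter without consuming it.
MISSING_SPACE = re.compile(r"(</(?:ph|b|u|i|codeph)>)(?=[A-Za-z])")

def insert_missing_spaces(xml_text):
    """Insert one space after each matching closing tag; return (new_text, count)."""
    return MISSING_SPACE.subn(r"\1 ", xml_text)

fixed, count = insert_missing_spaces("Click <b>OK</b>to save, or <i>Cancel</i>.")
print(fixed)  # Click <b>OK</b> to save, or <i>Cancel</i>.
print(count)  # 1
```

Closing tags followed by punctuation or an existing space are not changed, which matches the original request to skip commas, periods, and similar characters.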

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Radu
Posts: 9057
Joined: Fri Jul 09, 2004 5:18 pm

Re: Searching for Missing Spaces in Content

Post by Radu »

About this useful remark from Chris:
You could even turn these into Schematron checks so they would be interactively highlighted in the editing window in real-time.
There is this article on the Oxygen XML blog about adding your own Schematron schema to a DITA framework configuration:
https://blog.oxygenxml.com/topics/shari ... rules.html

Regards,
Radu
tpopp
Posts: 6
Joined: Tue Jan 31, 2023 9:24 pm

Re: Searching for Missing Spaces in Content

Post by tpopp »

Radu wrote: Wed Feb 01, 2023 8:52 am Hi,

An alternative is to use Oxygen's "Find/Replace in Files" dialog: search for something like </(ph|b|u|i|codeph)>[a-zA-Z]+ with the "Regular expression" checkbox checked.

Regards,
Radu
Radu,
This seems to work, thank you! What is the + for at the end? If I want to do this as a global search in O2, can I do that? What about if I want to insert the space in an individual occurrence or globally? Can you save regular expression find/replace strings in O2 for others to use?
chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Re: Searching for Missing Spaces in Content

Post by chrispitude »

And for either method (regular-expression search or XPath search), you can create a test topic with example constructs that should be found or not found, then experiment on that single file with the scope set to Current File. This can save time and build confidence in the technique before you apply it to a larger scope.
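
The same "test constructs" idea can be tried on the regular expression itself before touching any files. Below is a small Python sketch, with made-up example markup, that pairs each construct with whether the search is expected to find it and reports any case that behaves differently. The names CASES and run_cases are hypothetical.

```python
import re

PATTERN = re.compile(r"</(ph|b|u|i|codeph)>[a-zA-Z]+")  # the search pattern from this thread

# Hypothetical test constructs: (markup, should the search find it?)
CASES = [
    ("<ph>some text</ph>X", True),
    ("<ph>some text</ph> X", False),
    ("<b>bold</b>, then more", False),
    ("<codeph>ls</codeph>command", True),
    ("<xref href='a.dita'>link</xref>text", False),  # xref is not in the element list
]

def run_cases(pattern, cases):
    """Return the cases whose actual result differs from the expected one."""
    return [(markup, expected) for markup, expected in cases
            if bool(pattern.search(markup)) != expected]

failures = run_cases(PATTERN, CASES)
print("all cases behave as expected" if not failures else failures)
```

The last case is a reminder that elements missing from the list, such as <xref>, are silently skipped until they are added to the pattern.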