Diff Tool and White Spaces

Posted: Mon Mar 30, 2015 10:03 pm
by ajuan
One of our writers was using the diff tool in oXygen to compare old files and new files (XML), but it was reporting a lot of differences, because of white spaces.

Specifically, instead of a middot (·) to indicate white space, a tab character was appearing () in one of the files. The strange thing is that if you open the files in Author mode, both files show middots. Lowering the algorithm strength made no difference.

We managed to get around this issue by selecting the "Formatting" button that made both files formatted the same way, but is this a bug with the diff tool?

Thanks for any help/response in advance.

Re: Diff Tool and White Spaces

Posted: Tue Mar 31, 2015 4:13 pm
by Costin
Hello,

Indeed, the white spaces are normalized whenever the document editing is switched to the Author mode, which was the intended behavior in order to ease visual editing of the documents.

However, please note that when comparing the files using the Diff Files tool, the white spaces can be ignored, so they won't trigger any differences while comparing the files.
To ignore the white spaces, you should go into your Diff settings, either by using the appropriate toolbar shortcut, or through the menu Options > Preferences > Diff / Files Comparison and enable the "Ignore Whitespaces" option.

This does exactly what Author mode does implicitly: it causes each sequence of white space characters to be normalized into a single space character.
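
For readers who want to see what this kind of normalization amounts to, here is a minimal Python sketch. It only illustrates the idea of collapsing white space runs into a single space; it is not oXygen's actual implementation, and the function name `normalize_whitespace` is made up for this example.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs, and newlines into one space."""
    return re.sub(r"\s+", " ", text).strip()


# Two hypothetical lines that differ only in the kind of white space used.
old = "<p>Hello\tworld</p>"
new = "<p>Hello   world</p>"

# The raw strings differ, but their normalized forms are identical.
print(old == new)
print(normalize_whitespace(old) == normalize_whitespace(new))
```

A comparison that normalizes both sides first would treat the tab in one file and the spaces in the other as equivalent.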

Feel free to let us know whenever you need any other information.

Best Regards,
Costin

Re: Diff Tool and White Spaces

Posted: Tue Mar 31, 2015 8:18 pm
by ajuan
Hi,

Sorry, I should have added in my original post that "Ignore Whitespaces" was on. Even with this setting on, the tool was reporting the changes. The problem was that the white spaces were shown as different symbols, and I'm guessing that the encoding was being read differently in each file.

I did not know that white spaces are normalized in Author mode. This is good information to have on hand. Thanks!

Re: Diff Tool and White Spaces

Posted: Wed Apr 01, 2015 10:38 am
by Costin
Hello,

Thank you for the additional information.

We have also tested this, but could not reproduce it on our side, using the Diff tool in oXygen XML Editor v16.1. It might be a particular situation which triggers this behavior. What version of oXygen XML are you working with?

If possible, please send us a sample document for which the issue is reproducible and we will investigate it further.
You can send the sample document to our support email address: support@oxygenxml.com

Regards,
Costin

Re: Diff Tool and White Spaces

Posted: Wed Apr 01, 2015 8:19 pm
by ajuan
Hi,

I have sent the two files that were originally causing problems (before selecting the "Format and Indent Both Files" option).

The files seem to be working now, but there is a jpeg attached to the email that shows that this was happening previously.

Cheers,
Anne

Re: Diff Tool and White Spaces

Posted: Thu Apr 02, 2015 12:36 pm
by Costin
Hi,

We have received the files you sent to our support email address and replied there.

We were able to reproduce this with the "Words" algorithm for your specific XML documents, and suggested using an XML-aware algorithm (either "XML Fast" or "Auto") for XML documents instead.

This was logged in our internal tracking system for further investigation.

Regards,
Costin

Re: Diff Tool and White Spaces

Posted: Tue Apr 07, 2015 4:43 pm
by Costin
Hello,

We have discussed this situation internally with our developers and reached the conclusion that the current behavior (the Diff tool reporting differences for files compared using the "Words" algorithm) is in fact the intended one.

This is because words separated by white spaces are considered two different entities by the Diff tool when using the "Words" level comparison algorithm, even if the white spaces are set to be ignored.

For this reason I suggested that you use an XML-aware algorithm instead, which ignores the white spaces between the words in your documents.
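
One way to picture why a word-level comparison can still flag white space is the Python sketch below. It is purely illustrative and assumes a tokenizer that keeps the separators between words as tokens of their own; it is not the algorithm oXygen uses, and `tokenize_words` and `changed_tokens` are hypothetical helper names.

```python
import difflib
import re


def tokenize_words(text: str) -> list[str]:
    """Split text into word tokens and white space tokens, keeping separators."""
    return [tok for tok in re.split(r"(\s+)", text) if tok]


def changed_tokens(a: str, b: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-equal opcodes of a token-level diff between a and b."""
    ta, tb = tokenize_words(a), tokenize_words(b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        (tag, ta[i1:i2], tb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The words match, but the separator token (space vs. tab) is reported.
print(changed_tokens("Hello world", "Hello\tworld"))
```

Under that assumption the two words are matched, while the space and the tab between them show up as a replaced token.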

Regards,
Costin

Re: Diff Tool and White Spaces

Posted: Tue Apr 28, 2015 9:43 pm
by ajuan
Thanks again Costin for your help with this. I'll report back to my writer.

Cheers,
Anne

Re: Diff Tool and White Spaces

Posted: Fri Nov 17, 2017 9:03 pm
by xinelo
I am also getting this issue when comparing by Characters in Diff Files 18.0, when comparing PHP/HTML files.

A bit annoying.

Cheers, Manuel

Re: Diff Tool and White Spaces

Posted: Fri Nov 17, 2017 9:08 pm
by xinelo
I forgot to mention: if I use Characters mode, I get the whitespace highlighted as a difference (often even though I don't see any difference, and even accepting the merge produces dissimilar documents). If I use Auto mode, I get the whole line highlighted, which is even more annoying because I can't see what is different. However, I don't get the issue if I use Words granularity.

Re: Diff Tool and White Spaces

Posted: Mon Dec 04, 2017 1:22 pm
by Costin
Hi xinelo,

As I mentioned in my previous reply in this older thread, it is intended behavior for differences to be reported when comparing files with non-XML-aware algorithms (like Characters, Words, or Lines), depending on the specific algorithm and on where the white spaces are placed in the document. If you need Diff to ignore such characters, you should try using an XML-aware algorithm (like XML Fast or XML Accurate).
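
As a rough illustration of what "XML-aware" means here, the Python sketch below compares two documents by their element structure, with white space in text content normalized, rather than comparing characters or words. It uses the standard `xml.etree.ElementTree` module and is only an assumption-laden example, not how XML Fast or XML Accurate are implemented; `same_xml` and `_norm` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Collapse white space runs; treat missing text as empty."""
    return " ".join((text or "").split())


def same_xml(a: ET.Element, b: ET.Element) -> bool:
    """Compare two elements recursively, ignoring white space differences."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_xml(x, y) for x, y in zip(a, b))


# Same content, different indentation and separators (hypothetical samples).
old = ET.fromstring("<p>Hello\tworld <b>again</b></p>")
new = ET.fromstring("<p>Hello   world\n  <b>again</b>\n</p>")
print(same_xml(old, new))
```

A structural comparison of this kind is not affected by reformatting or by tabs versus spaces, which matches the reason an XML-aware algorithm is the suggested choice for XML files.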

In your case however, it seems you did not set the "Ignore whitespaces" option. That should be enabled even when using XML aware algorithms.
Therefore, please double check that in Options > Preferences > Diff > Files Comparison you enable "Ignore Whitespaces" and apply the changes.
If the Diff Files tool still reports differences after you set that option and use an XML-aware algorithm, please send some sample files on which the behavior is reproducible to our support team (the official support email is supportAToxygenxmlDOTcom) so that we can investigate further.

Regards,
Costin