Batch validation minor annoyances

sanGeoff
Posts: 42
Joined: Mon Aug 18, 2014 11:50 pm

Post by sanGeoff »

After upgrading from oXygen 17.1 to 18.1, validation flagged a few new issues in our DITA content, so I started using the batch validation to clean up our content a bit. Here are a few very minor quirks of the batch validation that I wanted to point out.
  • I had opened our project xpr file using a lowercase URL path. This caused every single reference to produce a validation error of "...incorrect path capitalization.." or "File not found...". While minor, it produced over 160k validation errors, which went away when I opened the xpr with the correctly cased path.
  • There is no way to know the batch validation progress, and it takes my PC over an hour to run through all our files. I couldn't even detect any logic in the order the files were scanned to guesstimate how much longer it would take. It would be nice if it at least counted the files first and then used that count to set the progress bar position.
  • After waiting over an hour, I tried to save the ~200k validation results to a text file and an XML file. Sadly, oXygen kept running out of memory when I tried to do that. After fixing the xpr path issue and getting the validation issues under control, it worked. It was kind of annoying, though, that the first time there was no way to export results that took an hour to generate.
  • There is no way to automatically filter and exclude some messages from the batch scan. We have a plug-in that has a ValidationProblemsFilter listener for the current editor, but there does not appear to be any way to attach that to the batch validation routine. A few errors, such as the ones complaining about "-dita-use-conref-target" attribute values, are not real issues for us and should be filtered out.
Also here are two related things I noticed:
  • I added a ValidationProblemsFilter to the map editor to exclude a few messages. It correctly filters the problems, however the problems view panel automatically opens and is empty when all messages are filtered.
  • When renaming a colname in the colspec of a table in Author view, none of the references to the old colname in the table are updated. It would be nice if oXygen automatically updated the colname references in the table as well. I could probably incorporate this into our plug-in, but it seems like something oXygen would do automatically.
Again, all those things are very minor and I am kind of just nitpicking, but I thought I would jot them down.
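
A minimal sketch of how the path-case mismatch from the first bullet could be caught up front, before an hour-long batch run. This is not oXygen's own check; the class and method names are hypothetical. It relies on `Path.toRealPath()`, which on case-insensitive file systems returns the name elements in their actual case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of oXygen or its SDK.
final class PathCaseCheckSketch {

    // True when the two strings name the same path except for letter case.
    static boolean differsOnlyByCase(String typed, String onDisk) {
        return typed.equalsIgnoreCase(onDisk) && !typed.equals(onDisk);
    }

    // Compares the path as typed with the path as the file system reports it.
    static boolean openedWithWrongCase(String typedPath) {
        Path typed = Paths.get(typedPath).toAbsolutePath().normalize();
        try {
            return differsOnlyByCase(typed.toString(), typed.toRealPath().toString());
        } catch (IOException e) {
            // The file is missing or unreadable; surface that to the caller.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `openedWithWrongCase` on the xpr path before starting the batch run would flag the situation described above. On a case-sensitive file system a wrongly cased path simply does not exist, so the call throws instead.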

Platform: Standalone - Windows 32-bit
Version : <oXygen/> XML Author 18.1, build 2017020917
Java Ver: Java SE 8u102
Radu
Posts: 9055
Joined: Fri Jul 09, 2004 5:18 pm

Re: Batch validation minor annoyances

Post by Radu »

Hi,

These are a lot of good observations. Please see some answers below:
sanGeoff wrote:I had opened our project xpr file using a lowercase URL path. This caused every single reference to produce a validation error of "...incorrect path capitalization.." or "File not found...". While minor, it produced over 160k validation errors, which went away when I opened the xpr with the correctly cased path.
I will try to reproduce this on my side. So the project references a folder which contains the DITA resources, right? Why did you not open the main DITA Map in the DITA Maps Manager and use the "Validate and check for completeness" action? It shows more problems than just validating each topic individually...
sanGeoff wrote:There is no way to know the batch validation progress, and it takes my PC over an hour to run through all our files. I couldn't even detect any logic in the order the files were scanned to guesstimate how much longer it would take. It would be nice if it at least counted the files first and then used that count to set the progress bar position.
We also do not have a good way to know how long this will take. I think that at some point we counted all files and encountered a situation in which someone had linked their entire hard drive in the project, so counting all the files took a long time... But I understand where you stand and I will add an internal issue; maybe we can better report how long this will take.
sanGeoff wrote:After waiting over an hour, I tried to save the ~200k validation results to a text file and an XML file. Sadly, oXygen kept running out of memory when I tried to do that. After fixing the xpr path issue and getting the validation issues under control, it worked. It was kind of annoying, though, that the first time there was no way to export results that took an hour to generate.
I will add an internal issue to test whether our batch validation properly releases memory at the end. So these were mostly DITA topics/maps/concepts and so on, right? Were they DTD, XML Schema, or RNG based?
sanGeoff wrote:There is no way to automatically filter and exclude some messages from the batch scan. We have a plug-in that has a ValidationProblemsFilter listener for the current editor, but there does not appear to be any way to attach that to the batch validation routine. A few errors, such as the ones complaining about "-dita-use-conref-target" attribute values, are not real issues for us and should be filtered out.
I see below that you managed to get the filter API working. Could you tell us about those particular issues that Oxygen should not report anymore? Maybe we can avoid reporting them on our side if they are nonsense...
sanGeoff wrote:I added a ValidationProblemsFilter to the map editor to exclude a few messages. It correctly filters the problems, however the problems view panel automatically opens and is empty when all messages are filtered.
I will add a filter on my side and test this.
sanGeoff wrote:When renaming a colname in the colspec of a table in Author view, none of the references to the old colname in the table are updated. It would be nice if oXygen automatically updated the colname references in the table as well. I could probably incorporate this into our plug-in, but it seems like something oXygen would do automatically.
Yes, we do not yet have this feature. We have plans for it and I fully agree it would be a good improvement. I will add your contact details to the open internal issue.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by sanGeoff »

Hi, thanks for the quick and detailed reply.
Radu wrote:I will try to reproduce this on my side. So the project references a folder which contains the DITA resources, right? Why did you not open the main DITA Map in the DITA Maps Manager and use the "Validate and check for completeness" action? It shows more problems than just validating each topic individually...
We don't have a main map and some files may be orphaned. I suppose I could have listed all our DITA files and made a master map and checked that.
Radu wrote:We also do not have a good way to know how much this will take. I think that at some point we counted all files and encountered a situation in which someone linked their entire harddrive in the project. So counting all the files lasted a long time... but I understand where you stand and I will add an internal issue, maybe we can better report how long this will take.
Makes sense now, although the check would eventually have to go through all the files and folders anyway, right?
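
For illustration, a minimal sketch of the count-first approach discussed here: one pass to collect the files, then a second pass that can report a percentage. This is not how oXygen's batch validation works internally; the class, the method names, and the idea of filtering by file extension are assumptions. The counting pass still has to walk every folder, so it adds time up front.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of oXygen or its SDK.
final class BatchProgressSketch {

    // First pass: walk the tree once and collect the files to validate.
    static List<Path> collectFiles(Path root, String extension) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(extension))
                    .sorted() // fixed order, so the scan order is predictable
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Progress as a whole percentage; an empty batch counts as finished.
    static int percentDone(int processed, int total) {
        return total == 0 ? 100 : (int) (100L * processed / total);
    }
}
```

A second pass would loop over the collected list, validate each file, and call `percentDone(i + 1, files.size())` to position a progress bar.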
Radu wrote: I will add an internal issue to test if our batch validation properly releases memory in the end. So these were mostly DITA topics/maps/concepts and so on right? Were they DTD, XML Schema, or RNG based?
Yeah, just topics and maps. I think it properly released the memory and everything; the memory usage indicator in the tray was normal. It just seemed to be the "save as text/xml" routines that were struggling with over 200k results the first time around (mainly due to the filename case issue).
Radu wrote:I see below that you managed to get the filter API working. Could you tell us about those particular issues that Oxygen should not report anymore? Maybe we can avoid reporting them on our side if they are nonsense...
The most frequent issue is "Element is not a content reference but has attribute "(cols|status)" with value "-dita-use-conref-target"." The DITA source looks like the following:

<table conref="file2.dita#ref/table_abc"> 
<title></title>
<tgroup cols="-dita-use-conref-target">
....
I suppose we could just hard-code the column count, but I'm not sure what would happen if the column count of the source table changed.

Thanks again.

Post by Radu »

Hi,

So:
sanGeoff wrote:We don't have a main map and some files may be orphaned. I suppose I could have listed all our DITA files and made a master map and checked that.
The "Validate and check for completeness" action run on a DITA Map validates all the topics referenced in the map (or submaps) much faster than batch validating each topic.
This happens because, when you batch validate topics, each topic is validated with its associated validation scenario, which also includes some Schematron checks, and Schematron checks are quite time-consuming. When you use "Validate and check for completeness" you can also specify a Schematron file to use for validation, but by default validation is done only against the DTD/Schema plus various other rule checks.
Here is a list of what "Validate and check for completeness" does:

http://blog.oxygenxml.com/2015/12/dita- ... k-for.html

and it's very fast: on a fast computer it can validate about 1000-2000 topics and maps in 20 seconds.
Indeed, it does not validate orphan resources. But I guess you don't really care much about those anyway.
sanGeoff wrote:Yeah, just topics and maps. I think it properly released the memory and everything; the memory usage indicator in the tray was normal. It just seemed to be the "save as text/xml" routines that were struggling with over 200k results the first time around (mainly due to the filename case issue).
I think the out-of-memory error occurred in your case mostly because of the many errors that were reported. But we will do more tests on our side.
sanGeoff wrote:The most frequent issue is "Element is not a content reference but has attribute "(cols|status)" with value "-dita-use-conref-target"."
The specs:

http://docs.oasis-open.org/dita/v1.2/os ... argetvalue

seem to state that this special "-dita-use-conref-target" value can be used only on the element which has the @conref attribute, so I think that Oxygen is correct in signaling this as a problem (although the problem is benign).
sanGeoff wrote:I suppose we could just hard-code the column count, but I'm not sure what would happen if the column count of the source table changed.
If you use Oxygen's "Reuse Content" action to insert a conref to a table, Oxygen generates this DITA content:

<table conref="#introduction/tableID">
  <tgroup cols="1">
    <tbody>
      <row>
        <entry/>
      </row>
    </tbody>
  </tgroup>
</table>
So indeed Oxygen hard codes the value "1" for the @cols attribute. That "tgroup" basically needs to be added there because it is required by the DTDs, but it will not be used for anything as long as the @conref resolves to a proper table. So it is just a fallback element in case the conref does not resolve and it is also required by validation.
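
The rule described earlier in this post (the "-dita-use-conref-target" value belongs only on the element that carries @conref) can be sketched as a small standalone check. This illustrates the rule as read from the spec above and is not Oxygen's validator code; the class and method names are hypothetical, and the sketch looks only at @conref, not at other reuse attributes such as @conkeyref.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of oXygen or its SDK.
final class ConrefTargetCheckSketch {

    static final String MARKER = "-dita-use-conref-target";

    // Returns "element/@attribute" for every marker value found on an
    // element that has no conref attribute of its own.
    // Real DITA files declare a DTD; resolving it is outside this sketch.
    static List<String> findMisplacedMarkers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> hits = new ArrayList<>();
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                if (el.hasAttribute("conref")) {
                    continue; // the marker is allowed here
                }
                NamedNodeMap attrs = el.getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    if (MARKER.equals(attrs.item(j).getNodeValue())) {
                        hits.add(el.getTagName() + "/@" + attrs.item(j).getNodeName());
                    }
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Run against the table from the earlier post, this reports the cols attribute of the tgroup, because the marker sits on a child of the conref element rather than on the conref element itself. The generated fallback with cols="1" yields no hits.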

Regards,
Radu

Post by Radu »

Hi,

I had some more time to look into this:
sanGeoff wrote:I added a ValidationProblemsFilter to the map editor to exclude a few messages. It correctly filters the problems, however the problems view panel automatically opens and is empty when all messages are filtered.
So you are using Oxygen 18.1, right?
In the Oxygen Help menu->About dialog there is also a "Build ID" value, something with a pattern like "yyyymmddhh". Could you tell me what that value is on your side?
I constructed a validation problems filter which removes all reported problems but I cannot reproduce this problem on my side, could you give me some Java sample code with what your plugin does?

Regards,
Radu

Post by sanGeoff »

Radu wrote:So you are using Oxygen 18.1, right?
In the Oxygen Help menu->About dialog there is also a "Build ID" value, something with a pattern like "yyyymmddhh". Could you tell me what that value is on your side?
I constructed a validation problems filter which removes all reported problems but I cannot reproduce this problem on my side, could you give me some Java sample code with what your plugin does?
Sure. I think I mentioned that info in my first post. Is this what you are looking for?
Platform: Standalone - Windows 32-bit
Version : <oXygen/> XML Author 18.1, build 2017020917
Java Ver: Java SE 8u102

It's a very minor thing, so I wouldn't spend a lot of time on it. It looks like I can prevent the panel from popping up by turning off automatic DITA validation for DITA maps.

Here is an example map:

<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <topicref href="example.xml#ref/section_2"/>
</map>
In my map validation filter I have:

...
if (msg.indexOf("Topic references should only be made to topic IDs") > -1) {
    // Drop this problem so it is not reported for the map.
    iterator.remove();
}
We have many legacy element references in large topics that work fine in our outputs. The empty problems pane seems to pop up automatically only when automatic DITA validation is enabled for the DITA map document type.
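
A self-contained sketch of the filtering pattern in the snippet above, with plain strings standing in for the problem objects that the plug-in's ValidationProblemsFilter receives. The class name and method name are hypothetical; only the "Topic references..." text comes from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the body of a validation problems filter.
final class MessageFilterSketch {

    // Substrings of messages that should not be reported.
    static final List<String> IGNORED = Arrays.asList(
            "Topic references should only be made to topic IDs");

    // Returns a copy of the messages with the ignored ones removed.
    static List<String> filter(List<String> messages) {
        List<String> kept = new ArrayList<>(messages);
        for (Iterator<String> iterator = kept.iterator(); iterator.hasNext();) {
            String msg = iterator.next();
            for (String ignored : IGNORED) {
                if (msg.contains(ignored)) {
                    iterator.remove(); // safe removal while iterating
                    break;
                }
            }
        }
        return kept;
    }
}
```

Keeping the ignored substrings in one list makes it easy to add further messages later without touching the loop. When every message matches, the result is an empty list, which is the case that leaves the problems pane empty.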

Post by Radu »

Hi,

Does this problem with the empty problems list occur when the DITA Map is opened in the DITA Maps Manager view or in the main editor?
If possible could you post your entire code from the implementation of "ValidationProblemsFilter.filterValidationProblems(ValidationProblems)"?

Regards,
Radu