Handling file/directory rename reference adjustment in large projects

Posted: Sun Sep 13, 2020 5:44 pm
by chrispitude
We have ~100 DITA bookmap files that reference ~20,000 DITA topic files. (This is just in one Git repo, and we have multiple repos.) Our writers are asking how to rename files and directories and have affected references updated automatically.

In addition to books, we have release notes for all products, published every six weeks for each product. The release notes bookmaps share common topic files within product families. Both books and release notes share common company-wide template topics. In addition, some topics are reused across books and release notes with conditionalization.

The takeaway from all this is that if a file or directory is renamed, it could affect one or more books, one or more release notes, or some combination of all of it.

If I add all ~100 books (but not release notes) to Master Files, reference adjustment after a file/directory rename takes an average of 80 seconds, which is not practical.

In addition, master files consume entries in the DITA Maps Manager context, which is quite problematic. We have issue EXM-45202 filed for this.

Next I tried working sets. The documentation on working sets is brief and left me with questions:
  • Does the "working sets" feature require Master Files to be enabled?
  • Does the "working sets" feature require the set contents to be a subset of the Master Files list?
  • Where is the working set configuration stored? (I don't see it in the .xpr file.)
I am not sure working sets are a viable solution anyway, because our book and release note dependencies do not partition the content into independent sets. Forcing a writer to choose a set will inevitably result in missed adjustments.

I noticed that the Resource Hierarchy/Dependencies feature has a similarly long runtime to renaming via Master Files, so perhaps they both use a similar file-scanning approach. Interestingly, both complete quickly for unreferenced files *not* in any maps - does this mean map-to-topic references are cached, but topic-to-topic references are not?

With release notes maps being created for 50-ish products every six weeks, any solution that requires maps to be manually added or enumerated in a list is suboptimal.

Can Oxygen have a feature that comprehends all .ditamap files in the project and efficiently handles renaming reference adjustment (and dependency analysis?) without manually being fed information? (I know this is not easy, or it would already be done!)

A random thought to finish up:

I see that Oxygen already indexes content via Apache Lucene. Are href values included in the indexing? Maybe there is some clever trick to perform fast leaf filename or directory name matches (just "topic.dita" or "dirname") that can then invoke more expensive content-aware analysis to see if there is a dependency?
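To make the idea concrete, here is a minimal Python sketch of such a two-phase check. It is only an illustration under stated assumptions: a raw byte scan stands in for the index lookup (it is not Lucene and not anything Oxygen does), only `href` and `conref` attributes are considered, and the function names are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def cheap_candidates(files, leaf_name):
    """Phase 1: keep only files whose raw bytes mention the leaf name."""
    needle = leaf_name.encode("utf-8")
    return [f for f in files if needle in Path(f).read_bytes()]


def references_target(file, target):
    """Phase 2: parse the XML and resolve each reference against the file."""
    file = Path(file)
    target = Path(target).resolve()
    for elem in ET.parse(str(file)).getroot().iter():
        for attr in ("href", "conref"):
            value = elem.get(attr)
            if not value:
                continue
            # Drop any fragment such as "#topicid/elementid".
            rel = value.split("#", 1)[0]
            if rel and (file.parent / rel).resolve() == target:
                return True
    return False


def find_referrers(files, target):
    """Run the expensive XML check only on files that pass the cheap scan."""
    leaf = Path(target).name
    return [f for f in cheap_candidates(files, leaf)
            if references_target(f, target)]
```

The cheap phase can produce false positives (another directory may contain a file with the same leaf name), which is why the parse step still resolves the full path. It could also miss references whose file names are percent-encoded or written with character entities, so a real implementation would need to normalize those first.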

Re: Handling file/directory rename reference adjustment in large projects

Posted: Mon Sep 14, 2020 9:24 am
by Radu
Hi Chris,

Please see some comments below:
If I add all ~100 books (but not release notes) to Master Files, reference adjustment after a file/directory rename takes an average of 80 seconds, which is not practical.
Right, we have an internal issue for this: "EXM-43529 Cache module dependencies when renaming/moving DITA resources". If we make progress on it in a future version, I will update this forum thread. I looked a bit into it some time ago but did not find a fast solution. I also did not write the code for the references updater, so my understanding of it is limited.
Also, do you rename topics from the Project view or from the DITA Maps Manager view? Right now these are separate implementations that we probably need to bring together. I think they should take about the same time, because both go through all resources starting from the master files and root map.
In addition, master files consume entries in the DITA Maps Manager context, which is quite problematic. We have issue EXM-45202 filed for this.
Yes, good news on that: my colleague Cosmin just fixed the issue. If it goes through all our internal QA and review stages, we might have a beta kit for you to test in 1-2 weeks.

As far as I know, working sets are not used at all by this rename feature. Also, as you said, if you define a more restrictive working set, you end up not updating some references.
I noticed that the Resource Hierarchy/Dependencies feature has similarly long runtime to renaming via Master Files, so perhaps they both use a similar file-scanning approach.
They use exactly the same mechanism.
Interestingly, both complete quickly for unreferenced files *not* in any maps - does this mean map-to-topic references are cached, but topic-to-topic references are not?
We seem to have a graph that already knows from which places a topic is referenced, and that graph is updated when content is saved in Oxygen. The graph is somehow used to determine which top-level maps point to the topic, but we still navigate the entire hierarchy of topics and maps again when the topic is actually renamed. I think one reason for this is that our graph may not reflect reality 100%, as people may make changes to content in external applications, and we do not want to make mistakes when renaming content. Anyway, we'll find out more when we actually start looking into this.
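As a rough illustration of what such a save-updated graph could look like, here is a small Python sketch. It is not Oxygen's implementation; the class and method names are hypothetical, and files are identified by plain strings for brevity.

```python
from collections import defaultdict


class ReferenceGraph:
    """Bidirectional reference index, refreshed whenever a file is saved."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # file -> files it references
        self.incoming = defaultdict(set)  # file -> files that reference it

    def on_save(self, file, targets):
        """Replace the recorded outgoing references of `file`."""
        for old in self.outgoing[file]:
            self.incoming[old].discard(file)
        self.outgoing[file] = set(targets)
        for new in self.outgoing[file]:
            self.incoming[new].add(file)

    def referrers(self, file):
        """Files that reference `file` directly."""
        return set(self.incoming.get(file, ()))

    def top_level_maps(self, file):
        """Walk incoming edges upward; return ancestors nothing references."""
        seen, stack, roots = {file}, [file], set()
        while stack:
            current = stack.pop()
            parents = self.incoming.get(current, set())
            if not parents and current != file:
                roots.add(current)
            for parent in parents:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return roots
```

With `book.ditamap` referencing `sub.ditamap`, and both `sub.ditamap` and `notes.ditamap` referencing `b.dita`, `top_level_maps("b.dita")` yields `book.ditamap` and `notes.ditamap`. The graph only learns about edits that pass through `on_save`, which is exactly the weakness described above: changes made in an external editor leave it stale.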
Can Oxygen have a feature that comprehends all .ditamap files in the project and efficiently handles renaming reference adjustment (and dependency analysis?) without manually being fed information?
We understand the need for this. I think we'll have it in a future version, and I'll update this forum thread when we do, but probably not in version 23.

Regards,
Radu

Re: Handling file/directory rename reference adjustment in large projects

Posted: Mon Sep 14, 2020 2:22 pm
by chrispitude
Radu wrote: Mon Sep 14, 2020 9:24 am
In addition, master files consume entries in the DITA Maps Manager context, which is quite problematic. We have issue EXM-45202 filed for this.
Yes, good news on that: my colleague Cosmin just fixed the issue. If it goes through all our internal QA and review stages, we might have a beta kit for you to test in 1-2 weeks.
This is great news!! I look forward to testing it. :)
Radu wrote: Mon Sep 14, 2020 9:24 am
We seem to have a graph that already knows from which places a topic is referenced, and that graph is updated when content is saved in Oxygen. The graph is somehow used to determine which top-level maps point to the topic, but we still navigate the entire hierarchy of topics and maps again when the topic is actually renamed. I think one reason for this is that our graph may not reflect reality 100%, as people may make changes to content in external applications, and we do not want to make mistakes when renaming content. Anyway, we'll find out more when we actually start looking into this.
I'm glad you are paranoid about the outside world making changes, as this absolutely happens. Sometimes we switch between a text editor and Oxygen to do certain tasks, and I am thankful that Oxygen and Git both handle this as well as they do!

Maybe we could use filesystem timestamps to detect when cache entries cannot be trusted? I could see some recursion here - if a submap changed that needs to be refreshed, then the submap could pull new files into the dependency tree that themselves need to be comprehended. But I think with the right code, it's a solvable problem.
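Here is a minimal Python sketch of that timestamp idea, to show that the recursion falls out naturally. It rests on assumptions: only `href` attributes are followed, only files with the listed suffixes are parsed, and the class and attribute names are hypothetical. It says nothing about how Oxygen's scanner actually works.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path

DITA_SUFFIXES = {".dita", ".ditamap", ".xml"}  # assumed set of parseable files


class DependencyCache:
    """Caches each file's outgoing hrefs, keyed by modification time."""

    def __init__(self):
        self._entries = {}    # path -> (mtime_ns, [referenced paths])
        self.parse_count = 0  # exposed only to show how much work was done

    def _parse_refs(self, path):
        self.parse_count += 1
        refs = []
        for elem in ET.parse(str(path)).getroot().iter():
            href = (elem.get("href") or "").split("#", 1)[0]
            if href:
                refs.append((path.parent / href).resolve())
        return refs

    def refs(self, path):
        """Return cached references, re-parsing only if the mtime changed."""
        path = Path(path).resolve()
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is None or cached[0] != mtime:
            cached = (mtime, self._parse_refs(path))
            self._entries[path] = cached
        return cached[1]

    def closure(self, root):
        """All existing files reachable from `root`, using the cache."""
        seen, stack = set(), [Path(root).resolve()]
        while stack:
            current = stack.pop()
            if current in seen or not current.exists():
                continue
            seen.add(current)
            if current.suffix in DITA_SUFFIXES:
                stack.extend(self.refs(current))
        return seen
```

When a submap changes on disk, the next `closure` call re-parses only that submap plus any files it newly pulls in; everything else is answered from the cache. Timestamps are not a perfect signal (coarse filesystem granularity, or tools that preserve mtimes), so a content hash could be used where more certainty is needed.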

Does your current scanner parse all files in XML form, or does it use search or grep-like tricks to quickly decide if it can skip over the file?

Re: Handling file/directory rename reference adjustment in large projects

Posted: Tue Sep 15, 2020 8:03 am
by Radu
Hi Chris,

Thanks for the advice; I added it to the open issue. We scan each XML file in an XML-aware fashion, with an XML parser, caching the referenced schemas so that they are not accessed each time an XML document is parsed. The parsing is fast, but of course its cost is multiplied by the number of files that are parsed. So indeed, we need to avoid parsing all files when a rename is done and focus on parsing only the files that have some relationship with the changed resources.

Regards,
Radu

Re: Handling file/directory rename reference adjustment in large projects

Posted: Mon Nov 30, 2020 3:18 am
by chrispitude
Hi Radu,

Lately when I need to rename some files, I add just that map to Master Files, do the rename work, then remove the map from Master Files. There doesn't seem to be any preprocessing penalty for doing this - the rename operations immediately assess the context and correctly figure out the dependencies.

This got me thinking - how about adding an "Include open map files" option to Master Files? Any open map files would implicitly be included. This way, to rename files in a book, all you need to do is open the book before moving or renaming the files. This would (mostly?) solve the large project issue by construction.

What do you think?

Re: Handling file/directory rename reference adjustment in large projects

Posted: Wed Dec 02, 2020 10:30 am
by Radu
Hi Chris,

How about if you open that map in the DITA Maps Manager view, right-click the topicref there, and use "Refactoring->Rename resource..."?
The scope of that rename operation should be the currently selected DITA Map in the DITA Maps Manager view plus all DITA Maps added to the "Master Files" folder.

Regards,
Radu

Re: Handling file/directory rename reference adjustment in large projects

Posted: Wed Dec 02, 2020 7:28 pm
by chrispitude
Hi Radu,

That is helpful, I had not tried that!

Unfortunately this will help only in a small number of cases. Most of the time, the writers are restructuring topic/image files within their books (adding directory structure as subject areas get large and complicated), renaming directories within their books that affect references to the contents within, or moving topics/images from book directories to warehouse directories.

Re: Handling file/directory rename reference adjustment in large projects

Posted: Thu Dec 03, 2020 9:28 am
by Radu
Hi Chris,

Makes sense, I added an internal issue for this:

EXM-46890 Add root map to master files scope when renaming/moving project resources

We'll update this thread if we implement it in a future version.

Regards,
Radu

Re: Handling file/directory rename reference adjustment in large projects

Posted: Sat Dec 05, 2020 6:26 pm
by chrispitude
Thanks Radu!

Re: Handling file/directory rename reference adjustment in large projects

Posted: Tue Oct 19, 2021 8:48 am
by Radu
Hi,

As an update, we released Oxygen 24, which should be much faster at adjusting references when single topics are renamed or moved in the Project view.

Regards,
Radu