"Find Unreferenced Resources" could support DITA-OT project files

chrispitude
Posts: 907
Joined: Thu May 02, 2019 2:32 pm

Post by chrispitude »

As writers update and restructure their content, we accumulate unused topics, maps, images, and even DITAVAL files.

Oxygen has a well-implemented Find Unreferenced Resources feature. However, it can be invoked only from the DITA Maps Manager. To run it on multiple input maps, you must open one of the maps, then manually add the additional maps.

In addition, only maps are supported as input. We have many (100+) maps. Some have similar file names and it is not always clear which are production maps and which are temporary/abandoned/unused maps.

Our DITA-OT project files are the final arbiter of what we publish. It would be useful to be able to multi-select our deliverable files:

[Image: image.png, showing the selected deliverable files]

then right-click and choose Find Unreferenced Resources from that context menu.

In the following testcase:

[Attachment: oxygen_find_unused_resources.zip]

if I selected the deliverable files shown above, then searched for unused resources in the entire dita/ directory, I would expect to detect the following unused files:

dita/bookA/UNUSED.jpg
dita/bookB/UNUSED.png
dita/bookC/UNUSED.dita
dita/UNUSED/topic1.dita
dita/UNUSED/topic2.dita
dita/UNUSED.ditamap
dita/_common/resources/UNUSED.dita
dita/_common/_ditaval/UNUSED.ditaval
dita/_project/UNUSED.xml
dita/_warehouse/_ditaval/UNUSED.ditaval
dita/_warehouse/_products/UNUSED.dita
dita/_warehouse/_products/UNUSED.ditamap

It would be a bonus if the search could recurse into Oxygen publishing templates to also detect the following unused files:

dita/_common/_template/css/UNUSED.css
dita/_common/_template/fonts/UNUSED.ttf
dita/_common/_template/html-fragments/UNUSED.xhtml
dita/_common/_template/xsl/UNUSED.xsl

but that is just bonus points. :)
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: "Find Unreferenced Resources" could support DITA-OT project files

Post by Radu »

Hi Chris,

Thanks for the feature request. I added two issues:

EXM-52097 Find Unreferenced Resources - invoke on DITA OT Project files
EXM-52098 Find Unreferenced Resources - Report unused publishing template files

The second issue may be more problematic: CSS files can import each other, and HTML files can refer to JavaScript and CSS files, so we would need to parse multiple file types to find references to other files.
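
To give a feel for the per-file-type parsing this would involve, here is a minimal Python sketch for CSS only. It does not reflect how Oxygen works or would implement this; the function name `css_references` and the regex patterns are assumptions made for illustration, and a robust version would need a real CSS parser plus similar handling for HTML and XSLT.

```python
import re

# Matches @import "x.css"; and @import url(x.css); (illustrative pattern)
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""")
# Matches url(...) with or without quotes, e.g. in @font-face or background
URL_RE = re.compile(r"""url\(\s*["']?([^"')]+?)["']?\s*\)""")


def css_references(css_text):
    """Return the sorted local file references found in a CSS string."""
    # Drop /* ... */ comments so commented-out references are not counted.
    text = re.sub(r"/\*.*?\*/", "", css_text, flags=re.DOTALL)
    refs = set(IMPORT_RE.findall(text)) | set(URL_RE.findall(text))
    # Ignore remote and inline resources; only local files can be "unused".
    return sorted(r for r in refs if not r.startswith(("http:", "https:", "data:")))
```

Each returned reference would then have to be resolved against the CSS file's own directory and scanned in turn, which is where the "files importing each other" complexity comes from.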

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
chrispitude

Re: "Find Unreferenced Resources" could support DITA-OT project files

Post by chrispitude »

Thanks Radu!

For the first enhancement, I currently grep all the .ditamap and .ditaval references out of the context and deliverable files, then create a dummy all.ditamap file to use with Find Unreferenced Resources. But this is not something my writers know how to do, and it is not strictly correct. (For example, what if there were an unused context file that no deliverable file used?)
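
The workaround above can be sketched roughly as follows in Python. This is only an approximation of the grep approach, not the actual commands used: the function names, the `href="..."` pattern, and the way references are written into the dummy map (`topicref` with a `format` attribute) are all assumptions. Like grep, it does not distinguish real elements from comments, and it does not follow which contexts the deliverables actually use.

```python
import os
import re
from pathlib import Path

# Grep-like pattern: any href="..." value ending in .ditamap or .ditaval.
HREF_RE = re.compile(r'href="([^"]+\.(?:ditamap|ditaval))"')


def collect_refs(project_files):
    """Return sorted absolute paths of maps/DITAVAL files referenced by the given files."""
    found = set()
    for project_file in map(Path, project_files):
        text = project_file.read_text(encoding="utf-8")
        for href in HREF_RE.findall(text):
            # hrefs are taken as relative to the file that contains them
            found.add((project_file.parent / href).resolve())
    return sorted(found)


def write_dummy_map(refs, out_path):
    """Write a throwaway map that references every collected file (assumed markup)."""
    out_path = Path(out_path).resolve()
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">',
        "<map>",
    ]
    for ref in refs:
        rel = os.path.relpath(ref, out_path.parent).replace(os.sep, "/")
        fmt = "ditaval" if rel.endswith(".ditaval") else "ditamap"
        lines.append(f'  <topicref href="{rel}" format="{fmt}"/>')
    lines.append("</map>")
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_dummy_map(collect_refs(Path("dita/_project").glob("*.xml")), "dita/all.ditamap")` would produce a map that could be opened in the DITA Maps Manager.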

The support for publishing template files is definitely less important. I have a Perl script that implements both enhancements:

get_referenced_files.pl dita/_project/deliverables-*

but it requires hacky comments in some of the files to handle references that are not easily traceable:

$ rg '!-- href'
dita/_project/publications.xml
14:    <!-- href="../_common/_template/synopsys-pdf.opt" -->
26:    <!-- href="../_common/_template/synopsys-pdf.opt" -->
38:    <!-- href="../_common/_template/synopsys-webhelp.opt" -->
50:    <!-- href="../_common/_template/synopsys-webhelp.opt" -->

dita/_common/_template/synopsys-webhelp.opt
18:            <!-- href="img/favicon.ico" -->

Comments should not normally be processed for dependency analysis, but my script uses regex and is too dumb to know that. :) After running this script, I can diff its output against a flat list of all files:

$ find dita -type f | sort > all.txt
$ get_referenced_files.pl dita/_project/deliverables-* > used.txt
$ diff all.txt used.txt | grep '<'
< dita/UNUSED.ditamap
< dita/UNUSED/topic1.dita
< dita/UNUSED/topic2.dita
< dita/_common/_ditaval/UNUSED.ditaval
< dita/_common/_template/css/UNUSED.css
< dita/_common/_template/fonts/UNUSED.ttf
< dita/_common/_template/html-fragments/UNUSED.xhtml
< dita/_common/_template/xsl/UNUSED.xsl
< dita/_common/resources/UNUSED.dita
< dita/_common/resources/copyright.dita
< dita/_project/UNUSED.xml
< dita/_warehouse/_ditaval/UNUSED.ditaval
< dita/_warehouse/_products/UNUSED.dita
< dita/_warehouse/_products/UNUSED.ditamap
< dita/bookA/UNUSED.jpg
< dita/bookB/UNUSED.png
< dita/bookC/UNUSED.dita
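
The comparison step above amounts to a set difference, which could also be sketched in Python as below. The function name `unreferenced_files` is made up for illustration and is not part of the script. One possible advantage over `diff` is that the result does not depend on both lists being in the same order.

```python
from pathlib import Path


def unreferenced_files(root, used):
    """Return sorted paths of files under root that do not appear in used."""
    # Equivalent of `find root -type f`
    all_files = {p.as_posix() for p in Path(root).rglob("*") if p.is_file()}
    # Normalize the "used" list the same way so the paths compare equal
    used_files = {Path(u).as_posix() for u in used}
    return sorted(all_files - used_files)
```

For example, `unreferenced_files("dita", Path("used.txt").read_text().splitlines())` would list the candidates for cleanup, assuming used.txt holds paths written relative to the same starting directory.
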

Here's the script, if you are interested:

[Attachment: get_referenced_files.zip]

The script is useful for other things. For example, I can create an archive of all the files needed for certain map or deliverable files:

./get_referenced_files.pl dita/olh_*.ditamap | tar cvfz olh_files.tgz --files-from -
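
Where `tar` is not available, the same archiving step could be approximated with Python's standard `tarfile` module. This is a sketch under assumptions: the function name `archive_files` and its `root` parameter are invented here, and the list of paths is expected to come from something like the script's output.

```python
import tarfile
from pathlib import Path


def archive_files(paths, archive_path, root="."):
    """Write the listed files (given relative to root) into a gzip-compressed tar archive."""
    root = Path(root)
    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in paths:
            # Store each file under its relative path, as `tar --files-from` would
            tar.add(root / rel, arcname=Path(rel).as_posix(), recursive=False)
```
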
Radu

Re: "Find Unreferenced Resources" could support DITA-OT project files

Post by Radu »

Hi Chris,
Thanks for posting this. Others with the same use case may find it useful.
Regards,
Radu