Deleting everything except cross-reference structure from a book

chrispitude
Posts: 922
Joined: Thu May 02, 2019 2:32 pm

Post by chrispitude »

Hi everyone,

I have a large book with a lot of content and many cross-references. To submit a testcase to Syncro Soft related to cross-reference behavior, I needed a way to remove the content but keep the cross-references.

One way to do this is to use Oxygen's built-in Elements > Delete element refactoring operation with the following element pattern:

*[not(descendant-or-self::xref or descendant-or-self::link)][not(self::topic or self::entry or self::colspec)]

This expression works as follows:
  • It deletes every element that neither is nor contains an <xref> or <link>, so the cross-references and their ancestor elements remain.
  • It always keeps <topic> elements, even when they end up empty, to avoid broken cross-reference targets. It also keeps <entry> and <colspec> elements.
  • It leaves text surrounding the <xref> or <link> elements in place, but you can then obfuscate that by running Help > Support Tools > Randomize XML text content.
Note that you *must* run this only on .dita files! If you include .ditamap files in the refactoring operation, all the map content will be deleted, because a map normally contains no <xref> or <link> elements and so its root element matches the pattern.
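
To preview what the pattern would select before running the refactoring, the two predicates can be emulated outside Oxygen. The following is only a rough sketch using Python's standard xml.etree.ElementTree module, not Oxygen's own implementation. The function names and the sample topic and map are made up for illustration, and the sketch assumes elements without namespaces.

```python
import xml.etree.ElementTree as ET

REF_TAGS = {"xref", "link"}                # first predicate
KEEP_TAGS = {"topic", "entry", "colspec"}  # second predicate


def matches_pattern(elem):
    """Return True if the element pattern would select (delete) this element."""
    # elem.iter() yields the element itself and all of its descendants,
    # which mirrors the descendant-or-self axis.
    has_ref = any(e.tag in REF_TAGS for e in elem.iter())
    return not has_ref and elem.tag not in KEEP_TAGS


def survivors(elem):
    """Return the tags of the elements that remain, in document order."""
    if matches_pattern(elem):
        return []  # the element goes away together with everything inside it
    kept = [elem.tag]
    for child in elem:
        kept.extend(survivors(child))
    return kept


# Hypothetical sample content, not taken from a real book.
TOPIC = """<topic id="a"><title>Alpha</title><body>
<p>Intro text.</p>
<p>See <xref href="b.dita"/> for details.</p>
<section><title>More</title><p>Nothing here.</p></section>
</body></topic>"""

MAP = """<map><title>Book</title>
<topicref href="a.dita"/><topicref href="b.dita"/></map>"""

print(survivors(ET.fromstring(TOPIC)))  # ['topic', 'body', 'p', 'xref']
print(survivors(ET.fromstring(MAP)))    # []
```

In the sample topic, only the paragraph that holds the <xref> survives along with its ancestors, and even the topic's <title> is removed. The sample map comes back empty because its root element has no <xref> or <link> inside it, which is why .ditamap files need to be excluded.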

This greatly reduced the size of my cross-reference testcase.

The element pattern above does not keep <fig> or <table> elements that contain no cross-references, so cross-references that target a figure or table (rather than a topic) would lose their targets. To keep those elements too, add them to the list in the second [not(...)] predicate:

*[not(descendant-or-self::xref or descendant-or-self::link)][not(self::topic or self::entry or self::colspec or self::fig or self::table)]
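
The effect of the longer keep list can be checked with a small emulation. This is a hedged sketch in Python's standard xml.etree.ElementTree, not how Oxygen evaluates the pattern. The surviving_ids helper and the sample topic are hypothetical, and the sketch only looks at which @id values would still exist afterwards.

```python
import xml.etree.ElementTree as ET

REF_TAGS = {"xref", "link"}


def surviving_ids(elem, keep_tags):
    """Collect the @id values of elements the pattern would leave in place."""
    has_ref = any(e.tag in REF_TAGS for e in elem.iter())
    if not has_ref and elem.tag not in keep_tags:
        return set()  # deleted together with all of its descendants
    ids = {elem.get("id")} if elem.get("id") else set()
    for child in elem:
        ids |= surviving_ids(child, keep_tags)
    return ids


# Hypothetical topic whose xref points at a figure in the same topic.
SAMPLE = """<topic id="t1"><title>Sample</title><body>
<p>See <xref href="#t1/fig1"/>.</p>
<fig id="fig1"><title>A figure</title></fig>
</body></topic>"""

root = ET.fromstring(SAMPLE)
basic = surviving_ids(root, {"topic", "entry", "colspec"})
extended = surviving_ids(root, {"topic", "entry", "colspec", "fig", "table"})

print(sorted(basic))     # ['t1']
print(sorted(extended))  # ['fig1', 't1']
```

With the shorter list the figure's id disappears, so the xref to #t1/fig1 would have nothing to resolve to. With fig and table added, the empty <fig> stays and the target still exists.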
Radu
Posts: 9433
Joined: Fri Jul 09, 2004 5:18 pm

Post by Radu »

Hi Chris,

Maybe applying "Randomize XML text content" directly would have been enough.
Alternatively, if you wanted to reduce the file contents, you could change its stylesheet "OXYGEN_INSTALL_DIR/refactoring/randomizeContent.xsl" to generate less text.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
chrispitude
Posts: 922
Joined: Thu May 02, 2019 2:32 pm

Post by chrispitude »

Hi Radu,

I agree that "Randomize XML text content" is probably enough for most cases.

In this case, I was creating a testcase for a "DITA References" add-on enhancement request, so my focus was on how the cross-references behaved. For that purpose, it was quite useful to delete everything in the book except the content containing cross-references, so that someone unfamiliar with the book could see the references more easily.

In the process, I found another interesting use for this content-deletion trick. When I published a PDF from the reduced testcase, I got the book with just the "skeleton" of all the sections that cross-referenced each other. It was interesting to page through the PDF and see which chapters tended to refer to which other chapters, and the commonality/consistency of how the references were phrased!

I doubt this trick would be used often, but I wanted to share it in case someone found it interesting to try.