Deleting everything except cross-reference structure from a book

Post by **chrispitude** » Wed Nov 03, 2021 6:31 pm

Hi everyone,

I have a large book with a lot of content and many cross-references. To submit a testcase to Syncro Soft related to cross-reference behavior, I needed a way to remove the content but keep the cross-references.

One way to do this is to use Oxygen's built-in Elements > Delete element refactoring operation with the following element pattern:

*[not(descendant-or-self::xref or descendant-or-self::link)][not(self::topic or self::entry or self::colspec)]

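This expression works as follows:

- It deletes every element that neither is nor contains an <xref> or <link>, so each cross-reference and all of its ancestor elements are kept.
- It keeps <topic> elements even when they end up empty, to avoid broken cross-reference targets. It also keeps <entry> and <colspec> elements, presumably so that table rows containing cross-references keep their column structure.
- It leaves the text surrounding the <xref> or <link> elements in place, but you can then obfuscate that text by running Help > Support Tools > Randomize XML text content.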
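
For anyone who wants to see the effect of the pattern outside of Oxygen, here is a minimal Python sketch that imitates it using only the standard library. It is not how Oxygen implements the refactoring operation. The function names (`matches_delete_pattern`, `prune`) and the sample topic are made up for illustration, and the sketch assumes non-namespaced DITA elements and never deletes the root element itself.

```python
import xml.etree.ElementTree as ET

# Element names taken from the XPath pattern in the post.
REF_ELEMENTS = {"xref", "link"}
ALWAYS_KEPT = {"topic", "entry", "colspec"}


def matches_delete_pattern(elem):
    """Imitate *[not(descendant-or-self::xref or descendant-or-self::link)]
    [not(self::topic or self::entry or self::colspec)]."""
    # elem.iter() yields elem itself plus all of its descendants.
    has_ref = any(d.tag in REF_ELEMENTS for d in elem.iter())
    return not has_ref and elem.tag not in ALWAYS_KEPT


def prune(parent):
    """Remove matching children of parent, keeping the text around them."""
    prev = None  # last sibling that was kept
    for child in list(parent):
        if matches_delete_pattern(child):
            # ElementTree stores the text that follows an element in .tail,
            # so move it to a neighbor before removing the element.
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)
        else:
            prune(child)
            prev = child


if __name__ == "__main__":
    # Hypothetical sample topic, not taken from the original book.
    sample = (
        '<topic id="t1"><title>Intro</title><body>'
        '<p>See <b>also</b> the <xref href="#t2"/> page.</p>'
        '<p>Unrelated paragraph.</p></body>'
        '<topic id="t2"><title>Other</title><body><p>Text</p></body></topic>'
        '</topic>'
    )
    root = ET.fromstring(sample)
    prune(root)
    print(ET.tostring(root, encoding="unicode"))
```

In the sample, the paragraph containing the <xref> survives along with its surrounding text, the unrelated paragraph and the titles are removed, and the nested topic is kept as an empty element so that the reference still has a target.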

Note that you *must* run this only on .dita files! If you include .ditamap files in the refactoring operation, all the map content will be deleted.
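
If it helps to double-check the scope before running the operation, a small Python sketch like the following lists only the .dita files under a folder and skips .ditamap files. The helper name `dita_topic_files` is hypothetical, and the sketch assumes topics can be identified by file extension alone.

```python
from pathlib import Path


def dita_topic_files(root_dir):
    """Return all files under root_dir whose extension is exactly .dita."""
    # Path.suffix is ".ditamap" for map files, so they are excluded here.
    return sorted(
        p for p in Path(root_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".dita"
    )
```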

This greatly reduced the size of my cross-reference testcase.

The element pattern above does not keep empty <fig> or <table> elements around as targets for cross-references that point to something other than a topic, but you can keep those elements as needed by adding them to the second [not(...)] filter:

*[not(descendant-or-self::xref or descendant-or-self::link)][not(self::topic or self::entry or self::colspec or self::fig or self::table)]

Post by **Radu** » Thu Nov 04, 2021 2:36 pm

Hi Chris,

Maybe applying "Randomize XML text content" directly would have been enough.
Or you could change its stylesheet, "OXYGEN_INSTALL_DIR/refactoring/randomizeContent.xsl", to generate less random text if you wanted to reduce the file contents.

Regards,
Radu

Post by **chrispitude** » Mon Nov 08, 2021 5:28 am

Hi Radu,

I agree that "Randomize XML text content" is probably enough for most cases.

In this case, I was creating a testcase for a "DITA References" add-on enhancement request, so my focus was on how the cross-references behaved. For that purpose, it was quite useful to delete all of the content in the book except the content containing cross-references, so that the references were easier to see for someone unfamiliar with the book.

In the process, I found another interesting use for this content-deletion trick. When I published a PDF from the reduced testcase, I got the book with just the "skeleton" of all the sections that cross-referenced each other. It was interesting to page through the PDF and see which chapters tended to refer to which other chapters, and the commonality/consistency of how the references were phrased!

I doubt this trick would be used often, but I wanted to share it in case someone found it interesting to try.
