
Getting a handle on <bookmap> CSS selectors

Posted: Fri May 10, 2019 5:29 pm
by chrispitude
Hi everyone,

Some of our books use <part> volume dividers in the bookmaps and some do not. It's been a bit challenging to write CSS selectors that work in both cases.

To more easily see what attributes are available, I wrote a little perl script called show_html_structure.pl that processes the temporary HTML file and shows a summary of the structure:

#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

my $twig = XML::Twig->new->parsefile_html(shift);
$twig->root->first_child('head')->delete;
$_->cut_children for $twig->get_xpath('//div[@class = "- topic/body body"]');  # empty topic body contents
$_->set_att('HERE', 1) for $twig->get_xpath('//div[@class =~ /glossgroup/]');  # debugging aid: flag glossgroup divs with HERE="1" (deletes nothing)
$_->cut_children('article') for $twig->get_xpath('//article[@class =~ /glossgroup/]');  # delete glossary entries (subtopics)
$_->delete for $twig->descendants('h1|h2|h3|h4|h5|h6|h7|h8|h9');  # delete titles
$_->delete for $twig->root->get_xpath('//div[@class =~ /wh_related_links/]');
$_->delete for $twig->root->get_xpath('//div[@class =~ /topic.body /]');  # delete empty topic body placeholders
$twig->root->strip_att($_) for qw(id nd:nd-id oid xmlns:nd aria-labelledby break-before cascade ditaarch:ditaarchversion lang xml:lang xmlns:ditaarch);  # delete attributes of non-interest
$twig->print(pretty_print => 'indented');
Before you use the script, you must install the XML::Twig and HTML::TreeBuilder perl CPAN modules, which you can do with:

sudo cpan -i App::cpanminus  # only if cpanm is not installed
sudo cpanm XML::Twig HTML::TreeBuilder
Use the script by running it on the merged HTML file in the out/pdf-css-html5/ directory. For example, a test book with the following ditamap:

<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="urn:oasis:names:tc:dita:rng:bookmap.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<bookmap>
  <title>Sample Book</title>
  <frontmatter>
    <notices href="notices.dita" id="notices"/>
    <preface href="preface.dita" id="preface"/>
  </frontmatter>
  <part id="partdiv_id7" navtitle="Part 1">
    <chapter href="chapter1.dita" id="chapter1">
      <topicref href="topic1.dita" id="topic1"/>
    </chapter>
  </part>
  <part id="partdiv_id9" navtitle="Part 2">
    <chapter href="chapter2.dita" id="chapter2">
      <topicref href="topic2.dita" id="topic2"/>
    </chapter>
  </part>
  <appendix href="appendix.dita" id="appendix"/>
  <appendix href="glossary.dita" id="glossary"/>
</bookmap>
results in this output:

$ show_html_structure.pl OPENME.merged.html
<html xtrf="file:/C:/Users/...deleted.../OPENME.ditamap">
  <body class="wh_topic_page">
    <div class="wh_content_area">
      <div class="wh_topic_body">
        <div class="wh_topic_content">
          <div class="- map/map bookmap/bookmap map bookmap" domains="(map bookmap) (topic abbrev-d) (topic delay-d) (map ditavalref-d) (topic hazard-d) (topic hi-d) (topic indexing-d) (map mapgroup-d) (topic markup-d xml-d) (topic markup-d) (topic pr-d) (topic relmgmt-d) (topic sw-d) (topic ui-d) (topic ut-d) (topic xnal-d) a(props deliveryTarget)">
            <div class="- front-page/front-page front-page">
              <div class="- front-page/front-page-title front-page-title">
                <div class="- topic/title title">Sample Book</div>
              </div>
            </div>
            <article class="- topic/topic topic nested0" topicrefclass="- map/topicref bookmap/notices "></article>
            <article class="- topic/topic topic nested0" topicrefclass="- map/topicref bookmap/preface "></article>
            <article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
              <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
                <article class="- topic/topic topic nested2" topicrefclass="- map/topicref "></article>
              </article>
            </article>
            <article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
              <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
                <article class="- topic/topic topic nested2" topicrefclass="- map/topicref "></article>
              </article>
            </article>
            <article class="- topic/topic topic nested0" is-chapter="true" topicrefclass="- map/topicref bookmap/appendix "></article>
            <article class="- topic/topic concept/concept glossgroup/glossgroup topic concept glossgroup nested0" is-chapter="true" topicrefclass="- map/topicref bookmap/appendix "></article>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>
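Once the script has printed its one-tag-per-line summary, the article attributes can be boiled down further to see what is available to select on at each nesting level. Here is a minimal sketch using only core Perl. It assumes the indented output format shown above (it is a regex scan, so it is not meant for arbitrary HTML), and summarize_articles is a hypothetical helper name, not part of the original script:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: scan the pretty-printed output of show_html_structure.pl
# and return one summary row per <article> start tag:
#   nesting level | is-part / is-chapter flags | topicrefclass
sub summarize_articles {
    my ($text) = @_;
    my @rows;
    for my $line (split /\n/, $text) {
        # Closing tags (</article>) do not match, only start tags do.
        next unless $line =~ /<article\b([^>]*)>/;
        my $attrs = $1;

        # Collect name="value" pairs from the start tag.
        my %att;
        $att{$1} = $2 while $attrs =~ /([\w:-]+)="([^"]*)"/g;

        my ($level) = ($att{class} // '') =~ /\bnested(\d+)\b/;
        my @flags = grep { ($att{$_} // '') eq 'true' } qw(is-part is-chapter);
        my $ref = $att{topicrefclass} // '(none)';
        $ref =~ s/^\s+|\s+$//g;    # the attribute value carries a trailing space

        push @rows, join ' | ',
            'nested' . ($level // '?'),
            (@flags ? "@flags" : '-'),
            $ref;
    }
    return @rows;
}

# Sample input: three article lines copied from the output above.
my $sample = <<'END';
<article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
  <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
    <article class="- topic/topic topic nested2" topicrefclass="- map/topicref "></article>
  </article>
</article>
END

print "$_\n" for summarize_articles($sample);
```

For the sample, this prints one row each for the part placeholder, the chapter, and the nested topic, which makes it easy to spot that the part placeholder carries no topicrefclass at all.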
You can comment or uncomment lines in the Perl script to show or hide various things. For example, commenting out the line that deletes the h1-h9 titles keeps them in the output, so you can see which classes they can be selected with:

$ show_html_structure.pl OPENME.merged.html
<html xtrf="file:/C:/Users/...deleted.../OPENME.ditamap">
  <body class="wh_topic_page">
    <div class="wh_content_area">
      <div class="wh_topic_body">
        <div class="wh_topic_content">
          <div class="- map/map bookmap/bookmap map bookmap" domains="(map bookmap) (topic abbrev-d) (topic delay-d) (map ditavalref-d) (topic hazard-d) (topic hi-d) (topic indexing-d) (map mapgroup-d) (topic markup-d xml-d) (topic markup-d) (topic pr-d) (topic relmgmt-d) (topic sw-d) (topic ui-d) (topic ut-d) (topic xnal-d) a(props deliveryTarget)">
            <div class="- front-page/front-page front-page">
              <div class="- front-page/front-page-title front-page-title">
                <div class="- topic/title title">Sample Book</div>
              </div>
            </div>
            <article class="- topic/topic topic nested0" topicrefclass="- map/topicref bookmap/notices ">
              <h1 class="- topic/title title topictitle1">Notices</h1>
            </article>
            <article class="- topic/topic topic nested0" topicrefclass="- map/topicref bookmap/preface ">
              <h1 class="- topic/title title topictitle1">Preface</h1>
            </article>
            <article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
              <h1 class="- topic/title title topictitle1">Part 1</h1>
              <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
                <h2 class="- topic/title title topictitle2">Chapter</h2>
                <article class="- topic/topic topic nested2" topicrefclass="- map/topicref ">
                  <h3 class="- topic/title title topictitle3">Topic</h3>
                </article>
              </article>
            </article>
            <article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
              <h1 class="- topic/title title topictitle1">Part 2</h1>
              <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
                <h2 class="- topic/title title topictitle2">Chapter</h2>
                <article class="- topic/topic topic nested2" topicrefclass="- map/topicref ">
                  <h3 class="- topic/title title topictitle3">Topic</h3>
                </article>
              </article>
            </article>
            <article class="- topic/topic topic nested0" is-chapter="true" topicrefclass="- map/topicref bookmap/appendix ">
              <h1 class="- topic/title title topictitle1">Appendix</h1>
            </article>
            <article class="- topic/topic concept/concept glossgroup/glossgroup topic concept glossgroup nested0" is-chapter="true" topicrefclass="- map/topicref bookmap/appendix ">
              <h1 class="- topic/title title topictitle1">Glossary</h1>
            </article>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>
Note that I'm using a beta 21.1 build of PDF Chemistry provided to resolve some issues. It defines a new @topicrefclass attribute on articles that preserves the topicref context of articles that come from <bookmap> constructs.
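
Since the goal is selectors that behave the same with and without <part> dividers, one way to sanity-check an idea is to look articles up by a @topicrefclass token instead of by nesting level. The sketch below does that with XML::Twig on two hand-written fragments. The "with parts" fragment is trimmed from the output above. The "without parts" fragment is my assumption of what a chapter directly under <bookmap> looks like, modeled on the nested0 appendix article. titles_by_topicref is an illustrative helper name. The CSS counterpart would presumably be an attribute selector along the lines of article[topicrefclass~="bookmap/chapter"], assuming the build exposes the attribute as shown.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

# Illustrative helper: return the title text of every article whose
# topicrefclass contains the given whitespace-separated token,
# regardless of how deeply the article is nested.
sub titles_by_topicref {
    my ($xml, $token) = @_;
    my $twig = XML::Twig->new->parse($xml);
    my @titles;
    for my $article ($twig->root->descendants_or_self('article')) {
        my $ref = $article->att('topicrefclass') // '';
        next unless grep { $_ eq $token } split ' ', $ref;
        # The title is the first h1..h6 child of the article.
        my ($title) = grep { $_->tag =~ /^h[1-6]$/ } $article->children;
        push @titles, $title ? $title->text : '(untitled)';
    }
    return @titles;
}

# Trimmed from the output above: the chapter sits inside a part placeholder.
my $with_parts = <<'END';
<div>
  <article class="+ topic/topic pdf2-d/placeholder topic placeholder nested0" is-chapter="true" is-part="true">
    <h1 class="- topic/title title topictitle1">Part 1</h1>
    <article class="- topic/topic topic nested1" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
      <h2 class="- topic/title title topictitle2">Chapter</h2>
    </article>
  </article>
</div>
END

# ASSUMPTION: the same chapter in a book with no <part> elements.
my $without_parts = <<'END';
<div>
  <article class="- topic/topic topic nested0" is-chapter="true" topicrefclass="- map/topicref bookmap/chapter ">
    <h1 class="- topic/title title topictitle1">Chapter</h1>
  </article>
</div>
END

for my $case ([with => $with_parts], [without => $without_parts]) {
    my ($label, $xml) = @$case;
    print "$label parts: ", join(', ', titles_by_topicref($xml, 'bookmap/chapter')), "\n";
}
```

The chapter is found in both fragments even though its nesting level and heading level differ, which is the property a part-agnostic selector needs.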

Re: Getting a handle on <bookmap> CSS selectors

Posted: Mon May 13, 2019 3:30 pm
by Dan
Hello Chris,
Thank you for sharing your scripts!
Many regards,
Dan