Example Tutorial

A simple tutorial for publishing HTML content to PDF with a cover page, bookmarks, etc.

This tutorial shows you how to format a book. For simplicity, it uses an HTML version of The Adventures of Tom Sawyer by Mark Twain, which has very little structure: it consists mainly of titles, paragraphs, and some meta-information. You can find the free ebook used for this tutorial at: https://www.gutenberg.org (search for The Adventures of Tom Sawyer).

Before getting started, save the HTML file as "book.html", then create a CSS file named "book.css" in the same directory. This tutorial assumes that you are using Oxygen XML Editor/Author as your XML IDE.

1. Clean up the existing styles

To make things easier, remove the <style> element from the header of the book. This prevents the rules in your own CSS from mixing with the ones that were created for browser display.

2. Set up Chemistry in Oxygen

To transform this book to PDF, configure Oxygen PDF Chemistry as an external tool in Oxygen. Go to the Tools > External Tools > Configure menu, click the New button, and set the Command line to:
cmd /c "${oxygenInstallDir}\oxygenChemistry.bat" -catalogs ${xmlCatalogFilesList}
   -in "${cf}" -css "${cfd}/${cfn}.css" -out "${cfd}/${cfn}.pdf" -show-pdf 

Result: Every time you select the HTML book file in Oxygen, you can use this external tool from the toolbar.

3. Define the page size

To accommodate printing this book in a format similar to the original edition of the book, add the following to your CSS:
@page {
  size: 6in 7.5in;
  margin: 0.5in;
}
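
If you plan to bind the printed book, you may want a wider margin on the binding edge. The following is a minimal sketch using the standard :left and :right page selectors; the 0.75in value is an assumption chosen for illustration, not part of the original design:

@page :left {
  /* On a left-hand page, the binding edge is on the right. */
  margin-left: 0.5in;
  margin-right: 0.75in;
}

@page :right {
  /* On a right-hand page, the binding edge is on the left. */
  margin-left: 0.75in;
  margin-right: 0.5in;
}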

4. Select fonts

Novels are usually set in clean serif fonts. You can choose one from Google Fonts by adding the following import at the beginning of the CSS file:

@import url('https://fonts.googleapis.com/css?family=Crimson+Text');

Then, set it on the root of the document:

:root {
  font-family: 'Crimson Text', serif;
  font-size:11pt;
}

Besides the main content, the HTML book contains some descriptions and licensing terms, located in <pre> elements directly under the <body> element. For these, use a smaller font and wrap the content (the default is not to wrap):

body > pre {
  font-family: sans-serif;
  font-size:7pt;
  white-space: pre-wrap;
}

5. Transform the book with Chemistry

Try to transform the HTML book with Chemistry. You will get something similar to this:

Notice that besides the formatting, Chemistry already helped with the following:
  • Detected the publication title and set it as the PDF title (see the window title bar).
  • Created a tree of bookmarks by taking the <H2> elements into account.
For the purposes of this tutorial, the following still needs to be addressed:
  • The preface and each of the chapters should start on a new page.
  • The chapter titles need to be formatted.
  • The publication needs page numbers, page headers, and other styling.

6. Justify text

To improve the alignment of the right side of the book, justify the text by adding the following in the CSS:
p {
  text-align: justify;
}
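
Justified text in a narrow column can leave wide gaps between words. One possible refinement is to allow automatic hyphenation. This sketch assumes that your processor supports the standard CSS hyphens property and that the HTML declares its language (for example, lang="en" on the root element):

p {
  text-align: justify;
  /* Assumption: hyphenation patterns are available for the document language. */
  hyphens: auto;
}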

7. Make chapters start on a new page

Currently, the document is very flat, and the chapters are just marked by the <H2> title elements between <p> elements. There are also <pre> elements used for the copyright and licensing:

<h2>
      CHAPTER III
</h2>
...
<p>
      TOM presented himself before Aunt Polly, who was sitting by an open window
      in a pleasant rearward apartment, which was bedroom, breakfast-room,

The CSS Paged Media module defines a way to force a page break before an element. Use it to make each of the chapters start on a new page:

h2 {
  page-break-before:always;
}

body > pre {
  page-break-before:always;
}

Result: All the chapters now start on a new page, and the book is starting to look like a real publication. If you want the chapters to always start on a right-hand page, use the value right for the page-break-before property instead of always.
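
For example, a variant of the rule above that starts every chapter on a right-hand page could look like this (the processor inserts a blank left-hand page where needed):

h2 {
  /* Break before each chapter title and continue on the next right-hand page. */
  page-break-before: right;
}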

8. Format the chapter titles

Currently, the titles look rather dull, aligned to the left. Center them and give them styling:
h2 {
  text-align: center;
  font-size:larger;
}
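
As a further sketch, you could add some space around each title and keep it on the same page as the paragraph that follows. The margin values here are assumptions chosen for illustration:

h2 {
  /* Hypothetical spacing; adjust to taste. */
  margin-top: 1in;
  margin-bottom: 0.5in;
  /* Avoid a page break between the title and the first paragraph. */
  page-break-after: avoid;
}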

9. Add page numbers

Most novels show the page numbers at the bottom center of the page. To achieve this, use the CSS page counter in a page margin box:

@page {
  @bottom-center {
    content: "-" counter(page) "-";
    font-size: 8pt;
  }
}
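
Page headers can be added in a similar way. The following sketch assumes that your processor supports the string-set property and the string() function from the CSS Generated Content for Paged Media module. The name chapter-title is hypothetical, so pick any identifier you like:

h2 {
  /* Capture the text of each chapter title in a named string. */
  string-set: chapter-title content();
}

@page {
  @top-center {
    /* Show the most recently captured chapter title in the page header. */
    content: string(chapter-title);
    font-size: 8pt;
  }
}

Pages that should not carry a header, such as a cover or licensing page, can override it with content: none in their own @top-center box.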

What the document looks like so far

So, you have solved the text justification, page breaks, and page numbers. This is what the document looks like now:

Remaining things to be addressed:
  • It needs a cover page.
  • The page numbering should restart on the first chapter, after the preface, and should end before the licensing terms.

10. Add a cover page

You can find the original cover of the book on the Project Gutenberg website. For this tutorial, use it as the artwork for the first page.

Start by defining a named page in your CSS file, with no page counter in the bottom-center region:

@page cover-page{
   background-image:url('https://www.gutenberg.org/files/74/74-h/images/bookcover.jpg');  
   background-size: 6in 7.5in;
   background-repeat:no-repeat;

   @bottom-center {
      content:none;
   }   
}

When using images for your cover pages, make sure they respect the same aspect ratio as your page (width/height ratio), then use the background-size property to stretch it exactly to the page size.
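
As a quick check of the aspect ratio, here is a sketch using a hypothetical 1200 x 1500 pixel image (these pixel dimensions are an assumption for illustration, not the size of the actual cover file):

/* Page:  6in    / 7.5in  = 0.8 (width / height)     */
/* Image: 1200px / 1500px = 0.8, so the ratios match */
@page cover-page {
  /* Stretching a matching image to the full page size does not distort it. */
  background-size: 6in 7.5in;
}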

Next, link this page to a synthetic element placed before the content of the root. You can use a :before pseudo-element on the <html> root element:

html:before{
  content:" ";
  page:cover-page;
}

You could place text over the cover image, but for now leave the content as a single space. The content property must not be empty, because Oxygen PDF Chemistry discards all pseudo-elements that have no content.
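
If you do want text over the cover image, a minimal sketch might look like the following. The colors, sizes, and padding are assumptions chosen for illustration, and display: block is added on the assumption that the pseudo-element must be block-level for the alignment to apply:

html:before {
  content: "The Adventures of Tom Sawyer";
  page: cover-page;
  /* Assumption: treat the pseudo-element as a block so it can be centered. */
  display: block;
  padding-top: 1in;
  text-align: center;
  font-size: 24pt;
  /* Hypothetical color; pick one that contrasts with your artwork. */
  color: white;
}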

11. Reset and style page numbers

To restart the page numbers at the beginning of the first chapter, target the first title that follows the metadata at the beginning of the document:

pre + h2{
  counter-reset: page 1;
} 

The licensing terms at the end of the book can be numbered independently, and styled differently:

body > pre {
  counter-reset:page;
  page:copyright-license-page;
}

For the copyright page, use a lower-roman numbering style:

@page copyright-license-page{
  background-color:yellow;
  @bottom-center {
    content:counter(page, lower-roman);
  }
}

Final Result: What we obtained

Now you have a nice-looking book that can be distributed electronically or printed: