PDF Output

You may have specific requirements for the PDF files you need to produce (such as the set of metadata, bookmarks, the level of accessibility, or the PDF format).

Bookmarks

PDF bookmarks provide an additional way of navigating a document, similar to a table of contents. PDF readers use the bookmark tree and usually display it in a side panel. Most often, the bookmarks reflect the logical hierarchy of the book, with pointers to the chapters and sections. Creating bookmarks has no effect on the printed material.

Oxygen PDF Chemistry can create PDF bookmarks by using the standard CSS properties: bookmark-level, bookmark-label, and bookmark-state.

For an HTML document, you can collect the titles from the text of the heading elements:

h1, h2, h3, h4, h5, h6 {
   bookmark-label: content(text);
}

In the following example, the content of the :before pseudo-element is concatenated with the text of the element. As a result, each h1 bookmark label consists of the chapter number, a separator, and the heading text:

h1 { 
    bookmark-label: content(before) " / " content(text);
}

h1:before{
    content: counter(chapter);
    counter-increment:chapter;
}

You can define the level (depth in the hierarchy) of the bookmarks. The deeper the section, the higher the level:

h1 { bookmark-level: 1; }
h2 { bookmark-level: 2; }
h3 { bookmark-level: 3; }
h4 { bookmark-level: 4; }
h5 { bookmark-level: 5; }
h6 { bookmark-level: 6; }

Also, you can control whether the bookmarks are shown expanded or collapsed in the bookmark view. By default, all bookmarks are open. To close all the level 2 nodes, you can use:

h2 {
    bookmark-state:closed;
}
Note: In the built-in CSS that Oxygen PDF Chemistry uses for processing HTML, the bookmarks are already configured using bookmark-level and bookmark-label. If you need to set the closed/open state, you should use the bookmark-state property in your custom CSS file.
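
For XML vocabularies other than HTML, the three bookmark properties would presumably have to be set together in a custom CSS. The following is a minimal sketch, assuming a hypothetical vocabulary with chapter and section elements that each have a title child (these element names are illustrative only):

```css
/* Hypothetical element names: chapter, section, title. */
chapter > title {
    bookmark-level: 1;
    bookmark-label: content(text);
    /* Collapse the chapter node so that its sections are hidden initially. */
    bookmark-state: closed;
}

chapter > section > title {
    bookmark-level: 2;
    bookmark-label: content(text);
}
```

With rules like these, each chapter would appear as a collapsed top-level bookmark that expands to show its sections.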

Metadata

PDF files may contain metadata. Metadata provides additional information about a document, such as its title, author, organization, creation date, format, or copyright.

HTML defines the meta element for keeping track of information that describes your content. Most of this information should migrate to the PDF document properties. The property values may be either static (specified directly from the CSS) or dynamic (collected from the document) using the following functions:
  • content(text)
  • attr()
  • oxy_xpath()
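
As a rough sketch of the difference between static and dynamic values, the rules below set one of each kind. The doc and section element names, the author string, and the "SectionCount" field name are illustrative assumptions. The sketch also assumes that oxy_xpath() takes the XPath expression as a quoted string and that the document elements are in no namespace:

```css
/* Static: the value is written directly in the CSS. */
doc {
    -oxy-pdf-meta-author: "Documentation Team";
}

/* Dynamic: the value is read from an attribute of the matched element. */
meta[name='keywords'] {
    -oxy-pdf-meta-keywords: attr(content);
}

/* Dynamic: the value is computed from the document with an XPath expression. */
doc {
    -oxy-pdf-meta-custom: "SectionCount" oxy_xpath("count(//section)");
}
```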

Predefined Meta Fields

Examples of common metadata:
  • Publication title
  • Author
  • Keywords
  • Short description
Oxygen PDF Chemistry automatically extracts this information from HTML documents.

Suppose that you have the following arbitrary XML document:

<doc>
     <title>Publication title</title>
     <meta name='keywords' content='software, network'/>
     <meta name='description' content='This is a publication about software products...'/>
     <meta name='author' content='John, jo@mysite.example.com'/>
...

You could use any of the following CSS properties to extract the metadata:

-oxy-pdf-meta-title
It is used to extract the publication title. You can use it by matching the title element:
title {
    -oxy-pdf-meta-title: content(text);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the title.

-oxy-pdf-meta-author
It is used to extract the publication author. You can use it by matching the meta element with the attribute name='author':
meta[name='author'] {
    -oxy-pdf-meta-author: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the author.

-oxy-pdf-meta-description
It is used to extract the publication description. You can use it by matching the meta element with the attribute name='description' or name='subject':
meta[name='description'], 
meta[name='subject'] {
    -oxy-pdf-meta-description: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the description.

-oxy-pdf-meta-keywords
It is used to extract the publication keywords. For example, you can use it by matching the meta element with the attribute name='keywords':
meta[name='keywords'] {
    -oxy-pdf-meta-keywords: attr(content);
}

If this CSS selector matches multiple elements, only the first element in the document order will be used to extract the keywords.

-oxy-pdf-meta-keyword
It is used to extract a single keyword. Individual keywords are accumulated from elements that match the CSS rule that uses this property and then concatenated into a single string. This single string is then set in the PDF 'keywords' section. For example, if you mark keywords in your HTML document with a span with a "kw" class, you can collect them all by using:
span.kw {
    -oxy-pdf-meta-keyword: content(text);
}

Custom Meta Fields

Metadata is not restricted to the above cases. You may have custom metadata fields. Custom metadata is usually displayed in a tabular format (for example, Acrobat Reader™ shows it in the Custom tab of the Properties dialog box).

-oxy-pdf-meta-custom
This property has two parts, one defining the name, and one the value for the field.
In the following example, all the meta tags are dumped as custom meta fields in PDF:
meta {
    -oxy-pdf-meta-custom: attr(name) attr(content);
}
Or if somewhere in the document content, you have a span that defines the document creation date, you can use:
span.created {
    -oxy-pdf-meta-custom: "CreationDate" content(text);
} 

In case of conflicts, when two or more elements trigger setting of a meta field with the same name, only the first definition of a meta field will be used in the PDF output.
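
To illustrate the conflict rule, the following sketch has two rules that both set a custom field named "Revision". The field name, the revision class, and the meta name are hypothetical. The sketch assumes that "first" refers to the element that comes first in document order, as with the predefined fields:

```css
/* Both rules define a custom field with the same name, "Revision". */
meta[name='revision'] {
    -oxy-pdf-meta-custom: "Revision" attr(content);
}

span.revision {
    -oxy-pdf-meta-custom: "Revision" content(text);
}
```

If a document contains both a matching meta element and a matching span, only one "Revision" value reaches the PDF. Under the assumption above, it is the value from whichever element appears first.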

Accessibility (508 Compliance)

Make sure your PDF output is accessible for those who are blind or visually impaired. Many government organizations require the documents to be accessible.

PDF Accessibility Tagging

By default, Oxygen PDF Chemistry creates partially accessible PDF documents, in the sense that most of the paragraphs, tables, lists, headers, and footers are tagged automatically for any XML vocabulary, so a PDF reader can use this information to present the content.

In addition, the default HTML CSS used by Oxygen PDF Chemistry defines accessibility tags for headings (H1..H6), quotations (Q), sections (SECTION), and pre-formatted text (PRE).

However, this tagging only takes the element name into account. If your element has different semantics, you can impose a different PDF accessibility tag by using the -oxy-pdf-tag-type property. In the following example, a paragraph with the class "note" will be marked as such:

p.note {
  -oxy-pdf-tag-type: "Note";
}
Note: The headers and footers (or other text placed in the page margins) are automatically marked as artifacts, so they are ignored by the screen readers.
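
The same property could be used to tag elements of a custom XML vocabulary whose names carry no meaning for a PDF reader. The following is a minimal sketch, assuming hypothetical chapter, section, title, and warning elements. The tag types used are the ones already mentioned in this section:

```css
/* Hypothetical element names; "H1", "H2", and "Note" are tag types named above. */
chapter > title {
    -oxy-pdf-tag-type: "H1";
}

section > title {
    -oxy-pdf-tag-type: "H2";
}

warning {
    -oxy-pdf-tag-type: "Note";
}
```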

Hints to Make your Document More Accessible

  1. The title of the document must be marked using the metadata.
    This is important for accessibility since it will allow the screen reader to identify the publication title.
    title {
        -oxy-pdf-meta-title: content(text);
    }
    Note: The default CSS for HTML already contains this rule.
  2. Use xml:lang on the root of your document. For HTML documents use the lang attribute.
  3. Put an alternate text on all the images.

    Oxygen PDF Chemistry supports the -oxy-alt-text extension CSS property that can be used to associate the alternate text.

    The following is an example from the Oxygen PDF Chemistry default CSS for HTML, where it maps the property to the value of the alt attribute of the img tag:

    img {
       -oxy-alt-text: attr(alt);
    }

    For embedded SVG, Oxygen PDF Chemistry automatically uses the title element as alternate text of the image.

    For embedded MathML, Oxygen PDF Chemistry automatically uses the alttext attribute as alternate text of the equation.
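
For XML vocabularies other than HTML, a similar rule could map the alternate text from whatever the vocabulary provides. The following is a minimal sketch, assuming a hypothetical image element that stores its textual description in a description attribute:

```css
/* Hypothetical markup: <image href="chart.png" description="Sales by quarter"/> */
image {
    -oxy-alt-text: attr(description);
}
```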

Fully Accessible PDF (PDF/UA-1)

To make the PDF fully accessible, you have to activate the PDF/UA-1 mode. PDF/UA documents meet the regulations set in Section 508. This mode has special requirements:

  1. Activate the PDF/UA-1 mode from the command line, using the -pdf-ua parameter.
  2. All the fonts must be embedded. If you are using one of the basic fonts ("Times", "Helvetica",…), make sure you explicitly define CSS font faces for them. For details, see: Basic Fonts Embedding.
    Trouble: If you are using fonts other than the basic ones and still have problems embedding the basic default fonts, make sure all elements are styled using one of your fonts of choice. A catch-all CSS rule might come in handy:
    * {
      font-family: lato;
    }
  3. The title of the document must be marked using the metadata. This is important for accessibility since it will allow the screen reader to identify the publication title. For HTML, the default CSS contains this rule. You can use something similar for other XML vocabularies:
    title {
        -oxy-pdf-meta-title: content(text);
    }
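
The CSS side of these requirements might be combined in a single stylesheet, as in the sketch below. The font file paths are hypothetical, and the exact way to embed the basic fonts is described in Basic Fonts Embedding, so treat the font-face rules as an assumption rather than the definitive procedure:

```css
/* Hypothetical font files; replace the paths with fonts you are licensed to embed. */
@font-face {
    font-family: "Helvetica";
    src: url("fonts/helvetica-regular.ttf");
}

@font-face {
    font-family: "Helvetica";
    font-weight: bold;
    src: url("fonts/helvetica-bold.ttf");
}

/* Mark the document title in the metadata (already present in the default CSS for HTML). */
title {
    -oxy-pdf-meta-title: content(text);
}
```

The PDF/UA-1 mode itself is still activated separately with the -pdf-ua command-line parameter.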

Tools for Checking the Document Accessibility