Specialization design, customization, and the limits of specialization

DITA specialization imposes certain restrictions. An inherent challenge in designing DITA vocabulary modules and document types is understanding how to satisfy markup requirements within those restrictions and, when the requirements cannot be met by a design that fully conforms to the DITA architecture, how to create customized document types that diverge from the DITA standard as little as possible.

DITA imposes the following structural restrictions:

All topics must have titles.
Topic body content must be contained within a body element.
Section elements cannot nest.
Metadata specific to an element type must be represented using elements, not attributes.

When markup requirements cannot be met within the DITA architecture, there still might be an interest in using DITA features and technology, or a business need for interoperability with conforming DITA documents and processors. In this case, the solution is to create customized document types. Customized document types are document types that do not conform to the DITA standard. To reduce the cost of producing conforming documents from non-conforming documents, custom document types should minimize the extent to which they diverge from the DITA standard.

Typical reasons for considering custom document types include the following:

Optimizing markup for authoring
Supporting legacy markup structures that are not consistent with DITA structural rules, for example, footnotes within titles
Defining different forms of existing structures, such as lists, where the DITA-defined structures are too constrained
Providing attributes required by specific processors, such as CMS-defined attributes for maintaining management metadata
Embedding tool-imposed markup in places that do not allow the <foreign> or <unknown> elements

Remember that customized document types do not conform to the DITA standard, and thus are not considered DITA. In many of the cases above, it is possible to define document types that conform to the DITA standard. Explore this fully before developing customized document types.

Optimizing document types

Conforming DITA grammar files are modular, which facilitates exchange of vocabulary modules and constraints and simplifies the process of assembling document type shells. In some cases there might be a reason to avoid the modular approach and use an optimized document type composed of a single file (or a smaller number of files). For example, this could be advantageous in situations where validation occurs over a network.

In an optimized DTD, entities might also be resolved to further optimize processing or validation. This could speed up processing for environments that process and validate large numbers of DITA maps and topics.

An optimized document type will still allow for the creation of conforming DITA content that follows all other rules in the DITA specification. In these cases it is still possible to create a document type that conforms completely to standard DITA coding practices. Maintaining a conforming copy ensures that the optimized document type is still conforming to the standard, and might also ease interchange with tools that expect conforming document types.

Creating custom document types for non-standard markup

When the relaxed content models for DITA elements are inappropriately open for authoring purposes, document type shells can remove undesireable domains or use constraint modules to restrict content models. If content models are not relaxed enough, and markup requirements include content models that are less constrained than those defined by DITA, custom document types might be the only option.

Customized document types do not conform to the DITA standard. Preprocessing can ensure compatibility with existing publishing processes, but it does not ensure compatibility with DITA-supporting authoring tools or content management systems. However, when an implementation is being heavily customized, a customized document type can help isolate and control the consequences of non-standard design.

For example, if an authoring group requires the <p> element to be spelled out as <paragraph>, the document type could be customized to change <p> to <paragraph> for authoring purposes. Such documents then could be preprocessed to rename <paragraph> back to <p> before they are fed into a standard DITA publishing process.

Because DITA document types are designed to enable constraints, such customized documents can still take advantage of existing override schemes. While still not valid DITA, a document type shell could be set up that implements local requirements (such as adding global CMS-defined attributes), and then imports an otherwise valid document type shell. This helps isolate non-compliant portions of the document type, while reusing as much as possible of the original DITA grammar.

Specialization design considerations

Requirements for new markup often appear to be incompatible with DITA architectural rules or existing markup, especially when mapping existing non-DITA markup practice to DITA, where the existing markup might have used structures that cannot be directly expressed in DITA. For example, you might need markup for a specialized form of list where the details are not consistent with the base model for DITA lists.

In this case you have two alternatives, one that conforms to DITA and one that does not.

Specialize from more generic base elements or attributes.
Define non-conforming structures and map them to conforming DITA structures as necessary for processing by DITA-aware processors or for interchange as conforming DITA documents.

Specializing from more generic base elements, such as defining a list using specializations of <ph> or <div>, while technically conforming, might still impede interchange of such documents. Generic DITA processors will have no way of knowing that what they see as a sequence of phrases or divisions is really a list and should be rendered in a manner similar to standard DITA lists. However, your documents will be reliably interchangeable with conforming DITA systems.

Defining non-conforming markup structures means that the resulting documents are not conforming DITA documents. They cannot be reliably processed by generic DITA-aware processors or interchanged with other DITA systems. However, as long as the documents can be transformed into conforming DITA documents without undue effort, interchange and interoperation requirements can be satisfied as needed. For example, a content management system could add its own required markup for management metadata, but strip the metadata when delivering content to conforming DITA processors.

Note that even if one uses the DITA-defined types as a starting point, any change to those base types not accomplished through specialization or the constraint feature defines a completely new document type that has no normative relationship to the DITA document types, and cannot be considered in any way to be a conforming DITA application. In particular, the use of DITA specialization from non-DITA base types does not produce DITA-conforming vocabularies.

Specialize from generic elements or attributes

Most DITA element types have relaxed content models that are specifically designed to allow a wide set of options when specializing from them. However, some DITA element types do impose limits that might not be acceptable or appropriate for a specific markup application. In this case, consider specializing from a more generic base element or attribute.

Generic elements are available in DITA at every level of detail, from whole topics down to individual keywords, and the generic @base attribute is available for attribute domain specialization.

For example, if you want to create a new kind of list but cannot usefully do so specializing from <ul>, <ol>, <sl>, or <dl>, you can create a new set of list elements by specializing nested <div> elements. This new list structure will require specialized processing to generate appropriate output styling, because it is not semantically tied to the other lists by ancestry. Nevertheless, it will remain a valid DITA specialization, with the standard support for generalization, content referencing, conditional processing, and more.

The following base elements in <topic> are generic enough to support almost any structurally-valid DITA specialization:

<topic>: Any content unit that has a title and associated content
<section>: Any non-nesting division of content within a topic, titled or not
<p>: Any non-nesting non-titled block of content below the section level
<fig>: Any titled block of content below the section level
<ul>, <ol>, <dl>, <sl>, <simpletable>: Any structured block of content that consists of listed items in one or more columns
<ph>: Any division of content below the paragraph level
<text>: Text within a phrase
<keyword>: Any non-nesting division of content below the paragraph level
<data>: Any content that acts as metadata rather than core topic or map content
<foreign>: Any content that already has a non-DITA markup standard, but still needs to be authored as part of the DITA document. Processors should attempt to render this element, if at all possible.
<unknown>: Any non-standard markup that does not fit the DITA model, but needs to be managed as part of a DITA document. Processors should not attempt to render this element.
<bodydiv>: A generic, untitled, nestable container for content within topic bodies
<sectiondiv>: A generic, untitled, nestable container for content within sections
<div>: A generic, untitled, nestable container for content within topic bodies or sections

The following attributes in topic are suitable for domain specialization to provide new attributes that are required throughout a document type:

@props: Any new conditional processing attribute
@base: Any new attribute that is universally available, has a simple syntax (space-delimited alphanumeric values), and does not already have a semantic equivalent

Whenever possible, specialize from the element or attribute that is the closest semantic match.