Formatting and Indenting XML Documents

Oxygen XML Developer plugin creates XML documents using several edit modes. In Text mode, you as the author decide how the XML file is formatted and indented. In the other modes, and when you switch between modes, Oxygen XML Developer plugin must decide how to format and indent the XML. Oxygen XML Developer plugin will also format and indent your XML for you in Text mode if you use one of the Format and Indent options:

  • Document > Source > Format and Indent - Formats and indents the whole document.

  • Document > Source > Indent Selection - Indents the current selection (but does not add line breaks). This action is also available in the Source submenu of the contextual menu.

  • Document > Source > Format and Indent Element - Formats and indents the current element (the innermost element that currently contains the cursor) and its child elements. This action is also available in the Source submenu of the contextual menu.

A number of settings affect how Oxygen XML Developer plugin formats and indents XML. Many of these settings have to do with how whitespace is handled.

Significant and insignificant whitespace in XML

XML documents are text files that describe complex documents. Some of the whitespace (spaces, tabs, line feeds, etc.) in the XML document belongs to the document it describes (such as the space between words in a paragraph) and some of it belongs to the XML document (such as a line break between two XML elements). Whitespace belonging to the XML file is called insignificant whitespace. The meaning of the XML would be the same if the insignificant whitespace were removed. Whitespace belonging to the document being described is called significant whitespace.

Knowing when whitespace is significant or insignificant is not always easy. For instance, a paragraph in an XML document might be laid out like this:

<p>NO Free man shall be taken or imprisoned, or be stripped of his Freedom,
or Liberties, or free Customs, or be outlawed, or exiled, or any otherwise
destroyed; nor will we not pass upon him, nor condemn him, but by lawful 
judgment of his Peers, or by the <xref 
href="http://en.wikipedia.org/wiki/Law_of_the_land" format="html"
scope="external">Law of the land</xref>. 
We will sell to no man, we will not deny to any man either Justice or Right.</p>

By default, a single whitespace character between words is treated as significant, and all other whitespace is treated as insignificant. The paragraph above could have been written on one line and would be treated as exactly the same paragraph, because each run of consecutive whitespace characters is replaced with a single space. Removing the insignificant space in markup like this is called normalizing space.
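As a rough illustration of normalizing space, the following Python sketch collapses each run of whitespace into a single space. The function name normalize_space and the shortened sample text are assumptions made for this example; this is not the code Oxygen XML Developer plugin uses.

```python
import re

# XML 1.0 treats space, tab, carriage return, and line feed as whitespace.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_space(text: str) -> str:
    """Collapse each whitespace run to one space and trim both ends."""
    return XML_WHITESPACE_RUN.sub(" ", text).strip(" ")


# Shortened, hypothetical variant of the paragraph above.
paragraph = """NO Free man shall be taken or imprisoned,
or be stripped   of his Freedom,
or Liberties."""

one_line = normalize_space(paragraph)
print(one_line)
```

The line breaks and the extra spaces disappear, while the single space between words is kept.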

In some cases, all the spaces inside an element should be treated as significant. For example, in a code sample:

<codeblock>
class HelloWorld
{
   public static void main(String args[])
   {
      System.out.println("Hello World");
   }
}
</codeblock>

Here every whitespace character between the codeblock tags should be treated as significant.

How Oxygen XML Developer plugin determines when whitespace is significant

When Oxygen XML Developer plugin formats and indents an XML document, it introduces or removes insignificant whitespace to produce a layout with reasonable line lengths and elements indented to show their place in the hierarchy of the document. To correctly format and indent the XML source, Oxygen XML Developer plugin needs to know when to treat whitespace as significant and when to treat it as insignificant. However, it is not always possible to tell this from the XML source file alone. To determine what whitespace is significant, Oxygen XML Developer plugin assigns each element in the document to one of four categories:

Ignore space

In the ignore space category, all whitespace is considered insignificant. This generally applies to content that consists only of elements nested inside other elements, with no text content.

Normalize space

In the normalize space category, a single whitespace character between character strings is considered significant and all other spaces are considered insignificant. Therefore, all consecutive whitespaces will be replaced with a single space. This generally applies to elements that contain text content only.

Mixed content

In the mixed content category, a single whitespace between text characters is considered significant and all other spaces are considered insignificant. However:

  • Whitespace between two child elements embedded in the text is normalized to a single space (rather than to zero spaces as would normally be the case for a text node with only whitespace characters, or the space between elements generally).

  • The lack of whitespace between a child element embedded in the text and either adjacent text or another child element is considered significant. That is, no whitespace can be introduced here when formatting and indenting the file.

For example:

<p>The file is located in <i>HOME</i>/<i>USER</i>/hello. 
     This is a <strong>big</strong> 

<emphasis>deal</emphasis>.
</p>

In this example, whitespace should not be introduced around the i tags as it would introduce extra significant whitespace into the document. The space between the end </strong> tag and the beginning <emphasis> tag should be normalized to a single space, not zero spaces.
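The two mixed content rules can be sketched in Python with the standard xml.etree.ElementTree module. The helper names collapse and normalize_mixed are hypothetical, and the sketch only collapses whitespace runs; a real formatter would also wrap and indent lines, which is not modeled here.

```python
import re
import xml.etree.ElementTree as ET

WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text):
    """Replace each whitespace run with one space; leave None untouched."""
    return None if text is None else WHITESPACE_RUN.sub(" ", text)


def normalize_mixed(elem):
    """Collapse whitespace in text nodes without creating or deleting gaps."""
    elem.text = collapse(elem.text)
    for child in elem:
        normalize_mixed(child)
        # The tail is the text that follows the child's end tag.
        child.tail = collapse(child.tail)
    return elem


SOURCE = (
    "<p>The file is located in <i>HOME</i>/<i>USER</i>/hello. \n"
    "     This is a <strong>big</strong> \n\n"
    "<emphasis>deal</emphasis>.\n</p>"
)

result = ET.tostring(normalize_mixed(ET.fromstring(SOURCE)), encoding="unicode")
print(result)
```

Because a whitespace run is replaced with one space and never with nothing, the gap between the strong and emphasis elements survives, and because nothing is inserted where no whitespace existed, the text around the i elements stays intact.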

Preserve space

In the preserve space category, all whitespace in the element is regarded as significant. No changes are made to the spaces in elements in this category. However, child elements may be in another category, and may be treated differently.

Attribute values are always in the preserve space category. The spaces between attributes in an element tag are always in the default space category.
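One standard way for a document to request preserved whitespace is the xml:space attribute, which the XML specification applies to an element and its descendants until another xml:space attribute overrides it. The following Python sketch computes that effective value for each element. It models only the scoping rule from the XML specification, not the way Oxygen XML Developer plugin combines it with other information, and the function name and sample document are assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix through this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_xml_space(root):
    """Return (tag, effective xml:space value) pairs in document order."""
    pairs = []

    def walk(elem, inherited):
        # An explicit attribute wins; otherwise the parent's value applies.
        value = elem.get(XML_SPACE, inherited)
        pairs.append((elem.tag, value))
        for child in elem:
            walk(child, value)

    walk(root, "default")
    return pairs


# Hypothetical sample document.
doc = ET.fromstring(
    '<topic><codeblock xml:space="preserve">a  b'
    '<ph xml:space="default">c</ph><b>d</b></codeblock><p>text</p></topic>'
)
spaces = effective_xml_space(doc)
print(spaces)
```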

Oxygen XML Developer plugin consults several pieces of information to assign an element to one of these categories. An element is always assigned to the most restrictive category (in order from Ignore to Preserve) that any of these sources assigns it to. For instance, if the element is listed on the Default space elements list (as described below) but has an xml:space="preserve" attribute in the source file, it is assigned to the preserve space category. If an element has the xml:space="default" attribute in the source but is listed on the Mixed content elements list, it is assigned to the mixed content category.
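The "most restrictive category wins" rule can be pictured as taking the maximum over an ordered set of categories. The Python sketch below assumes that the order is the one in which the four categories are listed above (Ignore, Normalize, Mixed, Preserve). The names SpaceCategory and resolve_category are hypothetical, and the default space category is left out because its position in the ordering is not stated.

```python
from enum import IntEnum


class SpaceCategory(IntEnum):
    """Assumed ordering, from least to most restrictive."""
    IGNORE = 0
    NORMALIZE = 1
    MIXED = 2
    PRESERVE = 3


def resolve_category(votes):
    """Pick the most restrictive category proposed by any source.

    Falling back to IGNORE when no source has an opinion is an
    assumption of this sketch, not documented behavior.
    """
    return max(votes, default=SpaceCategory.IGNORE)


# One source proposes mixed content, another proposes preserve.
winner = resolve_category([SpaceCategory.MIXED, SpaceCategory.PRESERVE])
print(winner.name)
```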

To assign elements to these categories, Oxygen XML Developer plugin consults information from the following sources:

xml:space

If the XML element contains the xml:space attribute, the element is promoted to the appropriate category based on the value of the attribute.

Schema aware formatting

If a schema is available for the XML document, Oxygen XML Developer plugin can use information from the schema to promote the element to the appropriate category. For example:

  • If the schema declares an element to be of type xs:string, the element will be promoted to the preserve space category because the string built-in type has the whiteSpace facet with the value preserve.

  • If the schema declares an element to be mixed content, it will be promoted to the mixed content category.

Schema aware formatting can be turned on and off.
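To make the two schema examples concrete, the following Python sketch scans a small XML Schema for element declarations typed as xs:string or declared with mixed content. The schema, the function name schema_hints, and the returned labels are all hypothetical. The sketch also simplifies heavily: it compares the type attribute with the literal text "xs:string" and ignores named types, references, and type derivation, so it only hints at what schema aware formatting involves.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema used only for this illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="codeblock" type="xs:string"/>
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="b" type="xs:token" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_hints(xsd_text):
    """Map element names to a whitespace hint derived from the schema."""
    hints = {}
    for decl in ET.fromstring(xsd_text).iter(XS + "element"):
        name = decl.get("name")
        if name is None:
            continue  # element references carry no name of their own
        complex_type = decl.find(XS + "complexType")
        if decl.get("type") == "xs:string":
            hints[name] = "preserve"
        elif complex_type is not None and complex_type.get("mixed") in ("true", "1"):
            hints[name] = "mixed"
    return hints


hints = schema_hints(SCHEMA)
print(hints)
```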

Preserve space elements list

If an element is listed in the Preserve space tab of the Element Spacing list in the XML formatting preferences, it is promoted to the preserve space category.

Default space elements list

If an element is listed in the Default space tab of the Element Spacing list in the XML formatting preferences, it is promoted to the default space category.

Mixed content elements list

If an element is listed in the Mixed content tab of the Element Spacing list in the XML formatting preferences, it is promoted to the mixed content category.

Element content

If an element contains mixed content, that is, a mix of text and other elements, it is promoted to the mixed content category. (Note that, in accordance with these rules, this happens even if the schema declares the element to have element-only content.)

If an element contains text content, it is promoted to the default space category.

Text node content

If a text node contains any non-whitespace characters, the text node is promoted to the normalize space category.
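The content-based checks described above depend on what an element actually contains rather than on what a schema declares. A minimal Python sketch of such an inspection follows. The function names and the labels it returns are assumptions chosen for this example and are not the category names used in the preferences.

```python
import xml.etree.ElementTree as ET


def has_text(value):
    """True if a text node contains any non-whitespace character."""
    return value is not None and value.strip(" \t\r\n") != ""


def classify_content(elem):
    """Describe the actual content of an element."""
    children = list(elem)
    # Text can sit before the first child (elem.text) or after any child (tail).
    text_present = has_text(elem.text) or any(has_text(c.tail) for c in children)
    if children and text_present:
        return "mixed"
    if children:
        return "element-only"
    if text_present:
        return "text-only"
    return "empty"


mixed = classify_content(ET.fromstring("<p>This is a <b>big</b> deal.</p>"))
element_only = classify_content(ET.fromstring("<ul>\n  <li>item</li>\n</ul>"))
text_only = classify_content(ET.fromstring("<li>item</li>"))
print(mixed, element_only, text_only)
```

In the second sample, the line breaks and indentation around the li element are whitespace-only text nodes, so they do not count as text content.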

How Oxygen XML Developer plugin formats and indents XML

You can control how Oxygen XML Developer plugin formats and indents XML documents. This can be particularly important if you store your XML document in a version control system, as it allows you to limit the number of trivial changes in spacing between versions of an XML document. The following preference pages include options that control how XML documents are formatted:

When Oxygen XML Developer plugin formats and indents XML

Oxygen XML Developer plugin formats and indents a document, or part of it, on the following occasions:

  • In Text mode when you select one of the format and indent actions (Document > Source > Format and Indent, Document > Source > Indent Selection, or Document > Source > Format and Indent Element).
  • When saving documents in Design mode.
  • When switching from Design mode to another mode.
  • When saving or switching to Text mode from Grid mode, if the Format and indent when passing from grid to text or on save option is selected in the Grid preferences page.