
Re: [xsl] Mixed content, separation

Subject: Re: [xsl] Mixed content, separation
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 29 Feb 2012 15:09:37 -0500


On 2/29/2012 4:58 AM, David Carlisle wrote:
> The right thing is to have a list as part of the paragraph if it is part
> of the sentence. It is HTML's content model for p that is wrong.

HTML seems to think that a "paragraph" (or a "p", if "p" does not stand for "paragraph") is some chunk of text set off by vertical whitespace, rather than what rhetoricians or composition instructors would call a paragraph.

Interestingly, Strunk and White (still widely respected authorities on composition in American English) put list structures and block quotes in the midst of paragraphs, even in their own explanation of what a paragraph is and how to use it.


All the mainstream documentary XML formats, including TEI, DocBook, NLM/NISO, and DITA, permit paragraphs to contain elements that will format as blocks, including tables, lists, code blocks and so forth. HTML does not.

The problem goes away if you regard HTML "p" as something other than a paragraph (perhaps a "block fragment"). Generalizing, HTML's semantics are presentational: even when used well, they do not reliably describe the source data, and hence they are not really application-independent.

To bring this back on topic, this is a big reason why transforming out of HTML can be such a beast, as opposed to using it as a transformation target. In other words -- yes, it's hard to split DocBook "para" (say) around lists and tables to get valid HTML. But what's really hard is to transform back into DocBook from HTML and expect to get whole paragraphs back, and not just paragraph fragments marked as "para".
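
For illustration only, here is a minimal sketch of the "easy" direction, in Python rather than XSLT so that it can be run with nothing but the standard library. It splits one DocBook "para" into "p" fragments around its block-level children. The BLOCK_TAGS set and the split_para name are assumptions made for this sketch, not anything from DocBook or from a real stylesheet, and the block children are passed through unchanged rather than mapped to HTML.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption for this sketch: the DocBook children that may not sit inside an HTML <p>.
BLOCK_TAGS = {"itemizedlist", "orderedlist", "table", "programlisting"}


def split_para(para):
    """Split one <para> into <p> fragments interleaved with its block children."""
    out = []
    current = ET.Element("p")
    current.text = para.text

    def flush():
        # Emit the fragment built so far, unless it holds only whitespace.
        nonlocal current
        if (current.text or "").strip() or len(current):
            out.append(current)
        current = ET.Element("p")

    for child in para:
        if child.tag in BLOCK_TAGS:
            flush()
            block = copy.deepcopy(child)
            # Text following the block opens the next fragment.
            tail, block.tail = block.tail, None
            out.append(block)
            current.text = tail
        else:
            # Inline children (and their trailing text) stay in the fragment.
            current.append(copy.deepcopy(child))
    flush()
    return out


src = ET.fromstring(
    "<para>Choose one of:"
    "<itemizedlist><listitem>red</listitem><listitem>blue</listitem></itemizedlist>"
    "and then <emphasis>confirm</emphasis>.</para>"
)
parts = split_para(src)
for part in parts:
    print(ET.tostring(part, encoding="unicode"))
```

The single "para" comes out as a "p", the list, and a second "p". Nothing in that output records that the three siblings were once one paragraph, which is exactly the information a transformation back into DocBook would need and cannot recover.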


Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
