Re: [xsl] Mixed content, separation

Subject: Re: [xsl] Mixed content, separation
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 29 Feb 2012 15:09:37 -0500

Hi,

On 2/29/2012 4:58 AM, David Carlisle wrote:

> The right thing is to have a list as part of the paragraph if it is part
> of the sentence. It is HTML's content model for p that is wrong.

HTML seems to think that a "paragraph" (or a "p", if "p" does not stand for "paragraph") is some chunk of text distinguished by vertical whitespace, as opposed to anything the rhetoricians or composition instructors call a paragraph.

Interestingly, Strunk and White (authorities on composition in American English still widely respected) have, even in their explanation of what a paragraph is and how to use it, list structures and block quotes in the midst of paragraphs.

http://www.bartleby.com/141/strunk5.html#9

All the mainstream documentary XML formats, including TEI, DocBook, NLM/NISO, and DITA, permit paragraphs to contain elements that will format as blocks, including tables, lists, code blocks and so forth. HTML does not.

The problem goes away if you regard HTML "p" as something other than a paragraph (perhaps a "block fragment"). More generally, HTML's semantics are presentational: even when used well, they do not reliably describe the source data, and hence are not really application-independent.

To bring this back on topic, this is a big reason why transforming out of HTML can be such a beast, as opposed to using it as a transformation target. In other words -- yes, it's hard to split DocBook "para" (say) around lists and tables to get valid HTML. But what's really hard is to transform back into DocBook from HTML and expect to get paragraphs back and not just paragraph fragments marked as "para".
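
To make the forward direction concrete, here is a rough sketch in Python (standard library only) of splitting one "para" around its block-level children. It is an illustration under assumptions, not a recommended implementation: the function name split_para and the BLOCKS set are invented for this example, the set of block elements is deliberately incomplete, and element names other than "para" are copied through unchanged rather than mapped to HTML.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: a small, incomplete sample of DocBook elements that format as blocks.
BLOCKS = {"itemizedlist", "orderedlist", "table", "programlisting"}


def split_para(para):
    """Split one para element into sibling p fragments and block elements."""
    out = []        # resulting siblings, in document order
    current = None  # the p fragment currently being filled, if any

    def append_text(text):
        nonlocal current
        if not text:
            return
        if current is None:
            if not text.strip():
                return  # drop whitespace-only text between blocks
            current = ET.Element("p")
            out.append(current)
        if len(current):
            # Text that follows an inline child lives in that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + text
        else:
            current.text = (current.text or "") + text

    append_text(para.text)
    for child in para:
        node = copy.deepcopy(child)
        node.tail = None  # the tail is handled separately below
        if child.tag in BLOCKS:
            current = None  # a block closes the running fragment
            out.append(node)
        else:
            if current is None:
                current = ET.Element("p")
                out.append(current)
            current.append(node)
        append_text(child.tail)
    return out
```

Given a para reading "Buy: [list] and then go home.", this returns three siblings: a "p", the list, and a second "p". Nothing in that output records that the two "p" fragments were once a single paragraph, which is exactly the information a reverse transformation would need.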

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
