Re: [xsl] Mixed content, separation
Subject: Re: [xsl] Mixed content, separation
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 29 Feb 2012 15:09:37 -0500
Hi,
On 2/29/2012 4:58 AM, David Carlisle wrote:
> The right thing is to have a list as part of the paragraph if it is part of the sentence. It is HTML's content model for p that is wrong.
HTML seems to think that a "paragraph" (or a "p", if "p" does not stand for "paragraph") is some chunk of text distinguished by vertical whitespace, as opposed to anything the rhetoricians or composition instructors call a paragraph.
Interestingly, Strunk and White (authorities on composition in American English who are still widely respected) include list structures and block quotes in the midst of paragraphs, even in their own explanation of what a paragraph is and how to use it.
http://www.bartleby.com/141/strunk5.html#9
All the mainstream documentary XML formats, including TEI, DocBook, NLM/NISO, and DITA, permit paragraphs to contain elements that will format as blocks, including tables, lists, code blocks and so forth. HTML does not.
The problem goes away if you regard HTML "p" as something other than a paragraph (perhaps a "block fragment"). Generalizing, it is apparent that HTML's semantics are presentational, not really descriptive in any reliable way of the source data, even when used well, and hence not really application-independent.
To bring this back on topic, this is a big reason why transforming out of HTML can be such a beast, as opposed to using it as a transformation target. In other words -- yes, it's hard to split DocBook "para" (say) around lists and tables to get valid HTML. But what's really hard is to transform back into DocBook from HTML and expect to get whole paragraphs back, not just paragraph fragments marked as "para".
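For illustration only, here is a minimal sketch of the forward direction (splitting a "para" around its block children), written in Python with the standard library rather than XSLT. The function name `split_para` and the `BLOCK_TAGS` set are assumptions made for this example, not part of any DocBook or HTML tooling, and a real transformation would also rename the block elements to their HTML equivalents.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: a hand-picked set of DocBook elements that may not sit inside an HTML <p>.
BLOCK_TAGS = {"itemizedlist", "orderedlist", "table", "programlisting", "blockquote"}


def split_para(para):
    """Split one <para> into a flat list: <p> fragments for runs of
    inline content, with the block children in between, in document order."""
    result = []
    current = ET.Element("p")
    current.text = para.text

    def flush():
        # Emit the fragment collected so far unless it is empty or whitespace only.
        nonlocal current
        if (current.text or "").strip() or len(current):
            result.append(current)
        current = ET.Element("p")

    for child in para:
        node = copy.deepcopy(child)
        if child.tag in BLOCK_TAGS:
            flush()
            # Text following the block starts the next fragment.
            current.text = node.tail
            node.tail = None
            result.append(node)
        else:
            # Inline children (and their trailing text) stay in the fragment.
            current.append(node)
    flush()
    return result


src = (
    "<para>Pick one of"
    "<itemizedlist><listitem>tea</listitem><listitem>coffee</listitem></itemizedlist>"
    "and then <emphasis>sit</emphasis> down.</para>"
)
parts = split_para(ET.fromstring(src))
print([p.tag for p in parts])  # ['p', 'itemizedlist', 'p']
```

Nothing in the output records that the two "p" fragments and the list once formed a single paragraph, which is exactly the information a reverse transformation would need and cannot recover from the HTML alone.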
Cheers, Wendell
======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================