RE: [xsl] [XSL] extracting a verse


Subject: RE: [xsl] [XSL] extracting a verse
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 18 Dec 2002 12:06:23 -0500

Jim,

At 10:36 AM 12/18/2002, you wrote:
> Yes the example I gave was simplified. There can be other elements present
> that would exclude a simple text() answer.
>
> But it got me thinking. Is there a way of saying
>
> the intersection of
> all nodes following verse    with id="BCV-GEN-1.1"
> AND
> all nodes preceding verseEnd with id="BCV-GEN-1.1-END"
>
> If so that would give me all the nodes contained within verse.

This can be done, although it's painful in XPath 1.0.
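To make the idea concrete outside XSLT, here is a minimal sketch in Python using only the standard library's xml.dom.minidom. It is not a stylesheet and not anyone's production code: it just computes "nodes following the start marker" intersected with "nodes preceding the end marker" by walking the tree in document order. The sample document, the short ids (v1, v2) and all the function names are assumptions made up for the illustration. In XPath 1.0 itself the usual idiom for intersecting two node-sets $a and $b is $a[count(. | $b) = count($b)].

```python
from xml.dom import minidom

# Hypothetical sample: element names follow the question, ids are made up.
SAMPLE = (
    "<quote>"
    "<verse id='v1'/><s><seg>Of Man's first disobedience,</seg> "
    "<seg>and the fruit<verseEnd id='v1-END'/>"
    "<verse id='v2'/>Of that forbidden tree</seg></s><verseEnd id='v2-END'/>"
    "</quote>"
)

def document_order(node):
    """Yield node, then all of its descendants, in document order."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)

def by_id(doc, name, ident):
    """Find the element called `name` whose id attribute equals `ident`."""
    for el in doc.getElementsByTagName(name):
        if el.getAttribute("id") == ident:
            return el
    raise KeyError(ident)

def ancestors(node):
    """Return every ancestor of node, nearest first."""
    found = []
    while node.parentNode is not None:
        node = node.parentNode
        found.append(node)
    return found

def nodes_between(doc, start, end):
    """Intersection of following::node() of start and preceding::node() of end."""
    nodes = list(document_order(doc.documentElement))
    first = next(i for i, n in enumerate(nodes) if n is start)
    last = next(i for i, n in enumerate(nodes) if n is end)
    inside_start = list(document_order(start))  # 'following' excludes descendants
    above_end = ancestors(end)                  # 'preceding' excludes ancestors
    return [n for n in nodes[first + 1:last]
            if not any(n is x for x in inside_start)
            and not any(n is x for x in above_end)]

def verse_text(doc, verse_id):
    """Join the text nodes that fall between a verse and its verseEnd."""
    start = by_id(doc, "verse", verse_id)
    end = by_id(doc, "verseEnd", verse_id + "-END")
    return "".join(n.data for n in nodes_between(doc, start, end)
                   if n.nodeType == n.TEXT_NODE)

doc = minidom.parseString(SAMPLE)
print(verse_text(doc, "v1"))
print(verse_text(doc, "v2"))
```

Note that the intersection contains whole elements (the first seg) as well as their text nodes, but not the s or the second seg, since those are ancestors of the end marker. That is exactly the "splitting at arbitrary places" that makes the result awkward to copy as a tree.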


The problem is that you need to know not only which nodes follow the start verse and precede the end verse; you also have to account for splitting at arbitrary places....

Here, for example, the first <verse/> is followed by an <s> element which contains <seg> elements; the corresponding <endVerse/> is inside one of those segs (i.e. two levels down inside a sibling of its own <verse/>).

<quote>
<verse/><s><seg>Of Man's first disobedience,</seg> <seg>and the fruit<endVerse/>
<verse/>Of that forbidden tree whose mortal taste <endVerse/>
<verse/>Brought death into the World,</seg> <seg>and all our woe,</seg><endVerse/>
<verse/><seg>With loss of Eden,</seg> <seg>till one greater Man <endVerse/>
<verse/>Restore us,</seg> <seg>and regain the blissful seat,</seg><endVerse/>
<verse/><seg>Sing,</seg> <seg>Heavenly Muse,</seg> <seg>that,</seg> <seg>on the secret top <endVerse/>
<verse/>Of Oreb,</seg> <seg>or of Sinai,</seg> <seg>didst inspire <endVerse/>
<verse/>That Shepherd who first taught the chosen seed <endVerse/>
<verse/>In the beginning how the heavens and earth <endVerse/>
<verse/>Rose out of Chaos:</seg> <seg>or,</seg> <seg>if Sion hill <endVerse/>
<verse/>Delight thee more,</seg> <seg>and Siloa's brook that flowed <endVerse/>
<verse/>Fast by the oracle of God,</seg> <seg>I thence <endVerse/>
<verse/>Invoke thy aid to my adventurous song,</seg><seg><endVerse/>
<verse/>That with no middle flight intends to soar <endVerse/>
<verse/>Above th' Aonian mount,</seg> <seg>while it pursues <endVerse/>
<verse/>Things unattempted yet in prose or rhyme.</seg></s><endVerse/>
<verse/><s><seg>And chiefly thou,</seg> <seg>O Spirit,</seg> <seg>that dost prefer <endVerse/>
<verse/>Before all temples th' upright heart and pure,</seg><seg><endVerse/>
<verse/>Instruct me,</seg> <seg>for Thou know'st;</seg> <seg>Thou from the first <endVerse/>
<verse/>Wast present,</seg> <seg>and,</seg> <seg>with mighty wings outspread, </seg><seg><endVerse/>
<verse/>Dove-like sat'st brooding on the vast Abyss, </seg><seg><endVerse/>
<verse/>And mad'st it pregnant:</seg> <seg>what in me is dark <endVerse/>
<verse/>Illumine,</seg> <seg>what is low raise and support;</seg><seg><endVerse/>
<verse/>That,</seg> <seg>to the height of this great argument,</seg><seg><endVerse/>
<verse/>I may assert Eternal Providence,</seg><seg><endVerse/>
<verse/>And justify the ways of God to men.</seg></s><endVerse/>
</quote>


Etc.

One approach is to go "bottom up". This is what Patrick Durusau and Matthew O'Donnell, who (AFAIK) have done the most work in public with this problem, call a "bottom-up virtual hierarchy" (BUVH). One pass flattens *everything* into milestones; the second interpolates the hierarchy you want. (Actually this is a simplification of what they did, though I don't see why it wouldn't work.) This is doable, but quite hairy if you want to preserve any of the original hierarchy, and so processor-intensive that you don't want to be trying it on large texts.

More recently, their efforts have shifted to an approach they call JITTs ("Just-in-Time Trees"), in which the verse starts and ends are promoted from atomic milestones into real element starts and ends. (A pre-XML-parse process then extracts the hierarchy you want.) While charmingly enunciated, and (I believe) ultimately on the right track, this approach suffers (IMHO) because it tries to repeal the First Law of XML Markup: "Thou Shalt be Cleanly Nested", thereby risking unnecessary Uncertainty and Doubt, if not actually Fear.
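As a rough illustration of the two-pass "flatten, then rebuild" idea (and emphatically not Durusau and O'Donnell's actual implementation), here is a small Python sketch using the standard library's xml.dom.minidom. The event names, the function names, the output elements <lines> and <l>, and the shortened sample are all assumptions invented for the example. It takes the easy way out and throws the original s/seg hierarchy away; keeping it is where things get hairy.

```python
from xml.dom import minidom

# Shortened, hypothetical sample in the style of the markup above.
SAMPLE = (
    "<quote>"
    "<verse/><s><seg>Of Man's first disobedience,</seg> "
    "<seg>and the fruit<endVerse/>"
    "<verse/>Of that forbidden tree whose mortal taste<endVerse/></seg></s>"
    "</quote>"
)

MILESTONES = {"verse", "endVerse"}

def flatten(node):
    """Pass 1: turn the content of node into a flat list of (kind, value) events."""
    events = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            events.append(("text", child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            if child.tagName in MILESTONES:
                events.append((child.tagName, None))
            else:
                # Ordinary elements become a pair of milestones themselves.
                events.append(("open", child.tagName))
                events.extend(flatten(child))
                events.append(("close", child.tagName))
    return events

def rebuild_by_verse(events):
    """Pass 2: impose a verse hierarchy, one <l> per verse/endVerse pair."""
    out = minidom.Document()
    root = out.createElement("lines")
    out.appendChild(root)
    current = None
    for kind, value in events:
        if kind == "verse":
            current = out.createElement("l")
            root.appendChild(current)
        elif kind == "endVerse":
            current = None
        elif kind == "text" and current is not None:
            current.appendChild(out.createTextNode(value))
        # "open"/"close" events are ignored: the old hierarchy is dropped.
    return out

def verse_lines(doc):
    """Return the text of each rebuilt <l> element as a list of strings."""
    return ["".join(t.data for t in l.childNodes)
            for l in doc.documentElement.childNodes]

events = flatten(minidom.parseString(SAMPLE).documentElement)
lines = verse_lines(rebuild_by_verse(events))
for line in lines:
    print(line)
```

Because the first pass already visits every node and the second pass walks the whole event list again, the cost grows with the size of the entire text rather than with the one verse you are after.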

(I say ultimately on the right track partly because once it is re-expressed in a different syntax clearly distinguished from XML, this approach comes very close to LMNL, which is being designed for this sort of thing ... where XML/XSLT were not.)

For references to Patrick and Matthew's work, type "JITTs" into your favorite web search engine. It's ongoing.

Cheers,
Wendell


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list