
[xsl] overlap nomenclature


Subject: [xsl] overlap nomenclature
From: Syd Bauman <Syd_Bauman@xxxxxxxxx>
Date: Tue, 21 Feb 2012 10:53:31 -0500

[XML, but not XSL, related response to "Processing milestoned XML
 leads to many preceding:: calls and horrible performance" thread.]

<soapBox>

The first communication between Charles Goldfarb and Yuri Rubinsky
was the written message "tag != element!" (or perhaps "element !=
tag!" or some such). In that same vein ...

   milestone != empty element!

Matěj Cepl provides a concise description of the overlap problem in
XML, but the solution he describes is NOT the use of milestone
elements. He describes HORSE (hierarchical overlap representation
using same element, but empty). Both methods make use of empty
elements, but they are different.

The use of milestones is a simple overlap solution that only works
when one of the involved "hierarchies" is already flat, and
preferably tessellates the document (or at least the chapter, or
whatever). The most common example of this sort of situation is the
overlap between the hierarchy of chapters and paragraphs of a book,
and the "hierarchy" (really a flat tessellation) of pages in said
book.

Specifically, milestones are a case of using the single empty
segment-boundary element method for (e.g.) referencing systems. Using
milestones, one only explicitly marks *the change* from one position
in a referencing system to another.

    Milestone elements differ, however, in assuming a simple
    single-level segmentation of the text: the values specified in
    any milestone element apply to all following text until the next
    element of the same type marked as belonging to the same edition.
    Hence no explicit end marker is needed and the ID / IDREF
    mechanism can be dispensed with.
         -- paraphrased from http://www.tei-c.org/Vault/ML/mlw18.txt
            footnote (3)
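
To make the "applies to all following text until the next one" rule
concrete, here is a minimal sketch in Python using the standard
xml.etree.ElementTree module. The sample document and the function
name text_by_page are hypothetical, made up for illustration; only
the <pb ed= n=> shape comes from this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <pb/> milestones cut across paragraph boundaries.
SAMPLE = """<chapter>
  <pb ed="Caxton2" n="7"/>
  <p>First paragraph starts on page 7 <pb ed="Caxton2" n="8"/>and ends on page 8.</p>
  <p>Second paragraph is wholly on page 8.</p>
  <pb ed="Caxton2" n="9"/>
  <p>Third paragraph starts on page 9.</p>
</chapter>"""


def text_by_page(root, edition):
    """Map each page number to the text that follows its milestone.

    A milestone stays in force until the next <pb> of the same
    edition, so no end marker is ever looked for.
    """
    pages = {}
    current = None  # no page is in force before the first milestone

    def add(text):
        # Text seen before any milestone belongs to no page and is skipped.
        if text and text.strip() and current is not None:
            pages.setdefault(current, []).append(text.strip())

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            # Only milestones of the requested edition change the state.
            if elem.get("ed") == edition:
                current = elem.get("n")
            return
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text after a child still belongs to elem

    walk(root)
    return {page: " ".join(chunks) for page, chunks in pages.items()}


if __name__ == "__main__":
    for page, text in text_by_page(ET.fromstring(SAMPLE), "Caxton2").items():
        print(page, "->", text)
```

Note that the only state carried along is "the current page": that is
what a flat, tessellating hierarchy buys you.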

HORSE markup, on the other hand, is a case of typed segment-boundary
delimiters. In HORSE, an empty element is used to explicitly mark
each end (the beginning and the end) of a content object. Each of the
empty elements indicates the other by co-indexing of special
attributes (sID= and eID=).
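
By way of contrast, a minimal sketch of pairing HORSE delimiters,
again in Python with xml.etree.ElementTree. The sample document, the
<q> element, and the function name horse_ranges are hypothetical;
only the sID=/eID= co-indexing is taken from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a saying (<q>) overlaps two verses.
SAMPLE = """<book>
  <verse sID="ID1.1"/>text of verse 1.1 <q sID="Q1"/>a saying starts<verse eID="ID1.1"/>
  <verse sID="ID1.2"/>and runs into verse 1.2<q eID="Q1"/> then stops.<verse eID="ID1.2"/>
</book>"""


def horse_ranges(root):
    """Return {id: (tag, text)} for every sID/eID pair in the document."""
    open_ranges = {}  # id -> (tag, list of text chunks seen so far)
    closed = {}

    def add(text):
        # Text belongs to every content object that is currently open.
        if text and text.strip():
            for _tag, chunks in open_ranges.values():
                chunks.append(text.strip())

    def walk(elem):
        sid, eid = elem.get("sID"), elem.get("eID")
        if sid is not None:
            open_ranges[sid] = (elem.tag, [])
        elif eid is not None:
            if eid not in open_ranges:
                raise ValueError(f"eID {eid!r} has no matching sID")
            tag, chunks = open_ranges.pop(eid)
            if tag != elem.tag:  # "same element" is part of the scheme
                raise ValueError(f"{eid!r} opened as {tag}, closed as {elem.tag}")
            closed[eid] = (tag, " ".join(chunks))
        else:
            add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)

    walk(root)
    if open_ranges:
        raise ValueError(f"unclosed ranges: {sorted(open_ranges)}")
    return closed


if __name__ == "__main__":
    for ident, (tag, text) in horse_ranges(ET.fromstring(SAMPLE)).items():
        print(ident, tag, "->", text)
```

Unlike the milestone case, several ranges can be open at once and
each must be closed explicitly, which is why the co-indexing is
needed at all.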

<aside>

The XML elements used as empty segment-boundary elements, whether
single, paired, or typed, don't really have to be empty; they just
have to be empty with respect to the content objects that overlap.
E.g., 
      <pb ed="Caxton2" n="7"/>
could just as well be expressed as
      <pb>
        <ed>Caxton2</ed>
        <n>7</n>
      </pb>
so long as "Caxton2" and "7" are guaranteed not to be part of the
overlapping data. At one point I had planned to write up a brief
treatise on this issue and submit it as a squib for _Markup
Technologies: Theory and Practice_. Such is life.

</aside>
</soapBox>

> Maybe I should at least briefly explain it. In many areas
> (especially in documents processing) there is a problem with
> multiple possible hierarchies overlapping each other (e.g., in
> Bibles there are divisions of text which are going across verse and
> chapters boundaries and sometimes terminating in the middle of
> verse, many especially English Bibles marks Jesus' sayings with a
> special element, etc.). One of the ways how to overcome obvious
> problem that XML doesn't allow overlapping elements is to use
> milestones. So that the book of Bible is not divided like
> 
> <book>
>   <chapter>
>     <verse>text</verse>
>     ...
>   </chapter>
>   ...
> </book>
> 
> but just putting milestones in the text, i.e.:
> 
> <book>
>   <chapter n="1" />
>     <verse sID="ID1.1" />
>     text of verse 1.1
>     <verse eID="ID1.1" />
>     ...
> </book>
> 
> Is this clear?

