[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Calculating cumulative values - Abel's solution)


Subject: Re: [xsl] What is Micro Pipelining: an attempt for a definition (was: Calculating cumulative values - Abel's solution)
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Sat, 01 Sep 2007 01:42:50 +0200

Hi Wendell,

Very valuable feedback. However, though I think I grasp your points, after reading I still find a lot of "micro-pipelining" to be in a gray area. Let me try to explain myself.

For me, micro-pipelining is defined as the process of pipelining a set of data during a one pass through an XSLT processor. Normal pipelining would be to re-apply the results of one pass to the next pass. I understand that this doesn't tally with your definition below.

Another attempt for a definition (still my own, original, definition) would be the following: in a case where you apply templates (or for-each perhaps) to a set of data that is not part of the input data (input data being any external source, including document() / collection() etc) is a micro pipeline. Shorter: when you have a variable to contain data that is already processed and you re-apply this variable, it is a micro pipeline.

Now I do understand that this is a bit of a too wide definition. But now on to yours. You explain that for anything to be a micro pipeline it must be applied (or expected to be applied) more than once. More appropriately: when you can extract the pipeline variable into a global variable, it is not a micro pipeline anymore (is it then a macro pipeline?).

But things are still not so trivial. It is tempting to say that the following is only executed once, because there's only one root in a document, and as a result, it is easy to extract the variable to the global level:

  <xsl:template match="/">
      <xsl:variable name="micro">
           <xsl:apply-templates select="root/some/data" />
      </xsl:variable>
      <xsl:apply-templates select="$micro/*" />
  </xsl:template>

But often, simple examples, or inquiries on this list, are part of a larger stylesheet or solution. Suppose I alter the above example to be as follows:

<xsl:param name="urls" select=" 'one.xml', 'two.xml', 'three.xml' " />

  <xsl:template name="main">
      <xsl:apply-templates select="document($list-of-documents)" />
  </xsl:template>

  <xsl:template match="/">
      <xsl:variable name="micro">
           <xsl:apply-templates select="root/some/data" />
      </xsl:variable>
      <xsl:apply-templates select="$micro/*" />
  </xsl:template>

is it now still not a micro pipeline? It is applied several times (three times) and it is not possible anymore to make the variable global. Is it really necessary to restrict a micro pipeline to be one only when it is applied to a local level? Though I can follow your point in that it is closer to a "large pipeline" than to a "micro pipeline".

Now let's try the other opposite. To define a term, you need to know where its boundaries are. Suppose we have the following fictional stylesheet, would the application of $micro be a micro pipeline?

 <xsl:variable name="micro">
     this must be tokenized
 <xsl:variable>

 <xsl:template name="main">
    <xsl:apply-templates select="my:tokenize($micro)" />
 </xsl:template>

 <xsl:template match="my:token">
     <xsl:copy-of select="." />
 </xsl:template>

 <xsl:function name="my:tokenize">
    <xsl:param name="tokens">
    <xsl:variable name="preproc-tokens">
        <xsl:for-each select="tokenize($tokens, ' ')">
            <my:token value="{.}" />
        </xsl:for-each>
    </xsl:variable>
    <xsl:apply-templates select="$preproc-tokens/*" mode="my:tokenize" />
 </xsl:function>

 <xsl:template match="my:token" mode="my:tokenize">
    <xsl:copy>
       <xsl:sequence select="replace('text(), [^A-z]', '')" />
    </xsl:copy>
 </xsl:template>


In this tokenize example we see two things. We see a global variable holding data. This is processed using my:tokenize() and then the results are re-applied. Even though we are talking about global data (holding a document node with text), I would consider both phases a micro pipeline: the apply-templates in the my:tokenize function starts a micro pipeline, and the apply-templates in the main entry template starts a micro pipeline. Or not?


I don't know the answers. I have seen the term micro pipeline come up every now and then without a lot of explanation. I did a quick check on the internet a couple of times, but a clear definition seems hard to find. Even the XML Pipeline languages (still in Draft) at W3C do not mention the distinction. Wikipedia has a small but effective line on the subject though: "Some standards also categorize transformation as macro (changes impacting an entire file) or micro (impacting only an element or attribute)" (http://en.wikipedia.org/wiki/XML_pipeline).

But this simple-enough definition does not help XSLT: we have root nodes, document nodes, files, non-xml data, result tree fragments, sequences.... When is it micro and when is it macro? I do think your definition comes quite close, but it has some rough edges. Would you (and others) give it a try? (the definition, I mean)?

Cheers,

-- Abel Braaksma



Wendell Piez wrote:
Abel,

Thanks for the very nicely composed explanation for Simon. One of the best things about this list is how explanations are provided that can be followed by people who don't themselves have any need for a particular solution, but who can still learn valuable lessons and techniques from the sidelines. It's one of the best ways of learning.

I would, however, quarrel with one aspect of your description. You have the code:

<xsl:template match="/">
<!-- mini pipeline: put points into a variable and process them -->
<xsl:variable name="points">
<xsl:apply-templates select="root/set[1]/point[1]" mode="aggregate">
<xsl:with-param name="calc" tunnel="yes">
<for x="1" y2="2" y3="2"/>
</xsl:with-param>
</xsl:apply-templates>
</xsl:variable>


        <!-- apply set with pre-processed points -->
        <xsl:apply-templates select="root/set">
            <xsl:with-param name="points" select="$points" />
        </xsl:apply-templates>
    </xsl:template>

... which you refer to as using a "micro-pipeline".

As I explained earlier this week, I don't believe this is a micro-pipeline, since it operates globally. The declaration of $points could appear outside the template and it would perform the same way. A micro-pipeline would be if you bound a variable in a template you expected to fire more than once, and then processed the results of that variable. Admittedly there may be a grey zone between a pipeline operating at the document (global) level, and one that operates at a more local level; but I believe this falls fairly clearly into the first category.

(Then too, as you also explain, you don't really run a pipeline here at all, but use the results of pre-processing as a lookup table.)

The reason I stress this is because I'm afraid that if we start calling anything a micro-pipeline just because it involves some matching and applying of templates whose results won't appear in the output, then we'll have to invent a new word for what we actually invented the term for.

The more general technique, I'd say, is called "pipelining", or -- if the results are themselves not processed directly (this includes generating lookup tables) "pre-processing". Another interesting thing to reflect on is that pipelining and pre-processing can be achieved in XSLT 1.0 by passing the results of one stylesheet into another as its source. This really isn't practical with micro-pipelining, which happens only within the scope of a single branch of the tree at a time.

I apologize if this comes across as rude. Another valuable thing we do on this list is guard one another's terminology. ("Don't call them tags", etc.) This keeps the language strong because we have unambiguous terms to refer to things, methods and techniques, which can then be discussed and learned about.

Cheers,
Wendell


Current Thread
Keywords