
Re: [xsl] Cannot write more than one result document to the same URI


From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 04 Apr 2013 22:56:54 -0400

At 2013-04-04 19:37 -0700, Dan Vint wrote:
> I can live with the rule, just would like to understand the logic.

Consider the following scenario. An XML document has two <b> elements:


  <a>
    <b id="1">...</b>
    <b id="2">...</b>
  </a>

An XSLT stylesheet uses the built-in template rule for <a> and has a template rule for <b>:

   <xsl:template match="b">
     <xsl:result-document href="output.xml">
       <xsl:copy-of select="."/>
     </xsl:result-document>
   </xsl:template>

If the specification allowed this, then, without considering the opportunities for parallelism, one might conclude that the file "output.xml" would always contain:

<b id="1">...</b><b id="2">...</b>

The problem is that the specification does not require the XSLT processor to complete the processing of the first <b> before starting, or even finishing, the processing of the second <b>. Sure, a single-process implementation "X" likely would. But a parallelized (is that a word?) implementation "Y" running on multiple CPUs could very well fully process the second <b> before the first <b> if it chose to do so. Its only obligation is to arrange the resulting tree with the result of processing the first <b> before the result of processing the second <b>. This obligation ensures that the result of processing by "X" is identical to the result of processing by "Y". But there is no obligation on how the processor gets to that result.

When using <xsl:result-document> the processor is not adding to the result tree it is already building. It is creating a completely separate result. If the instruction required "re-opening" of the file for append, processor "X" likely would produce the expected result, but processor "Y" in the situation above could write the second <b> ahead of the first. Two conforming processors would produce two different results from the same stylesheet and input.
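
As a minimal sketch of one way to get both <b> elements into a single file regardless of scheduling, assuming it suits the stylesheet to handle <a> explicitly rather than through the built-in rule, the instruction can be executed once at the parent. The order inside that one result is then defined by the order of the selected nodes, not by when each one happened to be processed:

   <xsl:template match="a">
     <!-- one instruction, one URI, executed once for the single <a> -->
     <xsl:result-document href="output.xml">
       <a>
         <!-- the copies are in document order however the work is scheduled -->
         <xsl:copy-of select="b"/>
       </a>
     </xsl:result-document>
   </xsl:template>

The <a> wrapper in the output is my assumption, added only so that the file has a single document element.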

And this is also why one cannot assert that the writing to the file is even finished before the next attempt to write to the file starts. The file handle could very well still be left open by one parallel process when the other is ready to open it for itself. So the same URI can't be reused even if the file is opened for write and not for append.
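
If separate files are acceptable, a sketch of the usual way to stay within the rule is to compute a distinct URI for each <b>. The href attribute is an attribute value template, so it can be built from the node being processed. The "output-N.xml" naming here is only an illustration, and it assumes the id values are unique:

   <xsl:template match="b">
     <!-- for the sample input this writes output-1.xml and output-2.xml -->
     <xsl:result-document href="output-{@id}.xml">
       <xsl:copy-of select="."/>
     </xsl:result-document>
   </xsl:template>

Because no two executions of the instruction share a URI, it no longer matters in which order, or how concurrently, the processor handles the two <b> elements.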

Note that some of my students have come to class thinking that you have to fully complete an <xsl:result-document> before starting another one to another URI, but I tell them that is not the case. You can nest <xsl:result-document> instructions to different URI target locations, and the nested <xsl:result-document> will complete the nested file and resume the "outer" file output when done, without having to close and re-open the outer file. This has been a very handy feature when fragmenting files. And this would be another reason not to allow the same URI to be used.
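
A sketch of what such nesting might look like when fragmenting the sample document follows. The file names "outer.xml" and "part-N.xml" and the <part> element are hypothetical, chosen only for illustration, and unique id values are assumed:

   <xsl:template match="a">
     <xsl:result-document href="outer.xml">
       <a>
         <xsl:for-each select="b">
           <!-- a pointer to the fragment stays in the outer file -->
           <part href="part-{@id}.xml"/>
           <!-- the fragment itself goes to its own, different URI -->
           <xsl:result-document href="part-{@id}.xml">
             <xsl:copy-of select="."/>
           </xsl:result-document>
         </xsl:for-each>
       </a>
     </xsl:result-document>
   </xsl:template>

Each nested instruction names a URI that differs from the outer one and from its siblings, so the outer output simply resumes after each fragment is written.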

> I know the what now, even though I don't understand why the rule exists ;-)

I hope the explanation above has helped.


. . . . . . . . . Ken


