
Re: [xsl] Using 'collection'


Subject: Re: [xsl] Using 'collection'
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 29 Aug 2015 15:58:48 -0000

It's worth putting the data in an XML database such as BaseX if you're going
to use it often enough to justify the cost of database loading. If you just
want to use it once, e.g. to extract a subset of the data, then collection()
should do the job, either in XQuery or XSLT.

To keep memory usage down, assuming you're implementing with Saxon, the
simplest way is to ensure that each document is unloaded from memory as soon
as it has been processed, which you can do with saxon:discard-document:

<xsl:for-each select="collection('docs?select=*.xml')">
  <!-- requires xmlns:saxon="http://saxon.sf.net/" on xsl:stylesheet -->
  <xsl:apply-templates select="saxon:discard-document(.)"/>
</xsl:for-each>

discard-document() is a pseudo-function that returns a document unchanged, but
with the side effect that it is marked as available for garbage collection.
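The fragment above can be expanded into a complete stylesheet for the original request, which is to pull two elements out of each file. This is a minimal sketch with several assumptions:

* Your Saxon edition supports saxon:discard-document. Check the documentation for your edition.
* The input files sit in a directory called docs.
* The element names elementA and elementB are hypothetical placeholders.
* The results/doc wrapper elements are invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: select it with Saxon's -it:run option -->
  <xsl:template name="run">
    <results>
      <!-- Saxon-specific collection URI: every *.xml file in docs/ -->
      <xsl:for-each select="collection('docs?select=*.xml')">
        <!-- Returns the document unchanged, but marks it as
             eligible for garbage collection once it is processed -->
        <xsl:apply-templates select="saxon:discard-document(.)"/>
      </xsl:for-each>
    </results>
  </xsl:template>

  <!-- Per document: keep only the two (placeholder) elements of interest -->
  <xsl:template match="/">
    <doc uri="{document-uri(.)}">
      <xsl:copy-of select="//elementA | //elementB"/>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Each input document contributes one small doc element, so the output stays compact even with 8000 large inputs.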

Streamed processing is an alternative, but unfortunately in Saxon (until the
next release) streaming can't be used together with collection().
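Because of that limitation, a streamed approach has to handle one input file per run rather than a whole collection(). The sketch below makes these assumptions:

* It is written against the XSLT 3.0 streaming features. Streaming in Saxon has required the commercial EE edition, and 2015-era releases implemented a draft of the spec, so details may differ.
* elementA and elementB are again placeholder names.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Streamed default mode: unmatched nodes are skipped, but the
       children of unmatched elements are still processed, so the
       document is read in a single pass without building a tree -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="/">
    <results>
      <xsl:apply-templates/>
    </results>
  </xsl:template>

  <!-- Placeholder names: copy each matching element whole.
       Only the matched subtree is buffered, not the whole input. -->
  <xsl:template match="elementA | elementB">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet would be run once per input file, with that file as the principal source document (Saxon's -s: option).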

Michael Kay
Saxonica


> On 29 Aug 2015, at 15:25, Mark Wilson pubs@xxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Eliot,
> I have never used XQuery or BaseX and will look into that, but what you have
> said about the XSLT looks good. I will try to sort this out and see where it
> goes. Thanks for taking the time.
> Regards,
> Mark
>
> On 8/29/2015 7:13 AM, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
>> This sounds like a job better done using XQuery. A quick solution would be
>> to install BaseX and use its GUI to load your XML files and then apply the
>> query you need to the loaded docs. If you have to do complex
>> transformations on the things you find you can have the XQuery emit an XML
>> file that you can then apply an XSLT to, rather than trying to implement
>> the transform entirely in XQuery.
>>
>> With XSLT and Saxon you could do something like:
>>
>> <xsl:stylesheet ...>
>>
>> <xsl:template name="run">
>>   <xsl:apply-templates select="collection('docs?select=*.xml')"/>
>> </xsl:template>
>>
>> <xsl:template match="/">
>>   <!-- do stuff to find what you want in each doc -->
>> </xsl:template>
>> </xsl:stylesheet>
>>
>> Then use Saxon's -it option to specify the initial template to run
>> (here, -it:run); no source document is needed.
>>
>> The size of the documents shouldn't be a big issue, especially if you can
>> allocate sufficient memory to the processor. You could probably take
>> advantage of the new streaming features in XSLT 3, which are implemented
>> in the latest Saxon versions.
>>
>> For something like this, you may have to find out by experiment how much
>> memory the process needs: run it, and if it fails with an out-of-memory
>> error, give the Java VM more heap (e.g. with -Xmx) until it either runs or
>> you've run out of available physical memory.
>>
>> Cheers,
>>
>> Eliot
>>
>> ----
>> Eliot Kimber, Owner
>> Contrext, LLC
>> http://contrext.com
>>
>> On 8/29/15, 8:36 AM, "Mark Wilson pubs@xxxxxxxxxxxx"
>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> I have been asked to isolate two elements each from a set of individual
>>> XML files containing hundreds of elements. I thought collection() would
>>> work, but each individual file is very large (36,000+ lines) and there
>>> are 8000 of them. I have no idea how to begin. I would include a
>>> sample file, but as I said, they are very large. Where might I look to
>>> get ideas?
>>> Thanks,
>>> Mark

