Re: [xsl] Using 'collection'

Subject: Re: [xsl] Using 'collection'
From: "Mark Wilson pubs@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 29 Aug 2015 14:24:53 -0000

Hi Eliot,
I have never used XQuery or BaseX and will look into that, but what you have said about the XSLT looks good. I will try to sort this out and see where it goes. Thanks for taking the time.

On 8/29/2015 7:13 AM, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
This sounds like a job better done using XQuery. A quick solution would be
to install BaseX and use its GUI to load your XML files and then apply the
query you need to the loaded docs. If you have to do complex
transformations on the things you find you can have the XQuery emit an XML
file that you can then apply an XSLT to, rather than trying to implement
the transform entirely in XQuery.

With XSLT and Saxon you could do something like:

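<xsl:stylesheet ...>

<!-- named entry point; started from the command line, no source document needed -->
<xsl:template name="run">
   <xsl:apply-templates select="collection('docs?select=*.xml')"/>
</xsl:template>

<!-- applied once to the document node of each file in the collection -->
<xsl:template match="/">
   <!-- do stuff to find what you want in each doc -->
</xsl:template>

</xsl:stylesheet>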

Then use Saxon's -it option (for example, -it:run) to specify the initial template to run.
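
To make the outline above more concrete, here is a minimal sketch of a complete stylesheet. It is only an illustration under stated assumptions: the directory name docs, the element names title and author, and the wrapper elements extracted and file are all placeholders, not anything taken from the real files.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point: run with -it:run, so no principal source document is needed. -->
  <xsl:template name="run">
    <extracted>
      <!-- 'docs' is a hypothetical directory; Saxon's ?select= filter picks the *.xml files in it. -->
      <xsl:apply-templates select="collection('docs?select=*.xml')"/>
    </extracted>
  </xsl:template>

  <!-- Applied once per input document; records where the elements came from. -->
  <xsl:template match="/">
    <file uri="{base-uri(.)}">
      <!-- 'title' and 'author' are placeholder names for the two elements wanted. -->
      <xsl:copy-of select="(//title)[1]"/>
      <xsl:copy-of select="(//author)[1]"/>
    </file>
  </xsl:template>

</xsl:stylesheet>
```

Assuming this is saved as extract.xsl (a hypothetical name), it could be run with something like java -jar saxon9he.jar -xsl:extract.xsl -it:run -o:extracted.xml, adjusting the jar name to your Saxon release. If the elements are in a namespace, the match paths would need a bound prefix or xpath-default-namespace.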

The size of the documents shouldn't be a big issue, especially if you can
allocate sufficient memory to the processor. You could probably also take
advantage of the new streaming features in XSLT 3, which are implemented
in the latest Saxon versions.
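
As a rough sketch of what the streaming approach might look like: the version below reads each file with xsl:source-document, the instruction defined in the XSLT 3.0 Recommendation (earlier drafts called it xsl:stream). It assumes a Saxon edition that supports streaming (Saxon-EE), and it assumes uri-collection() accepts the same directory-plus-?select syntax as collection(). The names docs, title, author, extracted and file are placeholders.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="run">
    <extracted>
      <!-- uri-collection() returns URIs only, so no document is parsed at this point. -->
      <xsl:for-each select="uri-collection('docs?select=*.xml')">
        <file uri="{.}">
          <!-- Each file is read in a single streamed pass instead of being built as a full tree. -->
          <xsl:source-document href="{.}" streamable="yes">
            <!-- One downward selection, so the input is consumed only once;
                 'title' and 'author' are placeholder element names. -->
            <xsl:copy-of select="descendant::*[self::title or self::author]"/>
          </xsl:source-document>
        </file>
      </xsl:for-each>
    </extracted>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the non-streamed outline, this copies every matching element rather than only the first of each, because positional filtering over the whole document is harder to keep streamable.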

For something like this you might have to find out how much memory the
process requires by running it. If it fails with an out-of-memory error,
give it more (on Java, by raising the maximum heap size with the -Xmx
option) until it either runs or you've run out of available real memory.



Eliot Kimber, Owner
Contrext, LLC

On 8/29/15, 8:36 AM, "Mark Wilson pubs@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

I have been asked to isolate two elements each from a set of individual
XML files containing hundreds of elements. I thought collection() would
work, but each individual file is very large (36,000+ lines) and there
are 8000 of them. I have no idea as to how to begin. I would include a
sample file, but as I said, they are very large. Where might I look to
get ideas?