
Aw: Re: [xsl] Using 'collection'


Subject: Aw: Re: [xsl] Using 'collection'
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 29 Aug 2015 20:05:50 -0000

Try
-xsl:read1.xsl -it:runit
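
A sketch of the whole batch file with that change, assuming the same
paths as in the quoted post. The error message there suggests that Saxon
read the two bare arguments positionally, taking read1.xsl as the source
document and -it:runit as the stylesheet file name; the -xsl: prefix
removes that ambiguity. Using %SAXON_JAR% in the java call is an addition
here (the original sets it but then spells the path out again).

```bat
rem Assumed layout: saxon9.jar lives in C:\saxon, as in the quoted post.
set SAXON_HOME=C:\saxon
set SAXON_JAR=%SAXON_HOME%\saxon9.jar

rem -xsl: names the stylesheet explicitly; -it: names the initial template,
rem so no source document is needed on the command line.
java -jar "%SAXON_JAR%" -xsl:read1.xsl -it:runit
```

Once this runs, note that the runit template as posted selects the
collection twice, once directly in xsl:apply-templates and once inside
xsl:for-each, so each document would probably be processed twice; keeping
only the xsl:for-each with saxon:discard-document looks like what was
intended.
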
--
This message was sent from my Android mobile phone with GMX Mail.

"Mark Wilson pubs@xxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

  Not sure what I am doing wrong here.

  Using this batch file:
  set SAXON_HOME=C:\saxon
  set SAXON_JAR=%SAXON_HOME%\saxon9.jar
  java -jar c:\saxon\saxon9.jar read1.xsl -it:runit

  I get this error.
  P:\British Library>set SAXON_HOME=C:\saxon
  P:\British Library>set SAXON_JAR=C:\saxon\saxon9.jar
  P:\British Library>java -jar c:\saxon\saxon9.jar read1.xsl -it:runit
  Stylesheet file -it:runit does not exist

  Using this stylesheet:
  <?xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:saxon="http://saxon.sf.net/"
      xmlns:mets="http://www.loc.gov/METS/"
      xmlns:blprocess="http://bl.uk/namespaces/blprocess"
      exclude-result-prefixes="xs" version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template name="runit">
      <xsl:apply-templates select="collection('docs?select=*.xml')"/>
      <xsl:for-each select="collection('docs?select=*.xml')">
        <xsl:apply-templates select="saxon:discard-document(.)"/>
      </xsl:for-each>
    </xsl:template>

    <xsl:template match="/">
      <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="mets:amdSec">
      <xsl:if test="@ID eq 'amd0002'">
        <xsl:copy-of select="descendant::blprocess:processMetadata"
            copy-namespaces="no"/>
      </xsl:if>
    </xsl:template>
  </xsl:stylesheet>

  On 8/29/2015 8:59 AM, Michael Kay mike@xxxxxxxxxxxx wrote:
  > It's worth putting the data in an XML database such as BaseX if
  > you're going to use it often enough to justify the cost of database
  > loading. If you just want to use it once, e.g. to extract a subset of
  > the data, then collection() should do the job - either in XQuery or
  > XSLT.
  >
  > To keep memory usage down, assuming you're implementing with Saxon,
  > the simplest way is to ensure that each document is unloaded from
  > memory as soon as it has been processed, which you can do with
  > saxon:discard-document:
  >
  > <xsl:for-each select="collection('docs?select=*.xml')">
  >   <xsl:apply-templates select="saxon:discard-document(.)"/>
  > </xsl:for-each>
  >
  > discard-document() is a pseudo-function that returns a document
  > unchanged, but with the side effect that it is marked as available
  > for garbage collection.
  >
  > Streamed processing is an alternative - but unfortunately in Saxon
  > (until the next release) streaming can't be used together with
  > collection().
  >
  > Michael Kay
  > Saxonica
  >
  >
  >> On 29 Aug 2015, at 15:25, Mark Wilson pubs@xxxxxxxxxxxx
  >> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
  >>
  >> Hi Eliot,
  >> I have never used XQuery or BaseX and will look into that, but what
  >> you have said about the XSLT looks good. I will try to sort this out
  >> and see where it goes. Thanks for taking the time.
  >> Regards,
  >> Mark
  >>
  >> On 8/29/2015 7:13 AM, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
  >>> This sounds like a job better done using XQuery. A quick solution
  >>> would be to install BaseX and use its GUI to load your XML files and
  >>> then apply the query you need to the loaded docs. If you have to do
  >>> complex transformations on the things you find you can have the
  >>> XQuery emit an XML file that you can then apply an XSLT to, rather
  >>> than trying to implement the transform entirely in XQuery.
  >>>
  >>> With XSLT and Saxon you could do something like:
  >>>
  >>> <xsl:stylesheet ...>
  >>>
  >>> <xsl:template name="run">
  >>> <xsl:apply-templates select="collection('docs?select=*.xml')"/>
  >>> </xsl:template>
  >>>
  >>> <xsl:template match="/">
  >>> <!-- do stuff to find what you want in each doc -->
  >>> </xsl:template>
  >>> </xsl:stylesheet>
  >>>
  >>> Then use the -it option for Saxon to specify the initial template
  >>> to run (-it:run).
  >>>
  >>> The size of the documents shouldn't be a big issue, especially if
  >>> you can allocate sufficient memory to the processor. You could
  >>> probably take advantage of new streaming features in XSLT 3 and
  >>> implemented in the latest Saxon versions.
  >>>
  >>> For something like this you might have to see how much virtual
  >>> memory the process requires by running it and if it fails with an
  >>> out-of-memory error, give it more until it either runs or you've run
  >>> out of available real memory.
  >>>
  >>> Cheers,
  >>>
  >>> Eliot
  >>>
  >>> ----
  >>> Eliot Kimber, Owner
  >>> Contrext, LLC
  >>> http://contrext.com
  >>>
  >>>
  >>>
  >>>
  >>> On 8/29/15, 8:36 AM, "Mark Wilson pubs@xxxxxxxxxxxx"
  >>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
  >>>
  >>>> I have been asked to isolate two elements each from a set of
  >>>> individual xml files containing hundreds of elements. I thought
  >>>> collection() would work, but each individual file is very large
  >>>> (36,000 + lines) and there are 8000 of them. I have no idea how to
  >>>> begin. I would include a sample file, but as I said, they are very
  >>>> large. Where might I look to get ideas?
  >>>> Thanks,
  >>>> Mark
  >>>>
  >>>>
  >
  >
