
[xsl] how to efficiently extract unique list of URI's for writing files


Subject: [xsl] how to efficiently extract unique list of URI's for writing files
From: Robby Pelssers <Robby.Pelssers@xxxxxxx>
Date: Mon, 16 Jul 2012 12:55:37 +0200

Hi all,

Again, the data is completely made up, but it makes it easy to explain what
I'm trying to accomplish.  In the input below I have a sequence of objects,
each of which needs to be written to its own URI.

e.g.
/objects/crontask/SyncDB.xml
/objects/user/12345.xml

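The problem is that the input may contain duplicate entries. They are not
necessarily deep-equal: for a user, for example, the <id> is the unique
identifier, so two <user> elements with the same <id> count as duplicates even
if their other children differ.  If I simply use template matching and write
one result document per matched element, I will run into an error, as I can't
write twice to the same URI.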

I came across this post explaining how to solve this issue:
http://www.stylusstudio.com/xsllist/200705/post10050.html
but it is still not clear to me what the best approach would be.

Should I, for example, write functions for all possible types to compute each
object's URI, collect the distinct URIs, and then do a second pass that removes
a URI from the sequence once its object has been written, skipping any object
whose URI is no longer present?

<xsl:function name="pelssers:getURI">
  <xsl:param name="crontask" as="element(crontask)"/>
  ...TODO...
</xsl:function>

<xsl:function name="pelssers:getURI">
  <xsl:param name="user" as="element(user)"/>
  ...TODO...
</xsl:function>
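
One caveat with the two stubs above: XSLT 2.0 identifies a function by its
name and arity only, not by parameter type, so two one-argument functions both
named pelssers:getURI in the same stylesheet would clash. A minimal sketch of
one possible way around this, assuming the URI layout from the examples above:
a single function that delegates to per-type templates. The namespace URI bound
to pelssers and the mode name "uri" are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:pelssers="http://example.org/pelssers"
    exclude-result-prefixes="xs pelssers">

  <!-- One function for every object type; the per-type rules live in the
       (hypothetical) mode "uri" below. -->
  <xsl:function name="pelssers:getURI" as="xs:string">
    <xsl:param name="object" as="element()"/>
    <xsl:apply-templates select="$object" mode="uri"/>
  </xsl:function>

  <!-- crontask: <name> is the identifier -->
  <xsl:template match="crontask" mode="uri" as="xs:string">
    <xsl:sequence select="concat('/objects/crontask/', name, '.xml')"/>
  </xsl:template>

  <!-- user: <id> is the identifier -->
  <xsl:template match="user" mode="uri" as="xs:string">
    <xsl:sequence select="concat('/objects/user/', id, '.xml')"/>
  </xsl:template>

  <!-- Any other element: stop rather than guess a URI -->
  <xsl:template match="*" mode="uri">
    <xsl:message terminate="yes">No URI rule for <xsl:value-of select="name()"/></xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

Supporting a new object type would then only mean adding one more template in
that mode.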



"Here's a solution that I normally use. Take all URIs that you want to write
to, pack them in a sequence and deduplicate them (use the function
distinct-values or similar) and go from there (if possible) or, if you can't,
you can use a micro-pipeline. I.e., the first transforms and changes the input
and adds _1, _2 etc to the names, to ensure uniqueness, the second is the
transformation where you create the actual result document."
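
A minimal sketch of how the first part of this advice might be applied to the
sample input at the end of this message. It is only an illustration under some
assumptions: instead of distinct-values() it uses xsl:for-each-group, which
groups the objects by their target URI so that each URI is written exactly
once; the inline grouping key covers only the two object types in the sample;
and keeping the first object per URI (ignoring later duplicates) is an
assumption about which duplicate should win.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/objects">
    <!-- Group the objects by target URI: <id> identifies a user,
         <name> identifies anything else (here: crontask). -->
    <xsl:for-each-group select="*"
        group-by="concat('/objects/', local-name(), '/',
                         if (self::user) then id else name, '.xml')">
      <!-- Inside the group, the context item is the first object that maps
           to this URI; later duplicates are never written. -->
      <xsl:result-document href="{current-grouping-key()}">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input this would write one document for the crontask and one
for user 12345, taken from the first <user> element. The inline key expression
could be replaced by a per-type URI function once more object types are
involved.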

<?xml version="1.0" encoding="UTF-8"?>
<objects>
  <crontask>
    <name>SyncDB</name> <!-- name is identifier -->
    <definition>Syncs filesystem with database</definition>
  </crontask>
  <user>
    <id>12345</id>  <!-- id is identifier -->
    <name>Robby Pelssers</name>
  </user>
  <!-- duplicate entry for same user needs to be skipped although name is different -->
  <user>
    <id>12345</id>
    <name>Robby PelssersXX</name>
  </user>
</objects>

