Re: [xsl] Expensive XSLT2 - suggestions for improving?


Subject: Re: [xsl] Expensive XSLT2 - suggestions for improving?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Thu, 16 Oct 2008 12:34:38 -0400

Michael,

This is an interesting problem, and you may want to try a few things.

Part of what makes it interesting is the question of how widely you wish to scope your examination for similar values. In XSLT 2, the key() function accepts an optional third argument: a node whose subtree limits where the key looks for matches.

You could try something like this:

<xsl:key name="oid-by-value" match="@oid" use="string(..)"/>
<!-- retrieves an @oid attribute using the string value of its parent element -->


and then

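<xsl:template match="value">
  <!-- the @oid of the first element (in document order) with the same value -->
  <xsl:variable name="first" select="key('oid-by-value', .)[1]"/>
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:for-each select="$first except @oid">
      <!-- traverse to the @oid of the first element with the
           same value, unless this is it -->
      <xsl:attribute name="refoid" select="string()"/>
    </xsl:for-each>
    <!-- keep content on the first occurrence; skip it on duplicates -->
    <xsl:if test="empty($first except @oid)">
      <xsl:apply-templates select="node()"/>
    </xsl:if>
  </xsl:copy>
</xsl:template>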

If you wanted to scope only within the parent element, you could use key('oid-by-value',.,..)[1] -- the '..' as the third argument restricts retrieval to nodes within the parent's subtree.

Note: untested. (But if it won't work, surely some sharp-eyed XSLTer will notice.)
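
The parent-scoped lookup described above can be sketched as a complete stylesheet. This is only an illustration, equally untested: it assumes every <value> carries an @oid, narrows the key to value/@oid (the key above matches any @oid), and uses a variable named "first" that is my own naming. It keeps the content of the first occurrence and empties later duplicates under the same parent.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- index each value's @oid by the string value of its parent element -->
  <xsl:key name="oid-by-value" match="value/@oid" use="string(..)"/>

  <!-- every value with an @oid: decide whether it is a first occurrence -->
  <xsl:template match="value[@oid]">
    <!-- the third argument '..' limits the lookup to the parent's subtree -->
    <xsl:variable name="first" select="key('oid-by-value', ., ..)[1]"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:choose>
        <xsl:when test="$first is @oid">
          <!-- first occurrence within this parent: keep the content -->
          <xsl:apply-templates select="node()"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- duplicate: point at the first occurrence, skip the content -->
          <xsl:attribute name="refoid" select="$first"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>

  <!-- pass-through for all other nodes and attributes -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Dropping the third argument from key() widens the comparison to the whole document, which is what the sample in the original question appears to need when duplicates can sit under different parents.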

Cheers,
Wendell


At 11:54 AM 10/16/2008, you wrote:
Hello experts,

The task is to remove duplicate text content before moving an XML file
into translation. After the translation, the former duplicate content
should be recreated.

Assume this input XML (I dropped a lot of attributes):

<Doc>
<value oid="40068">Lasttrennschalter</value>
<value oid="40069">Umbau von N12 auf N4</value>
<value oid="4006a">Lasttrennschalter</value>
</Doc>

The third <value> should be empty because its content is identical to
the first, but we need a pointer to that first element to be able to
recreate the content after translation. Also, all original attributes
must stay unchanged. Therefore in each duplicate I insert an extra
attribute @refoid with the @oid of the source element. So I get this:

<Doc>
<value oid="40068">Lasttrennschalter</value>
<value oid="40069">Umbau von N12 auf N4</value>
<value oid="4006a" refoid="40068"/>
</Doc>

My XSL is very simple and works as intended, but it does not scale
very well, I guess because I look at preceding::value so many times:

<!-- Condenser: modify all duplicates -->
<xsl:template match="value[.=preceding::value]">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:attribute name="refoid"
      select="preceding::value[.=current()][last()]/@oid"/>
    <!-- skip content -->
  </xsl:copy>
</xsl:template>

<!-- pass-through all nodes and attributes -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

I guess a cleverly constructed key could help a lot... any pointers are
very welcome!
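
For the reverse step mentioned in the question (recreating the duplicate content after translation), a key on @oid may be enough. The following is a minimal sketch, not part of the original exchange. It assumes that @oid values are unique, that @refoid only appears on condensed duplicates, and that translation leaves both attributes untouched; the key name "value-by-oid" is hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- index value elements by their own @oid -->
  <xsl:key name="value-by-oid" match="value" use="@oid"/>

  <!-- Expander: a condensed duplicate gets its content back -->
  <xsl:template match="value[@refoid]">
    <xsl:copy>
      <!-- keep all original attributes, drop the helper @refoid -->
      <xsl:apply-templates select="@* except @refoid"/>
      <!-- copy the (now translated) content of the referenced element -->
      <xsl:apply-templates select="key('value-by-oid', @refoid)/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- pass-through all nodes and attributes -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Under those assumptions, <value oid="4006a" refoid="40068"/> would come back as a <value oid="4006a"> element holding whatever text the translated element with oid="40068" contains.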


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


Current Thread
Keywords