
Re: [xsl] Efficient way to do an identity transform, eliminating duplicate elements, in XSLT 1.0?


Subject: Re: [xsl] Efficient way to do an identity transform, eliminating duplicate elements, in XSLT 1.0?
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxx>
Date: Fri, 13 Dec 2013 09:57:50 -0500

Dear Roger,

I think you need to do some more diagnostics.

Try a stylesheet with a single empty template:

<xsl:template match="/"/>

If you still run out of memory, the problem is not the copying: the
processor is running out just building the tree for the input. (You are
probably just putting too much stuff in your blender. :-) Try switching
processors and/or allocating more RAM.

If this runs, you still don't know whether the copying is at fault. Try:

<xsl:template match="/">
  <xsl:copy-of select="/"/>
</xsl:template>

If this also runs, then you can look more closely at the logic of your
stylesheet.
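
For comparison at that stage, here is a minimal sketch of one 1.0
approach, not a definitive implementation. It assumes that "duplicate"
means a leaf element with the same parent, the same name and the same
string value as an earlier sibling, and that attributes can be ignored
when comparing. The key name "leaf" is made up for illustration. The
xsl:strip-space and indent settings are only there to tidy the output
and would not be safe for mixed content.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- Index each leaf element by parent identity, name and value.
       Assumption: only elements without element children are compared. -->
  <xsl:key name="leaf" match="*[not(*)]"
           use="concat(generate-id(..), '|', name(), '|', .)"/>

  <!-- Identity template: copy everything not matched below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a leaf that is not the first one filed under its key value. -->
  <xsl:template match="*[not(*)][generate-id() !=
      generate-id(key('leaf',
        concat(generate-id(..), '|', name(), '|', .))[1])]"/>

</xsl:stylesheet>

Deleting the last template leaves a plain identity transform, which
makes one more diagnostic step. Note that the key is itself an index
held in memory, so this sketch does nothing for a document that is
already too big to load.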

Does "efficient" mean "uses less RAM"? If so, consider pipelining. But
if the document doesn't even go through your 1.0 pipe, you may need to
use another technology to split it into chunks first.

I hear there is this new feature coming on line, XSLT streaming.... :-)

Cheers, Wendell
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^


On Fri, Dec 13, 2013 at 9:33 AM, Costello, Roger L. <costello@xxxxxxxxx>
wrote:
> Hi Folks,
>
> I need to do an identity transform on XML files like this:
>
> <Document>
>     <First>
>         <id>A</id>
>         <blah>B</blah>
>         <id>A</id>
>     </First>
>     <Second>
>         <id>C</id>
>         <blah>D</blah>
>         <id>C</id>
>     </Second>
> </Document>
>
> I want the identity transform to remove duplicate elements in <First> and
> remove duplicate elements in <Second>. So the output should be:
>
> <Document>
>     <First>
>         <id>A</id>
>         <blah>B</blah>
>     </First>
>     <Second>
>         <id>C</id>
>         <blah>D</blah>
>     </Second>
> </Document>
>
> I need to use XSLT 1.0 to implement this.
>
> I created an implementation, but it uses <copy> statements. The actual XML
> document that I am transforming is huge, nearly 1 GB. When I run my XSLT
> implementation the processor runs out of memory. I think it's due to the
> <copy> statements. I need a very efficient implementation. Any suggestions?
>
> /Roger

