Re: [xsl] Performance problem in transformation
Subject: Re: [xsl] Performance problem in transformation
From: Jeni Tennison <mail@xxxxxxxxxxxxxxxx>
Date: Fri, 22 Jun 2001 06:50:26 +0100

Hi Shashank,

> I am trying to filter out duplicate records from input XML document.
> If I have around 80 records in the XML document and out of which 43
> are unique, transformation is taking forever to complete. (the size
> of this input XML document is 223K)
>
> Can you suggest any better ways of removing duplicate records ?

From the bit of XSLT that you posted, it looks as though your source
is something like:

  <event>
    <sales_orders_sd_doc>...</sales_orders_sd_doc>
    <sales_orders_base_uom>...</sales_orders_base_uom>
    <sales_orders_dlv_qty>...</sales_orders_dlv_qty>
    <sales_orders_exchg_rate_v>...</sales_orders_exchg_rate_v>
    <sales_orders_sd_doc>...</sales_orders_sd_doc>
    <sales_orders_base_uom>...</sales_orders_base_uom>
    <sales_orders_dlv_qty>...</sales_orders_dlv_qty>
    <sales_orders_exchg_rate_v>...</sales_orders_exchg_rate_v>
    ...
  </event>

The most efficient method of identifying unique records is to use the
Muenchian method. This uses a key to identify all the records with
the same identifier. In your case, you want the key to index the
sales_orders_sd_doc elements by their value.

The key has to match the elements that you're indexing (i.e. the
sales_orders_sd_doc elements) and use the identifying value (i.e. the
value of that element). You can give it any name that you want:

  <xsl:key name="sales_orders" match="sales_orders_sd_doc" use="." />

With the key set up (this goes at the top level of your stylesheet),
you can then retrieve all the sales_orders_sd_doc elements with a
particular value with the key() function. For example, to get all
that have the value 'ABC', you can use:

  key('sales_orders', 'ABC')

Now, the first sales_orders_sd_doc element that will be retrieved from
the key is the one that appears first in document order. For any
particular value, there will only be a single sales_orders_sd_doc
element that is the first retrieved by the key for that value.
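
To see the key declaration and the key() lookup working together,
here is a minimal sketch of a complete stylesheet. The value 'ABC' is
only the illustrative value used above, and the text output and the
count() call are assumptions added purely for demonstration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <!-- Top-level key: indexes every sales_orders_sd_doc element
       by its own string value -->
  <xsl:key name="sales_orders" match="sales_orders_sd_doc" use="." />

  <xsl:template match="/">
    <!-- 'ABC' is a made-up identifier, used only for illustration -->
    <xsl:text>Elements with value ABC: </xsl:text>
    <xsl:value-of select="count(key('sales_orders', 'ABC'))" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against a source in which two sales_orders_sd_doc elements contain
ABC, this reports a count of 2.
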
So to get the unique ones, you need to run over all those elements and
work out whether they are the same as the first one retrieved from the
key. You can compare the two nodes by generating an ID for each and
comparing them. This gives you the expression:

  /event/sales_orders_sd_doc
    [generate-id() = generate-id(key('sales_orders', .)[1])]

So you can set your $unique-list variable to this node set:

  <xsl:variable name="unique-list"
                select="/event/sales_orders_sd_doc
                          [generate-id() =
                           generate-id(key('sales_orders', .)[1])]" />

The other source of inefficiency in your design is how you're
iterating over the nodes, using an index to access them rather than
just applying templates to the nodes. I'm not sure that I can exactly
follow why you're doing what you're doing, but I think that all you
need to do is apply templates to the nodes in the $unique-list
variable:

  <xsl:template match="event">
    <ROOT message="test">
      <xsl:apply-templates select="$unique-list" />
    </ROOT>
  </xsl:template>

And then have a template that matches them and creates the Row
elements that you want. You can get the values for the various fields
in the Row by looking at the immediate following siblings for the
sales_orders_sd_doc element that you're currently looking at using the
following-sibling:: axis (if the related fields actually come *before*
the sales_orders_sd_doc element, then use the preceding-sibling:: axis
instead):

  <xsl:template match="sales_orders_sd_doc">
    <Row>
      <AA>
        <xsl:value-of
          select="following-sibling::sales_orders_base_uom[1]" />
      </AA>
      <C>
        <xsl:value-of
          select="following-sibling::sales_orders_dlv_qty[1]" />
      </C>
      <BBB>
        <xsl:value-of
          select="following-sibling::sales_orders_exchg_rate_v[1]" />
      </BBB>
      <D>
        <xsl:value-of select="." />
      </D>
    </Row>
  </xsl:template>

I hope that helps,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
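
For reference, the fragments in this message might be assembled into
one stylesheet roughly as follows. This is only a sketch: it assumes
the guessed source structure (an event element in which each
sales_orders_sd_doc is followed by its related fields), takes the field
names from that guessed sample, and adds an xsl:output declaration
that was not part of the original discussion:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: indented XML output, chosen only for readability -->
  <xsl:output method="xml" indent="yes" />

  <!-- Index the identifying elements by their string value -->
  <xsl:key name="sales_orders" match="sales_orders_sd_doc" use="." />

  <!-- Keep only the element that is first in its key group -->
  <xsl:variable name="unique-list"
                select="/event/sales_orders_sd_doc
                          [generate-id() =
                           generate-id(key('sales_orders', .)[1])]" />

  <xsl:template match="event">
    <ROOT message="test">
      <xsl:apply-templates select="$unique-list" />
    </ROOT>
  </xsl:template>

  <!-- One Row per unique record; [1] picks the nearest following
       sibling of each kind, i.e. the field belonging to this record -->
  <xsl:template match="sales_orders_sd_doc">
    <Row>
      <AA>
        <xsl:value-of
          select="following-sibling::sales_orders_base_uom[1]" />
      </AA>
      <C>
        <xsl:value-of
          select="following-sibling::sales_orders_dlv_qty[1]" />
      </C>
      <BBB>
        <xsl:value-of
          select="following-sibling::sales_orders_exchg_rate_v[1]" />
      </BBB>
      <D>
        <xsl:value-of select="." />
      </D>
    </Row>
  </xsl:template>

</xsl:stylesheet>
```

Because the key is built once and each lookup is then cheap in most
processors, this avoids comparing every record against every other
record, which is the usual cause of slow duplicate filtering.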