[xsl] Extraction of data using key() and matches()
Subject: [xsl] Extraction of data using key() and matches()
From: Jakob Fix <jakob.fix@xxxxxxxxx>
Date: Sat, 5 Jun 2010 21:02:20 +0200
Hello,

I have a large number of XML data files (previously Excel files), each of which contains a table made up of rows and data cells. I want to find out whether a given country name appears in the table's data cells. If so, I want to record in another file all country names that appear in the data file.

The country name may be the only content of the data cell (<col>United Kingdom</col>), or it may be surrounded by other text (<col>Data has been provided for United Kingdom only.</col>). More than one country name may also appear in a single table cell. There won't be other elements in the cell, just character data.

My current approach is to have an exhaustive lookup file with *all* country names that are potentially used. For each XML data file, I loop over all country names and test whether any cell in the data file matches the current country name. The following works but is rather slow:

countries.xml

    <countries>
      <country code="ABW">
        <fr>Aruba</fr>
        <en>Aruba</en>
      </country>
      <country code="AFG">
        <fr>Afghanistan</fr>
        <en>Afghanistan</en>
      </country>
      ...
    </countries>

data.xml

    <workbook>
      <sheet>
        <name><![CDATA[Figure 1.1 (I)]]></name>
        <row number="0">
          <col number="0"><![CDATA[United Kingdom]]></col>
        </row>
        <row number="1">
          <col number="0"><![CDATA[Part I. ]]></col>
          <col number="1"><![CDATA[These data apply to France, Germany and a couple of other countries.]]></col>
          ...
        </row>
        ...
      </sheet>
    </workbook>

extract.xsl

    <xsl:for-each select="document($country-file)/countries/country/en">
      <xsl:variable name="current-node" select="."/>
      <xsl:if test="$data-doc//col[matches(., $current-node/text())]">
        <country><xsl:value-of select="$current-node/../@code"/></country>
      </xsl:if>
    </xsl:for-each>

In order to speed up the process I was thinking about indexing all data cells using xsl:key. However, I cannot see how the key() and matches() functions can be combined to get the former's speed together with the latter's regex power.
I was hoping to do something along these lines, but would need some help as this doesn't currently work:

    <!-- create an index of the cells' contents -->
    <xsl:key name="cell" match="col" use="text()"/>

    <xsl:for-each select="document($country-file)/countries/country/en">
      <!-- don't lose the current node -->
      <xsl:variable name="current-node" select="."/>
      <!-- change context to data document -->
      <xsl:for-each select="document($data-file)">
        <!-- key returns a node-set, so count the number of nodes in it;
             this doesn't work if the country name is not the only content -->
        <xsl:if test="count(key('cell', $current-node)) > 0">
          <country><xsl:value-of select="$current-node/../@code"/></country>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>

Maybe there's another solution that is more elegant and more efficient than what I've shown above. I'd love to know about it.

Thank you in advance for your help.

Jakob.
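
[Editor's sketch, not part of the original message.] key() looks values up by equality, so on its own it can only find cells whose whole content is the country name; it cannot do the substring test that matches() performs. One possible alternative, shown below as a minimal XSLT 2.0 sketch and not as the answer given in the thread, is to join the text of all cells once and then test each country name against that single string with contains(). This also avoids having matches() interpret a country name as a regular expression, which matters for names that contain characters such as parentheses or dots. The assumptions here are that data.xml is the principal input document, that the lookup file is passed in the country-file parameter as in the original post, and that the wrapper element countries-found is a hypothetical name chosen only to keep the output well-formed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed default; the original post only shows $country-file in use -->
  <xsl:param name="country-file" select="'countries.xml'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Text of every cell in the input document, joined once.
       The newline separator keeps a name from matching across two cells. -->
  <xsl:variable name="all-text" select="string-join(//col, '&#10;')"/>

  <xsl:template match="/">
    <!-- countries-found is a hypothetical wrapper element -->
    <countries-found>
      <!-- Keep each non-empty English name that occurs literally in the cell text -->
      <xsl:for-each select="document($country-file)/countries/country
                            /en[. != ''][contains($all-text, .)]">
        <country><xsl:value-of select="../@code"/></country>
      </xsl:for-each>
    </countries-found>
  </xsl:template>

</xsl:stylesheet>
```

Like the matches() version, this is a plain substring test, so a short name that is a prefix of a longer one (for example "Niger" inside "Nigeria") would still be reported; whether that is acceptable depends on the data.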