
Re: [xsl] reading a .xsv file in xslt


Subject: Re: [xsl] reading a .xsv file in xslt
From: ac <ac@xxxxxxxxxxxxx>
Date: Wed, 03 Feb 2010 05:27:53 -0500

Hi Andrew,

I am not sure I fully understand your point about QNames and normalize-space. If the header line is missing from the csv, picking up the first line anyway and trying to derive element or attribute names from it is bound to raise invalid-QName errors, for example when a cell holds an amount or a dollar value. As I currently understand it, one would have to test whether line 1 provides headers or not, and work out fallbacks when it does not.

I would, for example, use something like
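<xsl:function name="fn:get-headers" as="xs:string+">
<xsl:param name="param"/>
<xsl:param name="line1"/>
<xsl:param name="line2"/>
<!-- xs:NCName rather than xs:QName: in XPath 2.0, "castable as xs:QName" is false for anything but a string literal, and unprefixed names are what we want anyway -->
<xsl:variable name="headerline" select="if (string($param)) then $param else if (every $x in fn:get-tokens($line1) satisfies ($x castable as xs:NCName)) then $line1 else ''"/>
<xsl:variable name="headers" select="fn:get-tokens($headerline)"/>
<!-- one name per column: the header if it is a valid name, otherwise col1, col2, ... -->
<xsl:for-each select="1 to max((count(fn:get-tokens($line2)), count($headers)))">
<xsl:variable name="pos" select="position()"/>
<!-- xsl:sequence returns strings directly instead of building text nodes -->
<xsl:sequence select="if ($headers[$pos] castable as xs:NCName) then $headers[$pos] else concat('col', string($pos))"/>
</xsl:for-each>
</xsl:function>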
to get the headers. It covers most cases (column headers missing, partly missing, or present) and also allows a set of column names to be passed in as a parameter, with an invocation in your example code such as
<xsl:variable name="names" select="fn:get-headers($headers, $lines[1], $lines[2])" as="xs:string+"/>
where $headers is a parameter of the main "csv2xml" template; it can be empty or a string in the same form as the expected csv header line.


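The fact that there may not be a header line in the csv file also implies that the line
<xsl:for-each select="$lines[position() &gt; 1]">
may have to be changed to something like
<xsl:variable name="has-header" select="every $x in fn:get-tokens($lines[1]) satisfies ($x castable as xs:NCName)"/>
<xsl:for-each select="$lines[position() &gt; (if ($has-header) then 1 else 0)]">
for example, so that line 1 is skipped only when it really is a header line.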



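'&#xa;' (a line feed) does indeed display as a space in html, but wouldn't '\r?\n' be more portable? unparsed-text() returns the file's line endings unchanged, so splitting a file with Windows (CRLF) line endings on '&#xa;' alone leaves a trailing carriage return on every line.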
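A minimal sketch of the '\r?\n' approach, assuming a hypothetical $csvpath parameter and an initial template called main (for example via Saxon's -it option). It also drops whitespace-only lines after splitting, which would cover a blank first line such as the one in the bank files mentioned below. Filtering after the split, rather than calling normalize-space() on the whole text first, matters because normalize-space() turns every line break into a single space and would leave one long line.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: path of the csv file to read -->
  <xsl:param name="csvpath" as="xs:string" select="'input.csv'"/>

  <xsl:template name="main">
    <!-- unparsed-text() returns the raw text, carriage returns included -->
    <xsl:variable name="csv" select="unparsed-text($csvpath)" as="xs:string"/>
    <!-- Split on LF or CRLF, then keep only lines with some non-space content
         (drops blank lines and the empty string after a final newline) -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize($csv, '\r?\n')[normalize-space(.)]"/>
    <lines count="{count($lines)}">
      <xsl:for-each select="$lines">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </lines>
  </xsl:template>
</xsl:stylesheet>
```

Files that use a bare carriage return as the line ending would need the pattern '\r\n?|\n' instead.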



As a note on extending your example: the names of <root> and <row> could be parameterized. I would also move <root> out of the nested code and allow $csvpath to be a space-delimited list of file paths, so that several csv files can be merged into one tree simply by looping over the tokenized paths.
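A rough sketch of that extension, assuming hypothetical parameters root-name, row-name and csvpath (space-delimited, so paths containing spaces are not supported) and an initial template named csv2xml. To stay self-contained it splits cells on plain commas instead of your regex token grabber, so quoted fields with embedded commas are not handled; it keeps your <elem name="..."> output so the column names need not be valid QNames.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameters: csv file list and output element names -->
  <xsl:param name="csvpath" as="xs:string" select="'a.csv b.csv'"/>
  <xsl:param name="root-name" as="xs:string" select="'root'"/>
  <xsl:param name="row-name" as="xs:string" select="'row'"/>

  <xsl:template name="csv2xml">
    <!-- The root element is created once, outside the per-file loop -->
    <xsl:element name="{$root-name}">
      <!-- One iteration per file path -->
      <xsl:for-each select="tokenize(normalize-space($csvpath), ' ')">
        <xsl:variable name="lines" as="xs:string*"
            select="tokenize(unparsed-text(.), '\r?\n')[normalize-space(.)]"/>
        <!-- First non-blank line of each file is taken as its header line -->
        <xsl:variable name="names" select="tokenize($lines[1], ',')" as="xs:string*"/>
        <xsl:for-each select="$lines[position() &gt; 1]">
          <!-- Naive comma split; a stand-in for the regex token grabber -->
          <xsl:variable name="cells" select="tokenize(., ',')" as="xs:string*"/>
          <xsl:element name="{$row-name}">
            <xsl:for-each select="$names">
              <xsl:variable name="pos" select="position()"/>
              <elem name="{.}"><xsl:value-of select="$cells[$pos]"/></elem>
            </xsl:for-each>
          </xsl:element>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Relative paths in $csvpath are resolved against the stylesheet's base URI, as unparsed-text() does for any relative URI.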



Your code offers a good basic design and I especially like your regex token grabber.



Thank you, ac





On 3 February 2010 04:11, ac <ac@xxxxxxxxxxxxx> wrote:
Hi,

Andrew, your code is fine but it seems that, to read lines, the line
<xsl:variable name="lines" select="tokenize($csv, ' ')"
as="xs:string+" />
should be more like
<xsl:variable name="lines" select="tokenize($csv, '\r?\n')"
as="xs:string+" />
as there could be spaces in the cells, and as the end-of-line would not be
recognized anyway.
What do you think?

That's from it being displayed as html (which I should probably fix). If you use the download link to get the file instead, you can see that it tokenizes on a newline:

<xsl:variable name="lines" select="tokenize($csv, '&#xa;')" as="xs:string+"/>



Also, purely as a matter of taste and use case: since all cell values are
strings, I would tend to use attributes, replacing
<elem name="{.}">
<xsl:value-of select="$lineItems[$pos]" />
</elem>
with
<xsl:attribute name="{.}" select="$lineItems[$pos]"/>

See below.


Finally, one may also have to handle the case of csv files that do not have
an initial header line with all valid QName strings.

It can: that's why each name is stored in a name attribute on a generic <elem> element, so that the column names don't have to be QNames.

I get a lot of csv files (e.g. from the bank) where the first line is a
blank line.

Use normalize-space on the entire csv text (the string returned from unparsed-text) then use it again on the column names.

