
Re: [xsl] unparsed-text and normalize-space when parsing CSV files


Subject: Re: [xsl] unparsed-text and normalize-space when parsing CSV files
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 5 Dec 2014 19:51:39 -0000

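What about splitting on all three line-ending conventions and dropping
the empty trailing token that a final newline leaves behind: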

            tokenize($csv, '\r\n|\r|\n')[not(position()=last() and .='')]
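
For context, here is a minimal sketch of how that expression might sit in
an XSLT 2.0 stylesheet. The csv-uri parameter, the "main" template name,
and the rows/row/cell output are hypothetical, chosen only for
illustration, and the comma split is naive (it does not handle quoted
fields).

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of the CSV file, resolved relative
       to the stylesheet when it is a relative reference. -->
  <xsl:param name="csv-uri" as="xs:string" select="'data.csv'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point, invoked as a named initial template. -->
  <xsl:template name="main">
    <xsl:variable name="csv" as="xs:string"
        select="unparsed-text($csv-uri)"/>

    <!-- Split on CRLF, lone CR, or lone LF. CRLF is listed first so it
         is consumed as one separator. The predicate drops the empty
         last token produced when the file ends with a line ending. -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize($csv, '\r\n|\r|\n')
                  [not(position() = last() and . = '')]"/>

    <rows>
      <xsl:for-each select="$lines">
        <row>
          <!-- Naive field split: assumes no quoted or escaped commas. -->
          <xsl:for-each select="tokenize(., ',')">
            <cell><xsl:value-of select="."/></cell>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

The $lines variable is typed xs:string* rather than xs:string+ so that an
empty file yields an empty sequence instead of a type error. If an XSLT
3.0 processor is available, unparsed-text-lines() is specified in terms
of this same tokenize expression, so it could likely replace both
variables.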


Cheers,
Dimitre

On Fri, Dec 5, 2014 at 11:36 AM, Hank Ratzesberger xml@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I ran into a strange issue while running transforms on Windows under
> Cygwin, where I was trying to parse a CSV file.
>
> The problem was that I was defining a variable for the newline, which
> I expected would match the native system:
>
> <xsl:variable name="nl">
>     <xsl:text>
> </xsl:text>
> </xsl:variable>
>
> and then parse the file like this:
>
> <xsl:variable name="lines" select="tokenize($csv, $nl)" as="xs:string+" />
>
> but it turns out that this does not really solve the issue of
> mixed-source line endings, since either the stylesheet or the CSV file
> could have been edited on a different platform. So I think this is a
> common issue when parsing these kinds of files.
>
> I was able to rely on normalize-space() to remove an extra CR, but
> that function could make unwanted changes to other content.
>
> Can anyone recommend a safe way to do this?
>
> Thank you,
> Hank
>
>
> --
> Hank Ratzesberger
> XMLWerks.com

