[xsl] Tokenizing and transforming a CSV file
Subject: [xsl] Tokenizing and transforming a CSV file
From: Mukul Gandhi <gandhi.mukul@xxxxxxxxx>
Date: Wed, 25 Feb 2009 22:14:26 +0530
Hi all,

I have a CSV file (named test.csv) as follows (as an example, two lines/records are shown below):

    hi,"this is a long string, please tokenize me",hello,world
    hello,please tokenize me,hi there

I want this to be transformed to the following XML:

    <result>
      <record>
        <field>hi</field>
        <field>this is a long string, please tokenize me</field>
        <field>hello</field>
        <field>world</field>
      </record>
      <record>
        <field>hello</field>
        <field>please tokenize me</field>
        <field>hi there</field>
      </record>
    </result>

i.e., each line/record should be tokenized on commas, with the restriction that a comma inside a double-quoted string should not be treated as a delimiter.

Below is my attempt up to now:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

      <xsl:output method="xml" indent="yes" />

      <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

      <xsl:template match="/">
        <result>
          <xsl:for-each select="tokenize($filedata, '\r?\n')">
            <record>
              <xsl:for-each select="tokenize(., ',')">
                <field>
                  <xsl:value-of select="." />
                </field>
              </xsl:for-each>
            </record>
          </xsl:for-each>
        </result>
      </xsl:template>

    </xsl:stylesheet>

The above stylesheet produces the following output:

    <result>
      <record>
        <field>hi</field>
        <field>"this is a long string</field>
        <field> please tokenize me"</field>
        <field>hello</field>
        <field>world</field>
      </record>
      <record>
        <field>hello</field>
        <field>please tokenize me</field>
        <field>hi there</field>
      </record>
    </result>

As per my requirement, the following output fragment

    <field>"this is a long string</field>
    <field> please tokenize me"</field>

is wrong. It should actually appear as:

    <field>this is a long string, please tokenize me</field>

I would appreciate any help regarding this problem. I am using XSLT 2.0 with Saxon 9.x.

--
Regards,
Mukul Gandhi
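
One possible direction, offered as a minimal sketch rather than a definitive solution: replace the inner tokenize(., ',') with xsl:analyze-string, so that a double-quoted field is matched as a single unit and the commas inside it are never seen as delimiters. The regular expression and the blank-line filter below are assumptions that are not part of the original post. The sketch also assumes that quoted fields contain no escaped ("") quotes and no line breaks.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Same input as in the post: test.csv, resolved relative to the stylesheet. -->
  <xsl:variable name="filedata" select="unparsed-text('test.csv')"/>

  <xsl:template match="/">
    <result>
      <!-- Assumption: blank lines (for example after a trailing newline) are skipped. -->
      <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space(.)]">
        <record>
          <!-- Alternative 1: a double-quoted field, contents captured in group 1.
               Alternative 2: a run of characters that are neither comma nor quote (group 2).
               The separating commas are non-matching substrings and are ignored. -->
          <xsl:analyze-string select="." regex='"([^"]*)"|([^,"]+)'>
            <xsl:matching-substring>
              <field>
                <!-- Only one group takes part in any match; the other is the empty string. -->
                <xsl:value-of select="concat(regex-group(1), regex-group(2))"/>
              </field>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

A known limitation of this sketch: an empty unquoted field (two adjacent commas) produces no <field> element. This is because the regex attribute of xsl:analyze-string must not match a zero-length string, so empty fields would need separate handling.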