Can Oxygen Convert Tab Delimited To XML?

Posted: Sun May 08, 2005 1:26 pm
by Steve Wilkison
I'm relatively new to XML and Oxygen. I've looked through the documentation and can't find anything on this, so I thought I'd ask here. Can Oxygen take a tab delimited file and convert it into basic XML? If not, is there a simple program that can do this (for the Mac)? Thanks for any help, insight or pointers.

Posted: Sun May 08, 2005 9:11 pm
by george
Hi Steve,

Oxygen 5.1 does not support this out of the box, but you can do it with the XSLT 2.0 support. For instance, the following XSLT 2.0 stylesheet (you need to set Saxon8 as the transformer):

Code:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- The tab delimited document, relative to the stylesheet location or an absolute location -->
  <xsl:param name="doc" select="'sample.txt'"/>
  <!-- The encoding for the tab delimited document -->
  <xsl:param name="enc" select="'UTF-8'"/>
  <!-- The result XML root element name -->
  <xsl:param name="root" select="'file'"/>
  <!-- The result XML element name that will mark the values from a line -->
  <xsl:param name="line" select="'line'"/>
  <!-- The result XML element name that will mark each value from the input document -->
  <xsl:param name="entry" select="'entry'"/>

  <!--
    main template
  -->
  <xsl:template match="/">
    <xsl:element name="{$root}">
      <xsl:call-template name="tLines">
        <xsl:with-param name="value" select="unparsed-text($doc, $enc)"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:template>
  <!--
    tokenize lines
  -->
  <xsl:template name="tLines">
    <xsl:param name="value" select="''"/>
    <xsl:analyze-string select="$value" regex="\n|\r">
      <xsl:matching-substring/>
      <xsl:non-matching-substring>
        <xsl:element name="{$line}">
          <xsl:call-template name="tValues">
            <xsl:with-param name="value" select="."/>
          </xsl:call-template>
        </xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  <!--
    tokenize values
  -->
  <xsl:template name="tValues">
    <xsl:param name="value" select="''"/>
    <xsl:analyze-string select="$value" regex="\t">
      <xsl:matching-substring/>
      <xsl:non-matching-substring>
        <xsl:element name="{$entry}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
applied to a dummy input XML document, with the tab-delimited file in the same folder and named sample.txt

Code:


a1	a2	a3	a4
v1	v2	v3	V4
X1	X2	X3	X4
will give as output:

Code:


<?xml version="1.0" encoding="UTF-8"?>
<file>
<line>
<entry>a1</entry>
<entry>a2</entry>
<entry>a3</entry>
<entry>a4</entry>
</line>
<line>
<entry>v1</entry>
<entry>v2</entry>
<entry>v3</entry>
<entry>V4</entry>
</line>
<line>
<entry>X1</entry>
<entry>X2</entry>
<entry>X3</entry>
<entry>X4</entry>
</line>
</file>
The tag names, the name of the input file and its encoding can be specified as parameters. Also, if you change the regex in
<xsl:analyze-string select="$value" regex="\t">
then you can handle different delimiters such as comma, semicolon, etc.
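
A minimal sketch of that change, assuming a hypothetical `delim` parameter that is not part of the stylesheet above. Because the `regex` attribute is an attribute value template, the delimiter can be supplied at run time instead of being edited in the code. The element names file, line and entry are hard-coded here only to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:param name="doc" select="'sample.txt'"/>
  <xsl:param name="enc" select="'UTF-8'"/>
  <!-- Hypothetical parameter (an assumption, not in the original stylesheet):
       the field delimiter, written as a regular expression -->
  <xsl:param name="delim" select="'\t'"/>

  <xsl:template match="/">
    <file>
      <!-- Split the text into lines, as in the original stylesheet -->
      <xsl:analyze-string select="unparsed-text($doc, $enc)" regex="\n|\r">
        <xsl:non-matching-substring>
          <line>
            <!-- regex is an attribute value template, so {$delim} is evaluated at run time -->
            <xsl:analyze-string select="." regex="{$delim}">
              <xsl:non-matching-substring>
                <entry><xsl:value-of select="."/></entry>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </line>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

With this sketch, passing `,` or `;` as the value of `delim` should split on commas or semicolons. Characters that are special in regular expressions need escaping, for example `\|` for a pipe.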

Direct support in oXygen for this type of conversion will be available in the next release.

Best Regards,
George

Posted: Wed May 18, 2005 2:47 pm
by stefan
Conversion from text files (CSV, tab-delimited) to XML is now available:
http://www.oxygenxml.com/database_import.html

Thank you

Posted: Wed May 18, 2005 3:31 pm
by Steve Wilkison
Thank you George for your initial reply, it was very helpful. Thank you Stefan for the info on the new version.

Heh..almost a year later...

Posted: Wed Mar 29, 2006 10:21 pm
by diblassio4
Hi George,

I was trying to use the nice example you posted, but using <xsl:analyze-string select="$value" regex="\|"> for pipe-delimited instead.
The problem comes when there is an empty field like such:
field|secondfield|||fifthField (fields 3 and 4 are empty)
field||thirdfield||fifthField (fields 2 and 4 are empty)

then the positions get janked out of consistency in the results...

Any ideas?

Thanks in advance!

Posted: Mon Apr 03, 2006 8:49 am
by Radu
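
A possible explanation and workaround, offered as a sketch under assumptions rather than a confirmed answer from this thread: as far as I can tell, xsl:analyze-string does not hand a zero-length string to xsl:non-matching-substring, so nothing is emitted between two adjacent delimiters and the later fields shift left. The XPath 2.0 tokenize() function, by contrast, returns a zero-length string between adjacent delimiters. The stylesheet below reuses the `doc` and `enc` parameters from George's example and hard-codes the element names for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:param name="doc" select="'sample.txt'"/>
  <xsl:param name="enc" select="'UTF-8'"/>
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <file>
      <!-- Split into lines and drop empty ones, such as the one after a trailing newline -->
      <xsl:for-each select="tokenize(unparsed-text($doc, $enc), '\r?\n|\r')[. != '']">
        <line>
          <!-- tokenize() returns a zero-length string between adjacent pipes,
               so an empty field still produces an (empty) entry element -->
          <xsl:for-each select="tokenize(., '\|')">
            <entry><xsl:value-of select="."/></entry>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

For a line such as field||thirdfield||fifthField this should produce five entry elements, with the second and fourth empty, so the positions stay consistent from row to row.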

Posted: Mon Nov 19, 2007 7:03 pm
by Eiríkr
george wrote:Hi Steve,

Oxygen 5.1 does not have this out of the box but you can get that using the XSLT 2.0 support. For instance the following XSLT 2.0 stylesheet (you need to set Saxon8 as transformer): <snip>


The tag names, the name of the input file and its encoding can be specified as parameters. Also, if you change the regex in
<xsl:analyze-string select="$value" regex="\t">
then you can handle different delimiters such as comma, semicolon, etc.

Direct support in oXygen for this type of conversion will be available in the next release.

Best Regards,
George
Hello George --

I've now got oXygen 9, and I'm trying this solution out on a .csv file of my own. It seems to work fine in the debugger when the .csv file is explicitly named in the .xsl and is not the file chosen as the source. But what if I need a more general solution? Even if I leave the filename out by omitting the following:

Code:


<xsl:param name="doc" select="'sample.txt'"/>
... so long as I specify the .csv file as the source in the dropdowns at the top of the window, I cannot use the basic transformation capabilities of oXygen's debug view, as I get the following error:

Code:


Content is not allowed in prolog.
Clicking the error takes me to a webpage showing a list of Saxon error codes, suggesting that the error was generated by the Saxon parser. Is there any way I'm missing of telling Saxon not to parse the source file as XML, but rather as straight-up text, but *without* having to specify the source file in a parameter?

Cheers,

Eiríkr

Posted: Tue Nov 20, 2007 4:08 pm
by sorin_ristache
Hello,
Eiríkr wrote:
george wrote:Oxygen 5.1 does not have this out of the box but you can get that using the XSLT 2.0 support.
...
Support directly in oXygen for this type of conversions will be available in the next release.
Hello George --

I've now got oXygen 9, and I'm trying this solution out on a .csv file of my own.
The support for converting CSV files to XML that George mentioned is the import feature available from File -> Import -> Text file (with the comma as delimiter), as you can see above in Stefan's post. You can use this feature to convert your CSV files to XML, and if you need a different XML format you should apply an XSLT stylesheet to convert the result of the import operation to your format.
Eiríkr wrote:Is there any way I'm missing of telling Saxon not to parse the source file as XML, but rather as straight-up text, but *without* having to specify the source file in a parameter?
Any XSLT transformer requires a well-formed XML document and a valid XSLT stylesheet as inputs. You have to set the name of the CSV file with a parameter to the XSLT stylesheet as you already did with the xsl:param element.


Regards,
Sorin

Posted: Tue Nov 20, 2007 6:24 pm
by Eiríkr
sorin wrote: Any XSLT transformer requires a well-formed XML document and a valid XSLT stylesheet as inputs. You have to set the name of the CSV file with a parameter to the XSLT stylesheet as you already did with the xsl:param element.
Thank you, Sorin. I was wondering if there might be some sort of processing instruction to tell the parsing engine to handle the source document differently. Your comment and some further thought tell me that the needed processing instruction would likely have to be in the source document itself anyway. That clearly won't work in my case, where the source .csv file is not known prior to runtime and needs to be read-only to boot. So instead, I'll have to come up with some way of programmatically changing the specific xsl:param attribute value just before running the transformation, and simply handle that side of things via some different engine.

Cheers,

Eiríkr

Posted: Sat Mar 08, 2008 12:38 am
by KermitTensmeyer
The simple solution is a two-step process.

1) Convert the CSV to an Excel spreadsheet (open an empty spreadsheet, import the CSV file).

2) In oXygen, use [File]/[Import] spreadsheet, which brings up a nice interface.
(It is most useful if the first row of the spreadsheet contains column labels, which can then become element tags in the converted XML file.)

Change the root and row labels as needed, and convert!
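
The same header-row idea can also be sketched in XSLT 2.0, for anyone who prefers to stay with a stylesheet. This is only an illustration under assumptions: the file is tab-delimited, every label in the first row is a valid XML element name, and no row has more fields than the header. The `doc` and `enc` parameters follow George's example earlier in the thread; the file and row element names are arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:param name="doc" select="'sample.txt'"/>
  <xsl:param name="enc" select="'UTF-8'"/>
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <!-- All non-empty lines of the text file -->
    <xsl:variable name="lines"
                  select="tokenize(unparsed-text($doc, $enc), '\r?\n|\r')[. != '']"/>
    <!-- Column labels taken from the first line (assumed to be valid element names) -->
    <xsl:variable name="names" select="tokenize($lines[1], '\t')"/>
    <file>
      <xsl:for-each select="$lines[position() gt 1]">
        <row>
          <xsl:for-each select="tokenize(., '\t')">
            <!-- Name each value after the label in the same column position -->
            <xsl:variable name="pos" select="position()"/>
            <xsl:element name="{$names[$pos]}">
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

Labels containing spaces or starting with a digit would make xsl:element fail, so real data may need the labels cleaned up first.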

Posted: Sat Mar 08, 2008 9:08 am
by george
Eiríkr wrote: [...] So instead, I'll have to come up with some way of programmatically changing the specific xsl:param attribute value just before running the transformation, and simply handle that side of things via some different engine.
The filename is a parameter, which means you can specify its value before transforming. The value specified in the code is a default that is used when you do not specify the parameter.
In oXygen you can set the parameter in the transformation scenario dialog: see the Parameters button on the XSLT tab. If you have problems configuring that, let us know.

Best Regards,
George