Can Oxygen Convert Tab Delimited To XML?

Steve Wilkison
Posts: 6
Joined: Wed May 19, 2004 6:29 pm

Can Oxygen Convert Tab Delimited To XML?

Post by Steve Wilkison »

I'm relatively new to XML and Oxygen. I've looked through the documentation and can't find anything on this, so I thought I'd ask here. Can Oxygen take a tab-delimited file and convert it into basic XML? If not, is there a simple program that can do this (for the Mac)? Thanks for any help, insight or pointers.
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Steve,

Oxygen 5.1 does not offer this out of the box, but you can get it using the XSLT 2.0 support. For instance, the following XSLT 2.0 stylesheet (you need to set Saxon8 as the transformer):

Code:


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- The tab delimited document, relative to the stylesheet location or an absolute location -->
  <xsl:param name="doc" select="'sample.txt'"/>
  <!-- The encoding for the tab delimited document -->
  <xsl:param name="enc" select="'UTF-8'"/>
  <!-- The result XML root element name -->
  <xsl:param name="root" select="'file'"/>
  <!-- The result XML element name that will mark the values from a line -->
  <xsl:param name="line" select="'line'"/>
  <!-- The result XML element name that will mark each value from the input document -->
  <xsl:param name="entry" select="'entry'"/>

  <!--
    main template
  -->
  <xsl:template match="/">
    <xsl:element name="{$root}">
      <xsl:call-template name="tLines">
        <xsl:with-param name="value" select="unparsed-text($doc, $enc)"/>
      </xsl:call-template>
    </xsl:element>
  </xsl:template>
  <!--
    tokenize lines
  -->
  <xsl:template name="tLines">
    <xsl:param name="value" select="''"/>
    <xsl:analyze-string select="$value" regex="\n|\r">
      <xsl:matching-substring/>
      <xsl:non-matching-substring>
        <xsl:element name="{$line}">
          <xsl:call-template name="tValues">
            <xsl:with-param name="value" select="."/>
          </xsl:call-template>
        </xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  <!--
    tokenize values
  -->
  <xsl:template name="tValues">
    <xsl:param name="value" select="''"/>
    <xsl:analyze-string select="$value" regex="\t">
      <xsl:matching-substring/>
      <xsl:non-matching-substring>
        <xsl:element name="{$entry}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
applied to any dummy input XML document, with the tab-delimited file named sample.txt in the same folder:

Code:


a1	a2	a3	a4
v1	v2	v3	V4
X1	X2	X3	X4
will give the following output:

Code:


<?xml version="1.0" encoding="UTF-8"?>
<file>
  <line>
    <entry>a1</entry>
    <entry>a2</entry>
    <entry>a3</entry>
    <entry>a4</entry>
  </line>
  <line>
    <entry>v1</entry>
    <entry>v2</entry>
    <entry>v3</entry>
    <entry>V4</entry>
  </line>
  <line>
    <entry>X1</entry>
    <entry>X2</entry>
    <entry>X3</entry>
    <entry>X4</entry>
  </line>
</file>
The tag names, the name of the input file and its encoding can be specified as parameters. Also, if you change the regex in
<xsl:analyze-string select="$value" regex="\t">
then you can handle different delimiters such as comma, semicolon, etc.

Direct support in oXygen for this type of conversion will be available in the next release.

Best Regards,
George
stefan

Post by stefan »

Conversion from text files (CSV, tab-delimited) to XML is now available:
http://www.oxygenxml.com/database_import.html
Steve Wilkison
Posts: 6
Joined: Wed May 19, 2004 6:29 pm

Thank you

Post by Steve Wilkison »

Thank you, George, for your initial reply; it was very helpful. Thank you, Stefan, for the info on the new version.
diblassio4
Posts: 2
Joined: Wed Mar 29, 2006 10:11 pm

Heh..almost a year later...

Post by diblassio4 »

Hi George,

I was trying to use the nice example you posted, but with <xsl:analyze-string select="$value" regex="\|"> for pipe-delimited input instead.
The problem comes when there are empty fields, like so:
field|secondfield|||fifthField (fields 3 and 4 are empty)
field||thirdfield||fifthField (fields 2 and 4 are empty)

Then the empty fields are dropped, so the positions of the entries shift and no longer line up across rows in the results...

Any ideas?

Thanks in advance!
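
A likely cause of the shifted positions: xsl:non-matching-substring is not invoked for zero-length strings, so nothing is emitted between two adjacent separators. One possible workaround, sketched here under assumptions rather than taken from the thread, is to split with the XPath 2.0 tokenize() function, which returns a zero-length string between adjacent matches. The $sep parameter is a hypothetical addition; $doc and $enc mirror the parameters of the original stylesheet, and the element names are hard-coded for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Same meaning as in the original stylesheet -->
  <xsl:param name="doc" select="'sample.txt'"/>
  <xsl:param name="enc" select="'UTF-8'"/>
  <!-- Hypothetical parameter: the separator as a regular expression (escaped pipe here) -->
  <xsl:param name="sep" select="'\|'"/>

  <!-- Still applied to a dummy input XML document, as in the original -->
  <xsl:template match="/">
    <file>
      <!-- Split the text into lines and skip blank ones -->
      <xsl:for-each select="tokenize(unparsed-text($doc, $enc), '\r?\n|\r')[. != '']">
        <line>
          <!-- tokenize() keeps empty fields: adjacent separators yield '' -->
          <xsl:for-each select="tokenize(., $sep)">
            <entry>
              <xsl:value-of select="."/>
            </entry>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

With this sketch, the line field|secondfield|||fifthField should produce five entry elements, the third and fourth of them empty, so every value stays in its column position.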
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Post by Radu »

Eiríkr
Posts: 7
Joined: Mon Nov 19, 2007 6:53 pm
Location: Puget Sound

Post by Eiríkr »

george wrote:Hi Steve,

Oxygen 5.1 does not offer this out of the box, but you can get it using the XSLT 2.0 support. For instance, the following XSLT 2.0 stylesheet (you need to set Saxon8 as the transformer): <snip>


The tag names, the name of the input file and its encoding can be specified as parameters. Also, if you change the regex in
<xsl:analyze-string select="$value" regex="\t">
then you can handle different delimiters such as comma, semicolon, etc.

Direct support in oXygen for this type of conversion will be available in the next release.

Best Regards,
George

Hello George --

I've now got oXygen 9, and I'm trying this solution out on a .csv file of my own. It seems to work fine in the debugger when the .csv file is explicitly named in the .xsl and is not the file chosen as the source. But what if I need a more general solution? Even if I leave the filename out by omitting the following:

Code:


<xsl:param name="doc" select="'sample.txt'"/>
... as long as I specify the .csv file as the source in the dropdowns at the top of the window, I cannot use the basic transformation capabilities of oXygen's debug view, because I get the following error:

Code:


Content is not allowed in prolog.
Clicking the error takes me to a webpage showing a list of Saxon error codes, suggesting that the error was generated by the Saxon parser. Is there any way I'm missing of telling Saxon not to parse the source file as XML, but rather as straight-up text, but *without* having to specify the source file in a parameter?

Cheers,

Eiríkr
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Post by sorin_ristache »

Hello,
Eiríkr wrote:
george wrote: Oxygen 5.1 does not offer this out of the box, but you can get it using the XSLT 2.0 support.
...
Direct support in oXygen for this type of conversion will be available in the next release.
Hello George --

I've now got oXygen 9, and I'm trying this solution out on a .csv file of my own.
The support for converting CSV files to XML that George mentioned is the import feature available from File -> Import -> Text file (with the comma as delimiter), as you can see above in Stefan's post. You can use this feature to convert your CSV files to XML, and if you need a different XML format, you can apply an XSLT stylesheet to the result of the import operation to convert it to your format.
Eiríkr wrote:Is there any way I'm missing of telling Saxon not to parse the source file as XML, but rather as straight-up text, but *without* having to specify the source file in a parameter?
Any XSLT transformer requires a well-formed XML document and a valid XSLT stylesheet as inputs. You have to set the name of the CSV file with a parameter to the XSLT stylesheet as you already did with the xsl:param element.


Regards,
Sorin
Eiríkr
Posts: 7
Joined: Mon Nov 19, 2007 6:53 pm
Location: Puget Sound

Post by Eiríkr »

sorin wrote: Any XSLT transformer requires a well-formed XML document and a valid XSLT stylesheet as inputs. You have to set the name of the CSV file with a parameter to the XSLT stylesheet as you already did with the xsl:param element.
Thank you, Sorin. I was wondering if there might be some sort of processing instruction to tell the parsing engine to handle the source document differently, but your comment and some further thought tell me that the needed processing instruction would likely have to be in the source document itself anyway. That clearly won't work in my case, where the source .csv file is not known prior to runtime and needs to be read-only to boot. So instead, I'll have to come up with some way of programmatically changing the specific xsl:param attribute value just before running the transformation, and simply handle that side of things via some different engine.

Cheers,

Eiríkr
KermitTensmeyer
Posts: 7
Joined: Sat Mar 08, 2008 12:03 am

Re: Can Oxygen Convert Tab Delimited To XML?

Post by KermitTensmeyer »

The simple solution is a two-step process.

1) Convert the CSV to an Excel spreadsheet (open an empty spreadsheet, import the CSV file).

2) In oXygen, use [File]/[Import] for the spreadsheet, which brings up a nice interface
(it is most useful if the first row of the spreadsheet contains column labels, which can then become element tags in the converted XML file).

Change the root and row labels as needed, and convert!
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Re: Can Oxygen Convert Tab Delimited To XML?

Post by george »

Eiríkr wrote: [...] So instead, I'll have to come up with some way of programmatically changing the specific xsl:param attribute value just before running the transformation, and simply handle that side of things via some different engine.
The filename is a parameter, which means you can specify its value before transforming. The value specified in the code is a default that is used when you do not specify the parameter.
In oXygen you can specify the parameter from the transformation scenario dialog; see the Parameters button on the XSLT tab. If you have problems configuring that, let us know.
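
If the file name should never be hard-coded, one option is to declare the parameter without a default so that it has to be supplied on every run. The following is only a minimal sketch, assuming an XSLT 2.0 processor such as Saxon: required="yes" on xsl:param and the unparsed-text-available() function are standard XSLT 2.0, while the element names and the error message wording are illustrative. Each line is copied whole here; splitting it into values works as in the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="2.0">
  <!-- No default value: the caller must supply the file name for every transformation -->
  <xsl:param name="doc" as="xs:string" required="yes"/>
  <!-- The encoding keeps its default and can still be overridden -->
  <xsl:param name="enc" as="xs:string" select="'UTF-8'"/>

  <!-- Still applied to a dummy input XML document -->
  <xsl:template match="/">
    <!-- Stop with a readable message if the text file cannot be read -->
    <xsl:if test="not(unparsed-text-available($doc, $enc))">
      <xsl:message terminate="yes">
        <xsl:text>Cannot read </xsl:text>
        <xsl:value-of select="$doc"/>
        <xsl:text> using encoding </xsl:text>
        <xsl:value-of select="$enc"/>
      </xsl:message>
    </xsl:if>
    <file>
      <!-- One line element per non-blank line of the text file -->
      <xsl:for-each select="tokenize(unparsed-text($doc, $enc), '\r?\n|\r')[. != '']">
        <line>
          <xsl:value-of select="."/>
        </line>
      </xsl:for-each>
    </file>
  </xsl:template>
</xsl:stylesheet>
```

The value for doc is then set in the Parameters dialog of the transformation scenario, as described above.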

Best Regards,
George
George Cristian Bina