export data from set of XML files

david_himself
Posts: 40
Joined: Mon Oct 01, 2018 7:29 pm

Post by david_himself

I want to run an XSLT script to export the contents of three specific elements from each of many XML files in a directory. I have some old XSLTs used for similar purposes in the past, but I have forgotten the trick of setting up the output side of the transformation -- how to configure the transformation scenario. Apologies for such an elementary question. The exported data should be in text file format, e.g. a csv or tab file, though direct to Excel format would also be OK. I'm on Oxygen 24.0, Windows 10.
Radu
Posts: 9018
Joined: Fri Jul 09, 2004 5:18 pm

Re: export data from set of XML files

Post by Radu

Hi,

This topic in our user's guide shows how to create an XSLT transformation scenario:
https://www.oxygenxml.com/doc/versions/ ... ation.html
If you want the XSLT to process multiple XML documents and aggregate the output in a single file, you probably need to use the XSLT collection() function to read all the files from within the XSLT and gather content from them.
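A minimal sketch of the collection() approach. It assumes the Saxon processor that Oxygen offers for XSLT 2.0 (the "?select=" part of the collection URI is Saxon's convention) and a hypothetical folder file:/C:/data/. The element name "title" is only a placeholder. The stylesheet writes one tab-separated line per file: the file's URI, then the text of its first title element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical folder; Saxon's "select" parameter filters file names with a glob pattern. -->
  <xsl:variable name="docs" select="collection('file:/C:/data/?select=*.xml')"/>

  <!-- The input document itself is not used; all content comes from the collection. -->
  <xsl:template match="/">
    <xsl:for-each select="$docs">
      <!-- One line per file: source URI, a tab, then the text of the first (placeholder) title element -->
      <xsl:value-of select="document-uri(.), (//title)[1]" separator="&#9;"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```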

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by david_himself

Hi Radu. I routinely create and use XSLT transformation scenarios that work on one file at a time, whether to modify the XML or to produce text output. That isn't the problem. I used to write XSLTs that aggregated the text output into a single file. Some did indeed use the collection() function; others used an xsl:result-document instruction. My comments in some of the files remind me to use the XSLT file path both for the XSLT URL (obviously) but also for the XML URL, which is counter-intuitive but which I remember working fine. Trouble is, I can't now get ANY of my old aggregating XSLTs to work, and I don't know why. I couldn't find a help topic which walked me through the process of writing the relevant parts of the XSLT and setting up the scenario.
best
David

Post by Radu

Hi David,
"My comments in some of the files remind me to use the XSLT file path both for the XSLT URL (obviously) but also for the XML URL, which is counter-intuitive but which I remember working fine."
Yes. Exactly.
"Trouble is, I can't now get ANY of my old aggregating XSLTs to work, and I don't know why."
Could you give me more details about the error Oxygen reports when you apply the XSLT transformation?
"I couldn't find a help topic which walked me through the process of writing the relevant parts of the XSLT and setting up the scenario."
Usually you would open the XSLT stylesheet and use the "Configure Transformation Scenario" toolbar button to create an XSLT transformation scenario. In that scenario, set the "XML URL" to "${currentFileURL}", which also resolves to the XSLT stylesheet. Once the scenario is created, run it on the currently opened XSLT stylesheet, which uses the XSLT collection() function to read the XML files from a specific folder.
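The consequence of that setup is that the transformation's input document is the stylesheet itself, so the template matching "/" should take its data from the collection and otherwise ignore its context node. A sketch under that assumption; the parameter name "folder" and its default value are hypothetical, and a stylesheet parameter like this should be settable from the scenario's parameter settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical parameter: the folder to scan, overridable from the scenario. -->
  <xsl:param name="folder" select="'file:/C:/data/'"/>

  <!-- With "XML URL" set to ${currentFileURL}, the source document is this stylesheet,
       so the template deliberately does nothing with its context node. -->
  <xsl:template match="/">
    <xsl:variable name="docs" select="collection(concat($folder, '?select=*.xml'))"/>
    <xsl:text>Files found: </xsl:text>
    <xsl:value-of select="count($docs)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- No xsl:apply-templates here: the built-in template rules would copy
         this stylesheet's own text nodes into the output. -->
  </xsl:template>
</xsl:stylesheet>
```

The last comment matters in practice: with no template rules of your own, the built-in rules walk the input document and copy every text node, so applying templates to the stylesheet would append its literal text (including the contents of its xsl:text elements) to the result.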

Regards,
Radu

Post by david_himself

OK, here's what I'm trying, extracting 3 simple items from each of a large number of long XMLs. Not sure why I've got copy-of for two items and value-of for the third ;-) . (Sorry, don't know how to format a code block on this message board.) The empty output file that I get makes me wonder whether it's a namespace problem. The XSLT is in the same folder as the XMLs.
best
David

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:h="http://www.tei-c.org/ns/1.0"
	xmlns="http://www.tei-c.org/ns/1.0"
	version="2.0"  exclude-result-prefixes="#all">
<xsl:output method="text"/>
	
	
<!--	Export of metadata from header. DD 2022-02-04	-->
<!-- 	Set both XML URL and XSL URL to this file!	-->
<!--	Output needs some regex edits:
		([^\r])(\n\t+)+	to	\1
		Delete stuff at end.
		Sort as integers ascending.
		Import into Excel as UTF-8. Change to 9-point font, wrapped, align top. Freeze first row and column.	-->
		
  <xsl:template match="/">
  	<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
  	<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=**.xml')//teiHeader"/>
  		<xsl:text>Shelfmark&#09;Author&#09;Creation date</xsl:text>
		<xsl:for-each select="$header"> 
			<xsl:copy-of select="concat('&#13;&#10;',.//h:msDesc//h:idno[@type='shelfmark'])"/>
			<xsl:text>&#09;</xsl:text><xsl:copy-of select=".//h:msContents//h:author/h:persName"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/>
		</xsl:for-each>
		<xsl:apply-templates/>
  </xsl:template>

  </xsl:stylesheet>

Post by Radu

Hi David,

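Indeed, when you select the TEI header you need to use the proper prefix. The default namespace declaration (xmlns="http://www.tei-c.org/ns/1.0") on the stylesheet does not apply to element names in XPath expressions, so an unprefixed "teiHeader" only matches elements in no namespace: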

Code:

<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=**.xml')//h:teiHeader"/>
Or use "*:teiHeader" to match the element regardless of its namespace.

Or set the "xpath-default-namespace" attribute on the XSLT stylesheet root element; then you can omit the prefixes in XPath expressions:

Code:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
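For illustration, a complete minimal stylesheet built on xpath-default-namespace. It reuses the folder path from David's post and assumes the files there are TEI documents; it outputs one shelfmark per line, with no prefixes in any XPath expression.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Unprefixed names in XPath expressions now resolve to the TEI namespace. -->
    <xsl:for-each select="collection('file:/C:/temp/XML_files/?select=*.xml')//teiHeader">
      <xsl:value-of select=".//msDesc//idno[@type='shelfmark']"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```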
Regards,
Radu

Post by david_himself

Thanks for spotting the missing prefix. But still no joy. If I just run the transformation on itself, it only takes a couple of seconds. If I select the folder containing all the XMLs it's supposed to work on, it goes through them all slowly (maybe a second or so each × 1400) and finally terminates OK. But either way, the output file is empty.
best
David

Post by Radu

Hi David,

How about if you add some xsl:messages in the XSLT stylesheet? Something like:

Code:

 
 ....
 <xsl:for-each select="$header"> 
            <xsl:message>Found <xsl:copy-of select="."/></xsl:message>
            .....
After you transform there should be a "Messages" tab at the bottom of Oxygen presenting these messages.
Other than that, this should work. Does the folder "C:/temp/XML_files/" contain your XML files directly, or are they in subfolders? If you have subfolders, you need to add a "recurse=yes" parameter: https://stackoverflow.com/questions/247 ... xslt-colle
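A sketch of the recursive form, applied to the collection call from David's stylesheet. How the collection URI is interpreted is implementation-defined in XSLT 2.0; the "select" and "recurse" parameters separated by a semicolon follow Saxon's convention, so treat this as Saxon-specific. It only reports how many headers were found, which is a quick way to confirm the files are actually being read.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Saxon collection URI: "select" is a file-name glob; "recurse=yes"
         also descends into subfolders of C:/temp/XML_files/. -->
    <xsl:variable name="header"
        select="collection('file:/C:/temp/XML_files/?select=*.xml;recurse=yes')//*:teiHeader"/>
    <!-- Report the count both in Oxygen's Messages tab and in the output. -->
    <xsl:message>Found <xsl:value-of select="count($header)"/> teiHeader elements</xsl:message>
    <xsl:value-of select="count($header)"/>
  </xsl:template>
</xsl:stylesheet>
```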

Regards,
Radu

Post by david_himself

Sorry to prolong this. Your xsl:message worked fine, and the files were indeed being found. I used it to actually export what I wanted and quickly edit out the unnecessary lines with Notepad++. But I want to know how to do it properly for the future with more complex outputs.
(1) This still produces an empty output file:

Code:

  <xsl:template match="/">
  	<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
  	<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=AR-HAM-00001-00002*.xml')//h:teiHeader"/>
  		<xsl:text>Shelfmark&#09;Author&#09;Creation date</xsl:text>
		<xsl:for-each select="$header"> 
			<xsl:value-of select="concat('&#13;&#10;',.//h:msDesc//h:idno[@type='shelfmark'])"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:msContents//h:author"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:correspAction[@type='received']/persName"/>
		</xsl:for-each>
		<xsl:apply-templates/>
  </xsl:template>
The output file is empty but has a last-modified date showing that it has been accessed. Is there some obvious error?
(2) Modifying the for-each loop as below puts 3 of the needed data items into the message box, suggesting that the XPaths for the data are correct:

Code:

<xsl:for-each select="$header"> 
           <xsl:message>
			<xsl:value-of select="concat('&#13;&#10;',.//h:msDesc//h:idno[@type='shelfmark'])"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:msContents//h:author"/>
			<xsl:text>&#09;</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/></xsl:message>
		</xsl:for-each>
Btw, all the XMLs and the XSLT are in the same folder, no further subfolders involved. Any clue? Thanks as ever.
best
David

Post by Radu

Hi David,

Well, you close the xsl:result-document element immediately; it is written as an empty, self-closing element:

Code:

<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
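instead of placing the code that generates the content inside it. As written, the empty xsl:result-document creates an empty file, and everything generated after it goes to the transformation's main output instead: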

Code:

<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab">
<!-- XSLT code here -->
</xsl:result-document>
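Applying that fix to the template from David's previous post gives something like the sketch below. Two further changes go beyond Radu's answer and are marked in comments: the persName step gets the h: prefix (unprefixed, that column would always be empty, for the namespace reason discussed earlier in the thread), and the trailing xsl:apply-templates is dropped because the input document is the stylesheet itself. The column labels are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.tei-c.org/ns/1.0"
    version="2.0" exclude-result-prefixes="#all">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <xsl:variable name="header"
        select="collection('file:/C:/temp/XML_files/?select=*.xml')//h:teiHeader"/>
    <!-- Everything generated inside xsl:result-document is written to the .tab file. -->
    <xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab" method="text">
      <xsl:text>Shelfmark&#9;Creation date&#9;Author&#9;Recipient</xsl:text>
      <xsl:for-each select="$header">
        <xsl:text>&#13;&#10;</xsl:text>
        <xsl:value-of select=".//h:msDesc//h:idno[@type='shelfmark']"/>
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select=".//h:msDesc//h:origDate"/>
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select=".//h:msContents//h:author"/>
        <xsl:text>&#9;</xsl:text>
        <!-- Changed: persName also needs the h: prefix; unprefixed it matches only no-namespace elements. -->
        <xsl:value-of select=".//h:correspAction[@type='received']/h:persName"/>
      </xsl:for-each>
    </xsl:result-document>
    <!-- Changed: no xsl:apply-templates, since the input document is this stylesheet. -->
  </xsl:template>
</xsl:stylesheet>
```

Run the scenario once on the stylesheet itself rather than on the folder of XML files. Applied to a folder, the scenario likely runs once per selected file and rereads the whole collection each time, which would explain the slow run described earlier in the thread.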

Regards,
Radu

Post by david_himself

Of course! All good now. Thank you.
D