export data from set of XML files
Oxygen general issues.
-
- Posts: 43
- Joined: Mon Oct 01, 2018 7:29 pm
export data from set of XML files
Post by david_himself »
I want to run an XSLT script to export the contents of three specific elements from each of many XML files in a directory. I have some old XSLTs used for similar purposes in the past, but I have forgotten the trick of setting up the output side of the transformation -- how to configure the transformation scenario. Apologies for such an elementary question. The exported data should be in text file format, e.g. a csv or tab file, though direct to Excel format would also be OK. I'm on Oxygen 24.0, Windows 10.
-
- Posts: 9434
- Joined: Fri Jul 09, 2004 5:18 pm
Re: export data from set of XML files
Hi,
This topic in our user's guide shows how to create an XSLT transformation scenario:
https://www.oxygenxml.com/doc/versions/ ... ation.html
If you want the XSLT to process multiple XML documents and aggregate the output in a single file, you probably need to use the XSLT collection() function to read all the files from within the XSLT and gather content from them.
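To make the collection() approach concrete, here is a minimal sketch in XSLT 2.0. It assumes the Saxon processor that Oxygen uses for XSLT 2.0/3.0. The folder `file:/C:/temp/XML_files/` and the element names `record`, `id` and `title` are hypothetical placeholders. The `?select=*.xml` filter in the collection URI is Saxon-specific syntax, not part of the XSLT standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Hypothetical input folder; ?select= is a Saxon-specific filter -->
    <xsl:variable name="docs"
        select="collection('file:/C:/temp/XML_files/?select=*.xml')"/>

    <!-- The transformation input is not used; all data comes from $docs -->
    <xsl:template match="/">
        <!-- Header row: tab-separated column names, then a newline -->
        <xsl:text>id&#9;title&#10;</xsl:text>
        <xsl:for-each select="$docs//record">
            <!-- One tab-separated line per (hypothetical) record element -->
            <xsl:value-of select="id[1]"/>
            <xsl:text>&#9;</xsl:text>
            <xsl:value-of select="title[1]"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Whatever document the scenario passes in as input only triggers the root template here. Every row in the single text output comes from the files returned by collection().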
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: export data from set of XML files
Post by david_himself »
Hi Radu. I routinely create and use XSLT transformation scenarios that work on one file at a time, whether to modify the XML or to produce text output. That isn't the problem. I used to write XSLTs that aggregated the text output into a single file. Some did indeed use the collection() function, and some used an xsl:result-document instruction. My comments in some of the files remind me to use the XSLT file path both for the XSLT URL (obviously) but also for the XML URL, which is counter-intuitive but which I remember working fine. Trouble is, I can't now get ANY of my old aggregating XSLTs to work, and I don't know why. I couldn't find a help topic which walked me through the process of writing the relevant parts of the XSLT and setting up the scenario.
best
David
Re: export data from set of XML files
Hi David,
> My comments in some of the files remind me to use the XSLT file path both for the XSLT URL (obviously) but also for the XML URL, which is counter-intuitive but which I remember working fine.
Yes, exactly.
> Trouble is, I can't now get ANY of my old aggregating XSLTs to work, and I don't know why.
Can you give me more details about the error you get in Oxygen when applying the XSLT transformation?
> I couldn't find a help topic which walked me through the process of writing the relevant parts of the XSLT and setting up the scenario.
Usually you would open the XSLT stylesheet and use the "Configure Transformation Scenario" toolbar button to create an XSLT transformation scenario. In the scenario, set the "XML URL" to "${currentFileURL}", which also resolves to the XSLT. Once the scenario is created, run it on the currently opened XSLT stylesheet, which uses the XSLT collection() function to read the XML files from a specific folder.
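The scenario setup described above can be paired with a stylesheet skeleton like the following. This is a sketch that assumes both the XML URL and the XSL URL are `${currentFileURL}`, so the stylesheet is also its own (ignored) input. The parameter name `input-folder` and its default value are hypothetical. The `xsl:message` text should appear in Oxygen's Messages tab and confirms that the collection actually found files before any real extraction is written.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Scenario (assumed): XML URL = ${currentFileURL}, XSL URL = ${currentFileURL},
     so this stylesheet is also the transformation input, which is ignored. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output method="text"/>

    <!-- Hypothetical folder containing the XML files to aggregate -->
    <xsl:param name="input-folder" select="'file:/C:/temp/XML_files/'"/>

    <xsl:template match="/">
        <xsl:variable name="docs"
            select="collection(concat($input-folder, '?select=*.xml'))"/>
        <!-- Sanity check: how many documents did the collection return? -->
        <xsl:message select="concat('Documents found: ', count($docs))"/>
        <!-- Write one line per document containing its URI -->
        <xsl:for-each select="$docs">
            <xsl:value-of select="base-uri(.)"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

If the message reports zero documents, the problem lies in the collection URI (folder path or filter) rather than in the extraction XPaths.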
Regards,
Radu
Re: export data from set of XML files
Post by david_himself »
OK, here's what I'm trying, extracting 3 simple items from each of a large number of long XMLs. Not sure why I've got copy-of for two items and value-of for the third ;-) . (Sorry, don't know how to format a code block on this message board.) The empty output file that I get makes me wonder whether it's a namespace problem. The XSLT is in the same folder as the XMLs.
best
David
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:h="http://www.tei-c.org/ns/1.0"
xmlns="http://www.tei-c.org/ns/1.0"
version="2.0" exclude-result-prefixes="#all">
<xsl:output method="text"/>
<!-- Export of metadata from header. DD 2022-02-04 -->
<!-- Set both XML URL and XSL URL to this file! -->
<!-- Output needs some regex edits:
(([^\r])(\n\t+)+ to \1
Delete stuff at end.
Sort as integers ascending.
Import into Excel as UTF-8. Change to 9-point font, wrapped, align top. Freeze first row and column. -->
<xsl:template match="/">
<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=**.xml')//teiHeader"/>
<xsl:text>Shelfmark	Author	Creation date</xsl:text>
<xsl:for-each select="$header">
<xsl:copy-of select="concat(' ',.//h:msDesc//h:idno[@type='shelfmark'])"/>
<xsl:text>	</xsl:text><xsl:copy-of select=".//h:msContents//h:author/h:persName"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
Re: export data from set of XML files
Hi David,
Indeed, when you match the TEI header you should use the proper prefix:
Code: Select all
<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=**.xml')//h:teiHeader"/>
or use "*:teiHeader" to match the element regardless of its namespace.
Or, on the XSLT stylesheet root element, set the "xpath-default-namespace" attribute; then you can avoid using prefixes in XPath expressions:
Code: Select all
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
Regards,
Radu
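The three options above can be compared side by side in one sketch (XSLT 2.0, reusing the collection URI from this thread with a simplified filter). A plain `xmlns="http://www.tei-c.org/ns/1.0"` declaration on the stylesheet, as in David's stylesheet, does not change how unprefixed element names in XPath expressions are resolved. In XSLT 2.0 only `xpath-default-namespace` does that, which is why `//teiHeader` matched nothing. If the namespace setup is right, the three counts reported in the Messages tab should be equal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.tei-c.org/ns/1.0"
    version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="#all">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:variable name="docs"
            select="collection('file:/C:/temp/XML_files/?select=*.xml')"/>
        <!-- 1. Explicit prefix bound to the TEI namespace -->
        <xsl:message select="concat('h:teiHeader: ', count($docs//h:teiHeader))"/>
        <!-- 2. Wildcard prefix: matches teiHeader in any namespace -->
        <xsl:message select="concat('*:teiHeader: ', count($docs//*:teiHeader))"/>
        <!-- 3. Unprefixed name, resolved through xpath-default-namespace -->
        <xsl:message select="concat('teiHeader: ', count($docs//teiHeader))"/>
    </xsl:template>
</xsl:stylesheet>
```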
Re: export data from set of XML files
Post by david_himself »
Thanks for spotting the missing prefix. But still no joy. If I just run the transformation on itself, it only takes a couple of seconds. If I select the folder containing all the XMLs it's supposed to work on, it goes through them all slowly (maybe a second or so each × 1400) and finally terminates OK. But either way, the output file is empty.
best
David
Re: export data from set of XML files
Hi David,
How about adding some xsl:message instructions to the XSLT stylesheet? Something like:
Code: Select all
....
<xsl:for-each select="$header">
<xsl:message>Found <xsl:copy-of select="."/></xsl:message>
.....
After you run the transformation, there should be a "Messages" tab at the bottom of Oxygen presenting these messages.
Other than that, this should work. The folder "C:/temp/XML_files/" contains your XML files directly, right? Or does it have subfolders which contain them? If you have subfolders, you should add a "recurse=yes" parameter: https://stackoverflow.com/questions/247 ... xslt-colle
Regards,
Radu
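A sketch of the same debugging check with subfolders included, assuming Saxon's collection URI syntax, in which query parameters are separated by semicolons (`select=*.xml;recurse=yes`). It reports how many TEI headers were found and, for each one, the URI of the file it came from (via `base-uri()`). The folder path is the one used in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.tei-c.org/ns/1.0"
    version="2.0" exclude-result-prefixes="#all">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- recurse=yes (Saxon-specific) also searches subfolders -->
        <xsl:variable name="header"
            select="collection('file:/C:/temp/XML_files/?select=*.xml;recurse=yes')//h:teiHeader"/>
        <xsl:message select="concat('teiHeader elements found: ', count($header))"/>
        <xsl:for-each select="$header">
            <!-- Shows which file each header was read from -->
            <xsl:message select="concat('Found in: ', base-uri(.))"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```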
Re: export data from set of XML files
Post by david_himself »
Sorry to prolong this. Your xsl:message worked fine, and the files were indeed being found. I used it to actually export what I wanted and quickly edit out the unnecessary lines with Notepad++. But I want to know how to do it properly for the future with more complex outputs.
(1) This still produces an empty output file:
Code: Select all
<xsl:template match="/">
<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
<xsl:variable name="header" select="collection('file:/C:/temp/XML_files/?select=AR-HAM-00001-00002*.xml')//h:teiHeader"/>
<xsl:text>Shelfmark	Author	Creation date</xsl:text>
<xsl:for-each select="$header">
<xsl:value-of select="concat(' ',.//h:msDesc//h:idno[@type='shelfmark'])"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:msContents//h:author"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:correspAction[@type='received']/persName"/>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:template>
The output file is empty but has a last-modified date showing that it has been accessed. Is there some obvious error?
(2) Modifying the for-each loop as below puts 3 of the needed data items into the message box, suggesting that the XPaths for the data are correct:
Code: Select all
<xsl:for-each select="$header">
<xsl:message>
<xsl:value-of select="concat(' ',.//h:msDesc//h:idno[@type='shelfmark'])"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:msContents//h:author"/>
<xsl:text>	</xsl:text><xsl:value-of select=".//h:msDesc//h:origDate"/></xsl:message>
</xsl:for-each>
Btw, all the XMLs and the XSLT are in the same folder, no further subfolders involved. Any clue? Thanks as ever.
best
David
Re: export data from set of XML files
Hi David,
Well, you seem to close the result-document very early:
Code: Select all
<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab"/>
instead of placing the code inside it:
Code: Select all
<xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab">
<!-- XSLT code here -->
</xsl:result-document>
Regards,
Radu
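Putting the fixes from this thread together, a corrected stylesheet might look like the sketch below. It is untested against the actual TEI files and assumes the element paths from David's second stylesheet are right; the column label "Recipient" is an illustrative name for the `correspAction[@type='received']` value. Besides wrapping the output code in `xsl:result-document`, it adds the missing `h:` prefix on `persName`, ends each row with an explicit newline, and drops the trailing `xsl:apply-templates`. With the stylesheet as its own input, the built-in template rules would copy the stylesheet's own text nodes into the primary output. It also uses `normalize-space()` so that line breaks and tabs inside the TEI elements do not break the tab-separated layout. This may make the regex clean-up noted in the stylesheet comments unnecessary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Scenario (assumed): XML URL and XSL URL both set to this file. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.tei-c.org/ns/1.0"
    version="2.0" exclude-result-prefixes="#all">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Everything meant for the .tab file must sit INSIDE this element -->
        <xsl:result-document href="file:/C:/temp/HAM_shelfmark-author-date.tab">
            <xsl:variable name="header"
                select="collection('file:/C:/temp/XML_files/?select=*.xml')//h:teiHeader"/>
            <!-- Header row matching the column order below -->
            <xsl:text>Shelfmark&#9;Creation date&#9;Author&#9;Recipient&#10;</xsl:text>
            <xsl:for-each select="$header">
                <xsl:value-of select="normalize-space(string-join(.//h:msDesc//h:idno[@type='shelfmark'], ' '))"/>
                <xsl:text>&#9;</xsl:text>
                <xsl:value-of select="normalize-space(string-join(.//h:msDesc//h:origDate, ' '))"/>
                <xsl:text>&#9;</xsl:text>
                <xsl:value-of select="normalize-space(string-join(.//h:msContents//h:author, ' '))"/>
                <xsl:text>&#9;</xsl:text>
                <!-- persName needs the h: prefix too -->
                <xsl:value-of select="normalize-space(string-join(.//h:correspAction[@type='received']/h:persName, ' '))"/>
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
        </xsl:result-document>
        <!-- No xsl:apply-templates here: the context node is this stylesheet -->
    </xsl:template>
</xsl:stylesheet>
```

The primary result of the scenario stays empty; the data ends up only in the file named in `href`.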