Using collection() with archives
-
- Posts: 7
- Joined: Mon Nov 19, 2007 6:53 pm
- Location: Puget Sound
Using collection() with archives
I'm working on extracting content from MS Excel 2007 format. Ideally, I should be able to work with the archive directly; I can indeed access content without manually unzipping the Excel file first, so that's a good beginning.
However, I've run into some challenges when trying to dynamically load *.xml files located within the *.xlsm archive.
I don't know ahead of time how many worksheets the Excel workbook might contain, so I need to be able to look through multiple sheets, named internally sheet1.xml, sheet2.xml, ... sheetN.xml. Borrowing the wisdom of others found via Google, it sounds like the collection() function is a good way to leverage xsl:for-each constructs to work with multiple files.
However, attempting to use collection() produces an ArchiveEntryNotFoundException, though I can find nothing wrong with the path.
The problematic XSL:
Code:
<xsl:for-each select="for $f in
    collection(
      concat($BASE_PATH, '!/xl/worksheets/?select=sheet*.xml;recurse=yes;on-error=warning')
    )
    return $f">
  <!-- ... do something with each file ... -->
</xsl:for-each>
The error I get:
Code:
SystemID: I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\XL2MT.xsl
Severity: error
Description: de.schlichtherle.io.ArchiveController$ArchiveEntryNotFoundException: I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\Source\Working.xlsm\xl\worksheets (no such file entry) - I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\Source\Working.xlsm\xl\worksheets (no such file entry)
Start location: 62:0
Is the collection() function simply not capable of handling archives? Have I goofed here somehow? Removing the recurse=yes;on-error=warning portion does not change the outcome.
Any advice appreciated.
Cheers,
-- Eiríkr
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Using collection() with archives
Hello,
The archive support in Oxygen (through URIs) allows access only to files (archive entries). The URI you are composing refers to a folder inside the archive ("worksheets") and then applies a select to it. Oxygen assumes that "worksheets" is a file (archive entry), so the lookup fails. Folder access was never implemented because the Oxygen GUI doesn't use it and we thought it would be inaccessible by other means.
I'm afraid this means you cannot use the sheet discovery method you've tried here; you have to know how many sheets there are and use their file names accordingly.
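Since the sheet names have to be known up front, one possible way to collect them is to list the archive's entries outside the stylesheet. The following is only a minimal sketch, not something proposed in this thread: the class and method names are hypothetical, it uses the standard java.util.zip API rather than Oxygen's archive support, and it assumes the sheet1.xml ... sheetN.xml naming under xl/worksheets/ described in the first post.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not from the thread): lists the worksheet entries of a workbook archive.
final class SheetLister {

    // Returns entry names such as "xl/worksheets/sheet1.xml", ordered by sheet number.
    static List<String> listSheets(Path workbook) throws IOException {
        try (ZipFile zip = new ZipFile(workbook.toFile())) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    // Assumed naming scheme: xl/worksheets/sheetN.xml
                    .filter(name -> name.matches("xl/worksheets/sheet\\d+\\.xml"))
                    // Numeric sort so that sheet10 comes after sheet2
                    .sorted(Comparator.comparingInt(
                            (String name) -> Integer.parseInt(name.replaceAll("\\D", ""))))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SheetLister.java Working.xlsm
        listSheets(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

The resulting names could then be handed to the transformation (for example as a stylesheet parameter), so that each sheet is addressed as an individual archive entry.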
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 7
- Joined: Mon Nov 19, 2007 6:53 pm
- Location: Puget Sound
Re: Using collection() with archives
Thank you Adrian, that's very informative.
A few questions come to mind:
- Is this archive access feature specific to Oxygen, or is it part of (or at least usable by) the underlying Saxon XSL engine?
- Is this kind of archive access scheduled for implementation in the future? If not, how do I request it?
- Is there any Oxygen XSL feature to programmatically unzip an archive to a temp directory or other location, thereby allowing normal file and directory access? I suspect not, but no harm in asking.
Cheers,
-- Eiríkr
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Using collection() with archives
Hi,
Quote: "Is this archive access feature specific to Oxygen, or is it part of (or at least usable by) the underlying Saxon XSL engine?"
The archive access feature that you are using (the zip:file protocol) is specific to Oxygen. It allows read/write access to a file (archive entry) inside an archive.
You can also use Java's built-in jar access feature (it also works with ZIP files), which works in any Java application but is read-only. The URIs are about the same; you just have to replace the 'zip:file' protocol with 'jar:file'.
e.g.
jar:file:/path/to/zip/my.zip!/path/inside/zip/my.resource
This kind of URI works even in Firefox.
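To illustrate the jar:file form, here is a minimal sketch of reading a single archive entry through Java's standard jar: URL handler. The class and method names are hypothetical, and treating the entry as UTF-8 text is an assumption made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the thread): read-only access to one entry via a jar: URL.
final class JarUrlReader {

    // Builds a URL of the form jar:file:/path/to/zip/my.zip!/path/inside/zip/my.resource
    static URL entryUrl(Path archive, String entryPath) throws IOException {
        return URI.create("jar:" + archive.toUri() + "!/" + entryPath).toURL();
    }

    // Reads the entry as text; UTF-8 is an assumption for this sketch.
    static String readEntry(Path archive, String entryPath) throws IOException {
        URLConnection conn = entryUrl(archive, entryPath).openConnection();
        conn.setUseCaches(false); // avoid keeping the archive open in the JVM-wide jar cache
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarUrlReader.java Working.xlsm xl/workbook.xml
        System.out.println(readEntry(Paths.get(args[0]), args[1]));
    }
}
```

Note that the URL names one specific entry; like the zip:file form, it does not provide a way to list a folder inside the archive.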
Quote: "Is this kind of archive access scheduled for implementation in the future? If not, how do I request it?"
I'm afraid this support wouldn't be useful for Saxon even if we were to implement it.
From my tests, the 'select' parameter only works with the file protocol, probably because of something Saxon does internally. I couldn't make it work with ftp or http, so it wouldn't work with zip:file or jar:file either.
Quote: "Is there any Oxygen XSL feature to programmatically unzip an archive to a temp directory or other location, thereby allowing normal file and directory access? I suspect not, but no harm in asking."
Sorry, there's no XSL feature. However, you could do this by writing a Java extension that uses the TrueZIP API ([oxygen-install-folder]/lib/truezip-6.jar). You can also find the library here and use it separately from Oxygen: https://truezip.dev.java.net/
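As a rough sketch of the unzip-to-temp-directory idea, the example below uses the JDK's standard java.util.zip classes instead of the TrueZIP API mentioned above, so it should not be read as the TrueZIP approach. The class and method names are hypothetical, and how the code would be hooked into a transformation (extension function or separate pre-processing step) is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not from the thread): extracts an archive into a fresh temp directory.
final class TempUnzipper {

    // Returns the temp directory that now contains the archive's files and folders.
    static Path unzipToTemp(Path archive) throws IOException {
        Path target = Files.createTempDirectory("unzipped-");
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = target.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(target)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zin, out); // copies only the current entry's bytes
                }
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TempUnzipper.java Working.xlsm
        System.out.println(unzipToTemp(Paths.get(args[0])));
    }
}
```

Once the files are on disk, the extracted xl/worksheets folder can be addressed with an ordinary file: URI, which is the protocol the select parameter worked with in the tests described above.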
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 7
- Joined: Mon Nov 19, 2007 6:53 pm
- Location: Puget Sound
Re: Using collection() with archives
All good to know, thank you Adrian! I've since found other workarounds for this particular project, but I will keep your insights in mind for future.
Cheers,
-- Eiríkr