Using collection() with archives

Post by Eiríkr »

I'm working on extracting content from MS Excel 2007 format. Ideally, I should be able to work with the archive directly; I can indeed access content without manually unzipping the Excel file first, so that's a good beginning.

However, I've run into some challenges when trying to dynamically load *.xml files located within the *.xlsm archive.

I don't know ahead of time how many worksheets the Excel workbook might contain, so I need to be able to look through multiple sheets, named internally sheet1.xml, sheet2.xml, ... sheetN.xml. From what others have written (found via Google), the collection() function seems to be a good way to feed multiple files into an xsl:for-each construct.

However, attempting to use collection() produces an ArchiveEntryNotFoundException, though I can find nothing wrong with the path.

The problematic XSL:

<xsl:for-each select="collection(
    concat($BASE_PATH, '!/xl/worksheets/?select=sheet*.xml;recurse=yes;on-error=warning'))">
  <!-- ... do something with each file ... -->
</xsl:for-each>
The error I get:

SystemID: I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\XL2MT.xsl
Severity: error
Description: de.schlichtherle.io.ArchiveController$ArchiveEntryNotFoundException: I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\Source\Working.xlsm\xl\worksheets (no such file entry) - I:\My Documents\OxygenXMLEditor\Projects\[proj_base]\XL-MT\Source\Working.xlsm\xl\worksheets (no such file entry)
Start location: 62:0
Is the collection() function simply not capable of handling archives? Have I goofed here somehow? Removing the recurse=yes;on-error=warning portion does not change the outcome.

Any advice appreciated.

Cheers,

-- Eiríkr

Post by adrian »

Hello,

The archive support in Oxygen (through URIs) allows access only to files (archive entries). The URI you are composing refers to a folder inside the archive ("worksheets") and then applies a select to it. Oxygen assumes that "worksheets" is a file (archive entry), so the lookup fails. Folder access was never implemented because the Oxygen GUI doesn't use it, and we thought it would not be reachable by other means.

I'm afraid this means you cannot use this sheet discovery method; you have to know how many sheets there are and refer to their file names explicitly.
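As a workaround outside the stylesheet itself, the sheet list can be discovered in Java and handed to the transformation (for example as a parameter). The sketch below is a minimal illustration, assuming you can run Java alongside the transform: it uses the JDK's java.util.zip API to find entries named xl/worksheets/sheetN.xml and turns each into a read-only jar:file: URI, which doc() could then load provided the processor resolves jar: URLs through Java's standard handler. The class and method names (WorksheetLister, worksheetUris) are hypothetical, and relying on the sheetN.xml naming convention is an assumption; the workbook's relationship parts are the authoritative mapping.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists the worksheet parts of an .xlsx/.xlsm as jar:file: URIs. */
final class WorksheetLister {
    // Assumes the conventional part names xl/worksheets/sheet1.xml ... sheetN.xml.
    private static final Pattern SHEET = Pattern.compile("xl/worksheets/sheet(\\d+)\\.xml");

    static List<String> worksheetUris(Path workbook) throws IOException {
        // Path.toUri() yields an absolute, percent-encoded file: URI.
        String base = "jar:" + workbook.toAbsolutePath().toUri() + "!/";
        // TreeMap keyed by sheet number gives numeric order (sheet2 before sheet10).
        Map<Integer, String> byNumber = new TreeMap<>();
        try (ZipFile zip = new ZipFile(workbook.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                Matcher m = SHEET.matcher(name);
                if (m.matches()) { // whole-name match, so .rels files are skipped
                    byNumber.put(Integer.parseInt(m.group(1)), base + name);
                }
            }
        }
        return new ArrayList<>(byNumber.values());
    }
}
```

Sorting by the parsed number rather than by the string avoids the usual lexical trap where sheet10.xml sorts before sheet2.xml.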

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Post by Eiríkr »

Thank you Adrian, that's very informative.

A few questions come to mind:
  • Is this archive access feature specific to Oxygen, or is it part of (or at least usable by) the underlying Saxon XSL engine?
  • Is this kind of archive access scheduled for implementation in the future? If not, how do I request it?
  • Is there any Oxygen XSL feature to programmatically unzip an archive to a temp directory or other location, thereby allowing normal file and directory access? I suspect not, but no harm in asking. :)
I'm still learning how Excel 2007 stores things; I'm pretty sure I saw a couple of other possible ways of learning sheet numbers and names, so at least this missing archive functionality isn't a showstopper.
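One such way: in the Excel 2007 (Office Open XML) format, xl/workbook.xml lists every sheet as a <sheet> element carrying its display name and a relationship id (r:id), and xl/_rels/workbook.xml.rels maps each id to its worksheet part. The following is a minimal Java sketch, assuming the JDK's built-in DOM parser, that reads the names and ids in tab order; SheetIndex and sheets() are hypothetical names, and resolving the ids through the .rels part is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reads sheet names from xl/workbook.xml. */
final class SheetIndex {
    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns sheet name -> relationship id (r:id), in workbook tab order. */
    static Map<String, String> sheets(Path workbook) throws IOException {
        try (ZipFile zip = new ZipFile(workbook.toFile())) {
            ZipEntry entry = zip.getEntry("xl/workbook.xml");
            if (entry == null) {
                throw new IOException("xl/workbook.xml not found in " + workbook);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true);
                // Refuse DOCTYPEs: a workbook part never needs one, and this blocks XXE tricks.
                dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                NodeList list = dbf.newDocumentBuilder().parse(in)
                        .getElementsByTagNameNS(MAIN_NS, "sheet");
                Map<String, String> result = new LinkedHashMap<>(); // keeps document order
                for (int i = 0; i < list.getLength(); i++) {
                    Element sheet = (Element) list.item(i);
                    result.put(sheet.getAttribute("name"), sheet.getAttributeNS(REL_NS, "id"));
                }
                return result;
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("could not parse xl/workbook.xml", e);
        }
    }
}
```

The same information is reachable from XSLT by calling doc() on the workbook.xml entry URI, which avoids guessing file names.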

Cheers,

-- Eiríkr

Post by adrian »

Hi,
Is this archive access feature specific to Oxygen, or is it part of (or at least usable by) the underlying Saxon XSL engine?
The archive access feature that you are using (the zip:file protocol) is specific to Oxygen. It allows read/write access to a file (archive entry) inside an archive.
You can also use Java's built-in jar access feature (which also works with ZIP files). It works in any Java application, but it is read-only. The URIs are nearly the same; you just have to replace the 'zip:file' protocol with 'jar:file'.
e.g.
jar:file:/path/to/zip/my.zip!/path/inside/zip/my.resource
This kind of URI works even in Firefox.
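For illustration, here is a minimal Java sketch of that read-only jar: access using only java.net and java.nio from the JDK. It builds a URI of the form shown above from an archive path and an entry name, then reads the entry as UTF-8 text. The class and method names are hypothetical, and the entry path is assumed to be URI-safe (no spaces or other characters that would need escaping).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical helper: reads one archive entry through Java's built-in jar: URL handler. */
final class JarEntryReader {
    /** Builds jar:file:/path/to/my.zip!/path/inside/zip/my.resource from its parts. */
    static URI entryUri(Path archive, String entryPath) {
        return URI.create("jar:" + archive.toAbsolutePath().toUri() + "!/" + entryPath);
    }

    static String read(URI jarUri) throws IOException {
        URLConnection conn = jarUri.toURL().openConnection();
        // Skip the JDK's shared JarFile cache so the archive is closed with the stream.
        conn.setUseCaches(false);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Turning caching off is a deliberate choice: with the default setting the JDK keeps the archive open for reuse, which on Windows can leave the .xlsm locked while the JVM is running.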
Is this kind of archive access scheduled for implementation in the future? If not, how do I request it?
I'm afraid this support wouldn't help with Saxon even if we were to implement it.
From my tests, the 'select' parameter only works with the file protocol, probably because of how Saxon handles it internally. I couldn't make it work with ftp or http, so it wouldn't work with zip:file or jar:file either.
Is there any Oxygen XSL feature to programmatically unzip an archive to a temp directory or other location, thereby allowing normal file and directory access? I suspect not, but no harm in asking. :)
Sorry, there's no XSL feature for that. However, you could do it by writing a Java extension that uses the TrueZIP API ([oxygen-install-folder]/lib/truezip-6.jar). You can also find the library here and use it separately from Oxygen: https://truezip.dev.java.net/
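A minimal sketch of the extraction step such an extension would perform. It swaps in the JDK's own java.util.zip instead of TrueZIP (whose API is not shown here) and unpacks the archive into a new temporary directory, rejecting entries whose names would escape that directory. ArchiveExtractor and extractToTemp are hypothetical names; registering the method with Saxon as an extension function and deleting the directory afterwards are both left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: unzips an archive into a fresh temporary directory. */
final class ArchiveExtractor {
    static Path extractToTemp(Path archive) throws IOException {
        Path target = Files.createTempDirectory("xlsx-").toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path out = target.resolve(entry.getName()).normalize();
                // Reject names such as "../evil.txt" that would land outside target ("zip slip").
                if (!out.startsWith(target)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    // ZIP files need not contain explicit folder entries, so create parents here.
                    Files.createDirectories(out.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
        return target;
    }
}
```

Since Adrian's tests suggest the select parameter works with plain file: URIs, the stylesheet could then be given the file: URI of the extracted folder (for example dir.resolve("xl/worksheets").toUri()) and run the original ?select=sheet*.xml query against it.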

Regards,
Adrian

Post by Eiríkr »

All good to know, thank you Adrian! I've since found other workarounds for this particular project, but I will keep your insights in mind for the future.

Cheers,

-- Eiríkr