parse-html using Saxon-12 in Oxygen XML
Here should go questions about transforming XML with XSLT and FOP.
-
- Posts: 2
- Joined: Tue Nov 19, 2024 1:18 pm
parse-html using Saxon-12 in Oxygen XML
Hi,
I don't currently know how to run saxon:parse-html (https://www.saxonica.com/documentation1 ... parse-html) in the context of Oxygen XML 26.1 and XSLT transformations.
I have tried the following code in a transformation scenario and in debug mode, while trying to add three different jars:
1. The one suggested in the Slack thread on xml.com (https://xmlcom.slack.com/archives/C011N ... 8858490299), downloaded from http://validator.nu/
2. The download from the Maven repository https://mvnrepository.com/artifact/nu.v ... parser/1.4
3. The patched HTML parser, added to the Oxygen lib
<code>
<xsl:template name="xsl:initial-template">
<xsl:sequence select="saxon:parse-html('&lt;html&gt;&lt;p&gt;test&lt;/p&gt;&lt;/html&gt;')"/>
</xsl:template>
</code>
I get an error in the UI that looks like this:
E nu/validator/htmlparser/sax/HtmlParser
Martin Holmes mentioned in the Slack thread that this might be a conflict with dependencies used in Oxygen, so I was wondering whether this is currently possible or whether there is a bug. Saxon 12 reimplemented parse-html and now uses nu.validator instead of TagSoup, which could simply be dropped in and, I believe, worked in earlier versions of Oxygen XML Editor. I believe I have successfully dropped the cowan jar into previous versions of Oxygen, using Saxon < 12, to run the extension function.
PS: I also tried to run the new fn:parse-html with the XSLT version set to 4.0, without success (it gave the same error message). This requires setting allowSyntaxExtensions to true. That can currently be done by supplying a simple Saxon config.xml, but a checkbox similar to the one for allowing extension functions would be convenient.
Best regards,
Øyvind
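A self-contained version of the test above might look like the following sketch. It assumes the usual Saxon extension namespace http://saxon.sf.net/ bound to the saxon prefix and a Saxon-PE or Saxon-EE processor. The stylesheet wrapper, the output settings, and the escaping of the markup inside the select attribute are additions for illustration and were not part of the snippet as posted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: wrapper and namespace binding are assumptions, not from the original post. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Run with the initial template, so no source document is needed. -->
  <xsl:template name="xsl:initial-template">
    <!-- The HTML string is escaped because a literal '<' is not allowed
         inside an XML attribute value. -->
    <xsl:sequence
        select="saxon:parse-html('&lt;html&gt;&lt;p&gt;test&lt;/p&gt;&lt;/html&gt;')"/>
  </xsl:template>

</xsl:stylesheet>
```

If the HTML parser classes are not visible to Saxon, a stylesheet like this would presumably still compile and fail only when the function is called, which matches the error described above.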
-
- Posts: 388
- Joined: Thu Jul 01, 2004 12:29 pm
Re: parse-html using Saxon-12 in Oxygen XML
Hello Øyvind,
Thank you for the feedback.
The saxon:parse-html() function is not implemented as an extension function. Therefore, you cannot specify the HtmlParser jar as an extension in the transformation scenario and expect Saxon to use it for the saxon:parse-html() function. This is because the Saxon jar is loaded by the Oxygen class loader when the application starts, and the HtmlParser jar (specified as an extension) is loaded when the transformation is performed.
A solution for this is to create a plugin that includes the HtmlParser jar in the Oxygen class loader. You can find an example of an addon that provides such a plugin here: https://github.com/oxygenxml/oxygenxml. ... ree/master
If you don't want to create an addon, I recommend you download the oxygenxml.icu4j.i18n plugin, add the jar file to the plugin lib folder, and then manually install the plugin in Oxygen XML Editor. You can find more details about this in our user manual: https://www.oxygenxml.com/doc/versions/ ... ugins.html
I also made a test with an addon that contributes the html parser jar. You can find it here:
https://github.com/octavianN/oxygenxml. ... tml-parser
Best Regards,
Octavian
Octavian Nadolu
<oXygen/> XML Editor
http://www.oxygenxml.com
-
- Posts: 2
- Joined: Tue Nov 19, 2024 1:18 pm
Re: parse-html using Saxon-12 in Oxygen XML
Hi Octavian,
Thank you very much for the solution, the information, and the test plugin! It works perfectly with the example code in the readme. Enabling the XSLT 4.0 draft extensions and running fn:parse-html() also works when a simple config file is supplied with:
<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
<global
allowSyntaxExtensions="true"
/>
</configuration>
I did not know about the distinction in Oxygen between loading extensions and supplying jars (or about the plugin mechanism for adding libraries, e.g. the extra icu4j jars for more language support in Saxon's localization functions, and this example), and I will keep it in mind.
Best regards,
Øyvind
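For anyone trying the same thing, a stylesheet for the fn:parse-html() variant might look like the sketch below. It assumes that the config file above is selected in the transformation scenario and that the HTML parser jar is already visible to Saxon through the plugin. The stylesheet wrapper and the test string are illustrative and not taken from the readme.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes allowSyntaxExtensions="true" is set through the Saxon config file. -->
<xsl:stylesheet version="4.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <!-- parse-html is called without a prefix because it lives in the
         default function namespace; the markup is escaped for the attribute. -->
    <xsl:sequence
        select="parse-html('&lt;html&gt;&lt;p&gt;test&lt;/p&gt;&lt;/html&gt;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Compared with the saxon:parse-html() version, no extension namespace declaration is needed here, only the 4.0 version attribute and the config setting.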