parse-html using Saxon-12 in Oxygen XML

oyvind-g
Posts: 2
Joined: Tue Nov 19, 2024 1:18 pm


Post by oyvind-g

Hi,

I don't currently know how to run saxon:parse-html (https://www.saxonica.com/documentation1 ... parse-html) in Oxygen XML 26.1 for XSLT transformations.

I have tried the following code, both in a transformation scenario and in debug mode, with each of these three jars added:

1. The one suggested in the Slack thread on xml.com (https://xmlcom.slack.com/archives/C011N ... 8858490299), downloaded from http://validator.nu/

2. The one downloaded from the Maven repository: https://mvnrepository.com/artifact/nu.v ... parser/1.4

3. The patched HTML parser, added to Oxygen's lib folder

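<code>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:saxon="http://saxon.sf.net/">
  <!-- xsl:initial-template is the XSLT 3.0 default name for the initial named template -->
  <xsl:template name="xsl:initial-template">
    <xsl:sequence select="saxon:parse-html('&lt;html&gt;&lt;p&gt;test&lt;/p&gt;&lt;/html&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
</code>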

I get the following error in the Oxygen UI:
E nu/validator/htmlparser/sax/HtmlParser

In the Slack thread, Martin Holmes mentioned that it might be a conflict with dependencies used by Oxygen, so I wonder whether this is currently possible at all, or whether it is a bug. Saxon 12 reimplemented parse-html on top of the validator.nu parser (nu.validator) instead of TagSoup. In earlier versions of Oxygen, using Saxon < 12, I believe I successfully dropped in John Cowan's TagSoup jar to run the extension function.

PS: I also tried the new fn:parse-html() with the XSLT version set to 4.0, without success (it gave the same error message). This requires setting allowSyntaxExtensions to true, which can currently be done by supplying a simple Saxon configuration file; a checkbox similar to the one for allowing extension functions would be convenient.

Best regards,
Øyvind
tavy
Posts: 388
Joined: Thu Jul 01, 2004 12:29 pm

Re: parse-html using Saxon-12 in Oxygen XML

Post by tavy

Hello Øyvind,

Thank you for the feedback.

The saxon:parse-html() function is not implemented as an extension function. Therefore, you cannot specify the HtmlParser jar as an extension in the transformation scenario and expect Saxon to use it for saxon:parse-html(). The Saxon jar is loaded by the Oxygen class loader when the application starts, while the HtmlParser jar (specified as an extension) is only loaded when the transformation is performed, so Saxon's own classes cannot see it and the HtmlParser class is reported as missing.

One solution is to create a plugin that adds the HtmlParser jar to the Oxygen class loader. You can find an example of an addon that provides such a plugin here: https://github.com/oxygenxml/oxygenxml. ... ree/master

If you don't want to create an addon, I recommend you download the oxygenxml.icu4j.i18n plugin, add the jar file to the plugin lib folder, and then manually install the plugin in Oxygen XML Editor. You can find more details about this in our user manual: https://www.oxygenxml.com/doc/versions/ ... ugins.html

I also tested this with an addon that contributes the HTML parser jar. You can find it here:
https://github.com/octavianN/oxygenxml. ... tml-parser

Best Regards,
Octavian
Octavian Nadolu
<oXygen/> XML Editor
http://www.oxygenxml.com

Post by oyvind-g

Hi Octavian,

Thank you very much for the solution, the information and the test plugin! It works perfectly with the example code in the README. Enabling the XSLT 4.0 draft extensions and running fn:parse-html() also works when I supply a simple configuration file containing:

<code>
<?xml version="1.0" encoding="UTF-8"?>
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
    <global
        allowSyntaxExtensions="true"
    />
</configuration>
</code>
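
A minimal sketch of a stylesheet for the fn:parse-html() case, assuming Saxon 12 in Oxygen with the configuration file above selected for the transformation scenario and the HTML parser jar contributed through the plugin (the thread reports the same missing-class error for fn:parse-html() without it). fn:parse-html() comes from the XSLT/XPath 4.0 drafts, so its signature and output may still change between Saxon releases.

<code>
<xsl:stylesheet version="4.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumes allowSyntaxExtensions="true" is set in the Saxon configuration file -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:template name="xsl:initial-template">
    <!-- parse-html() is in the default function namespace, so no prefix is needed -->
    <xsl:sequence select="parse-html('&lt;html&gt;&lt;p&gt;test&lt;/p&gt;&lt;/html&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
</code>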

I did not know about the distinction between jars supplied to Oxygen itself and jars loaded as transformation-scenario extensions, nor about the plugin mechanism for adding libraries (for example, the extra ICU4J jars that add language support to Saxon's localization functions, or this HTML parser), and I will keep it in mind.

Best regards,
Øyvind