Subject: Re: [xsl] Saxon and ZWNJ
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Mon, 10 Jun 2013 07:58:17 +0100

Yes, I think it's a bug -- but not in Saxon.

Saxon's implementation of XdmItem.getStringValue() relies on calling
textNode.getNodeValue() in the underlying DOM, and my suspicion is that this
method is returning the value of the text node in escaped form.

What exactly is this "HTML cleaned DOM" that you are passing to the DOMSource
constructor? If my suspicion is correct, it doesn't implement the DOM spec
correctly.
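
One way to test that suspicion without involving Saxon at all is to walk the
DOM and look at what getNodeValue() returns for each text node. The following
is only a sketch: the class name DomTextCheck and its helper methods are made
up for illustration, and it uses the JDK's built-in parser as a stand-in for a
conforming DOM. In a conforming DOM, a text node containing U+200C returns
that character itself, never the six-character string "&zwnj;".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Saxon or of any HTML cleaner.
public class DomTextCheck {

    // Parses a small XML string with the JDK's default DOM parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Returns true if any text node below 'node' reports the literal
    // string "&zwnj;" from getNodeValue(), i.e. the value is escaped.
    static boolean hasEscapedZwnj(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue();
            if (value != null && value.contains("&zwnj;")) {
                return true;
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (hasEscapedZwnj(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Reference case: the text content is 'a', U+200C, 'b'.
        Document doc = parse("<p>a\u200Cb</p>");
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("length = " + text.length());
        System.out.println("escaped ZWNJ found: " + hasEscapedZwnj(doc));
    }
}
```

Calling DomTextCheck.hasEscapedZwnj(cleanHtml.document) on the document from
the original post should show whether the escaping is already present in the
DOM before Saxon ever sees it.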

Michael Kay

PS: this question is very product specific. Product-specific questions are
better addressed to a product-specific forum rather than to the xsl-list. For
Saxon, you can use the forums at saxonica.plan.io

On 9 Jun 2013, at 22:42, Mohsen Saboorian wrote:

> Hi,
> I'm trying to evaluate an XPATH expression with saxon- using
> the following code snippet:
>  Configuration conf = new Configuration();
>  conf.setValidation(false);
>  Processor p = new Processor(false);
>  DocumentBuilder documentBuilder = p.newDocumentBuilder();
>  XPathCompiler xpathCompiler = p.newXPathCompiler();
>  XPathExecutable xpe = xpathCompiler.compile(expression);
>  XPathSelector xpath = xpe.load();
>  xpath.setContextItem(documentBuilder.build(new DOMSource(cleanHtml.document)));
>  XdmItem result = xpath.evaluateSingle();
> The HTML is in Persian script (whose cleaned DOM is passed as
> cleanHtml.document in the above code) which has ZWNJ (U+200C) not
> escaped.
> The matched XdmItem has ZWNJ (U+200C) (non-escaped) but when obtaining
> result.getStringValue(), the result has escaped ZWNJ as (&zwnj;) which
> doesn't seem to be correct because I'm getting node 'string' value.
> Is this a bug, or is there any flag to disable escaping special
> Unicode characters in saxon?
> Regards,
> Mohsen
